Review on Micro-climate Changer with Sensor Broadcasted Data

Prof. Vikas Nandgaonkar, Prof. Prashant Dongare
Pratap Madane, Priyanka Rasal, Aniket Shilimkar, Vaibhav Waghmode

Abstract: While human-centric contexts (e.g. indoor/outdoor, at home/in office, driving/walking) have been extensively researched, few attempts have studied context from the phone's perspective (e.g. on a table/sofa, in a pocket/bag/hand). We refer to such immediate surroundings, usually several to a dozen centimeters around a phone, as its micro-environment. In this study, we design and implement a micro-environment sensing platform that automatically records sensor hints and characterizes the micro-environment of a smartphone. The platform runs as a daemon process on the smartphone and provides finer-grained environment information to upper-layer applications via a programming interface. It is a unified framework covering the major cases of phone usage, placement, attitude, and interaction in practical use with complicated user habits. As long-term running middleware, it considers both energy consumption and user friendliness. We prototype the platform on Android OS. The preliminary results show that the micro-environment changer using sensor broadcasted data achieves low energy cost, rapid system deployment, and competitive sensing accuracy.
Keywords— Sherlock, broadcasted data, proximity sensor, web sensing, micro-environment of smartphones.
I. Introduction
Nowadays the use of mobile phones is increasing rapidly, and different sensors are built in depending on the model. A smartphone has many inbuilt sensors such as GPS, proximity, accelerometer, gyroscope and magnetic sensors. Using these sensors we can develop many applications for different purposes.
Smartphone sensors continuously broadcast data. We will develop various applications using that data, both for security and for saving the phone's battery. Sherlock is a unified framework covering the major cases of phone usage, placement, attitude, and interaction in practical use with complicated user habits. We prototype Sherlock on Android OS and systematically evaluate its performance with collected data. Sherlock achieves low energy cost, rapid system deployment, and competitive sensing accuracy, and runs as a daemon process. Most context-sensing applications are human-centric, recognizing contexts from the user's point of view, e.g. indoor/outdoor [9], at home/in office, driving/walking [2]. Such information provides services according to the user's situation. For example, if a mobile phone is in a bag or pocket, it is useless to light up the screen when a call is coming in. In addition, if a phone is placed on a sofa rather than on a desk, it is better to turn up the ring volume to avoid missing calls. Given accurate micro-environment information, a phone can adapt its behaviour automatically and properly: a mobile phone can detect whether its user is holding it in hand for safety [2], and when a user enters a building it is unnecessary to keep the phone's GPS [10] working, so it can be switched off to save energy.
II. INTRODUCTION TO SMARTPHONES
Smartphones have open operating systems, such as Windows Mobile, Symbian, and Linux, and scalable hardware-software multi-functionality. Mobile phones and other wireless devices are becoming increasingly popular, and that market has expanded tremendously [5]. With the development of information technology, smartphones have become mainstream in the mobile market and have gradually and steadily occupied it. To get new features, traditional phones are being replaced by smartphones. A smartphone has several advantages over a traditional mobile phone: it keeps the full functionality of the traditional mobile phone (e.g. phone conversation, text messages and so on) along with the ability to connect to the Internet. It is a kind of cell phone that includes a personal information manager, schedule control, multimedia applications and an internet connection [13].
A. Android features
Reuse and replacement of components
Integrated browser
Optimized graphics
Media support
GSM Telephony
Bluetooth, EDGE, 3G, and WiFi.
III. PROPOSED SYSTEM
A. System Overview:

1) Input:
A micro-environment, also known as a micro-habitat, is a very small, specific area in a habitat, distinguished from its immediate surroundings by factors such as the amount of incident light, the degree of moisture, and the range of temperatures. In our system, different micro-environments such as phone placement, pattern recognition, pressure on the touch screen [5], and phone interaction are sensed by sensors.
2) Processing:
There are three steps:

Smart phone sensors
Action listener
Business Logic

A smartphone contains many built-in sensors, such as a magnetic sensor, camera, GPS, pressure sensor, proximity sensor, etc. [2] These sensors sense the provided input environment and send the sensed data to the action listener, which triggers actions. These actions are processed in the business logic. Applications extract data from the business logic and generate output accordingly [5].
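As a minimal sketch of this pipeline, assuming a standard Android app, the listener below registers for accelerometer updates and forwards them to a placeholder business-logic method; SensorHub and onPhoneLifted() are illustrative names, not components defined in this paper.

```java
import android.content.Context;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

/**
 * Minimal sketch of the sensing pipeline described above:
 * built-in sensor -> action listener -> business logic.
 */
public class SensorHub implements SensorEventListener {

    private final SensorManager sensorManager;

    public SensorHub(Context context) {
        sensorManager = (SensorManager) context.getSystemService(Context.SENSOR_SERVICE);
    }

    /** Register for accelerometer updates; this listener plays the "action listener" role. */
    public void start() {
        Sensor accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
        if (accel != null) {
            sensorManager.registerListener(this, accel, SensorManager.SENSOR_DELAY_NORMAL);
        }
    }

    public void stop() {
        sensorManager.unregisterListener(this);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        // Broadcasted sensor data arrives here; hand it to the business logic.
        float x = event.values[0], y = event.values[1], z = event.values[2];
        double magnitude = Math.sqrt(x * x + y * y + z * z);
        if (magnitude > 12.0) {          // crude illustrative threshold, not a tuned value
            onPhoneLifted();
        }
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) {
        // Not needed for this sketch.
    }

    /** "Business logic": an application-level reaction to the sensed event. */
    private void onPhoneLifted() {
        // e.g. raise ring volume, log a theft-detection event, etc.
    }
}
```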
3) Output:
From the input and processing blocks, various applications can be generated, such as disabling vibration and increasing ringtone volume [7], theft detection, women's security and automatic call acceptance.
B. System Architecture:
The hardware layer is the lowest layer. It consists of all the sensors used in the smartphone, such as the accelerometer, camera [1][2], proximity sensor, gyroscope, etc. The sensors continuously broadcast data, capture the mobile environment and give the captured data as input to the upper layer, i.e. the middleware layer. According to the data received from the hardware layer, the middleware detects the behaviour of the user and performs actions accordingly [7][15].
There are three types of detection:
Phone placement detection [9]: detects where the mobile is placed, e.g. in hand, in a pocket, or on a desk.
Phone interaction detection [11]: detects whether the user is interacting with the phone; the interaction can be receiving a call [3][6] or browsing.
Backing material detection: detects the material of the surface on which the phone is placed; the material can be glass, wood or leather.
Sensors such as the proximity sensor, pressure sensor and gyroscope are behind these smartphone capabilities. Let us understand how each sensor works with respect to its operating principle.
1) Proximity Sensor:
[4] The main function of the proximity sensor is to detect how close the smartphone's screen is to your body. [9] When you use your smartphone during a call, it detects the position of the ear with respect to the screen, turns off the screen backlight and saves battery. The proximity sensor also prevents accidental touches and unwanted input during a call. [5] These sensors can also detect signal strength and interference sources and amplify or filter them using beam-forming techniques.
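A hedged sketch of how such a proximity-based behaviour could be wired up with the standard Android sensor API is shown below; what to do with the near/far decision is left to the application, so it is only recorded in a flag here.

```java
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;

/**
 * Sketch: read the proximity sensor to decide whether the phone is held
 * against the body (ear, pocket, bag).
 */
public class ProximityWatcher implements SensorEventListener {

    private final SensorManager sensorManager;
    private final Sensor proximity;
    private boolean screenShouldBeOff;   // the decision an app would act on

    public ProximityWatcher(SensorManager sensorManager) {
        this.sensorManager = sensorManager;
        this.proximity = sensorManager.getDefaultSensor(Sensor.TYPE_PROXIMITY);
    }

    /** Start listening; many proximity sensors only report "near" or "far". */
    public void start() {
        if (proximity != null) {
            sensorManager.registerListener(this, proximity, SensorManager.SENSOR_DELAY_NORMAL);
        }
    }

    public void stop() {
        sensorManager.unregisterListener(this);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        // "Near" is reported as a value below the sensor's maximum range.
        boolean near = event.values[0] < proximity.getMaximumRange();
        // When the screen is close to the body, an app would blank the display
        // to save battery and ignore input to avoid accidental touches.
        screenShouldBeOff = near;
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }

    public boolean isScreenSupposedToBeOff() {
        return screenShouldBeOff;
    }
}
```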

Fig. 2: System Architecture [17]
According to the detection pattern, the output from the middleware layer is given as input to the upper layer, i.e. the application layer. From this input the application layer performs the actions [13][14].
C. Sensors:

[16]
D. Introduction to Sensors:
Since the beginning of the race in mobile communication, a new model has been launched every day with different features. These new features and specifications must win enough favour among users to survive the competition in mobile technology. [10] Today different manufacturers such as Samsung, Apple, Sony, HTC and many more build smartphones and have become competitors. [13][11] One of the features that attracts the mobile phone buyer is the smart work the phone does, through different types of sensors such as the accelerometer, ambient light sensor, GPS sensor, compass and proximity sensor [11].
2) GPS (Global Positioning System) sensor:
GPS, short for Global Positioning System, was originally developed and set up for military operations and was made available to everyone in the 1980s by the government [8].
3) Ambient Light Sensor:
This sensor optimizes the screen brightness when the phone is exposed to light of different intensities. [3] The ultimate function of the ambient light sensor is to adjust the display brightness, which in the end saves battery power and extends battery life.
4) Accelerometer
The main function of the accelerometer is to sense changes in the orientation of the [13][11] smartphone with respect to a datum and adjust the orientation to suit the viewing angle of the operator. For example, when you are looking at a web page with increased width, you can get a landscape view by changing the orientation of the phone to horizontal. [11][8] These features are then utilized to determine whether the phone is in motion; there are plenty of motion detection schemes that can do this successfully. Two kinds of vibration patterns can be exploited: 1) the phone's mechanical motion and 2) its acoustical features, which can be captured by the embedded accelerometer and microphone, respectively.
To this end, Sherlock extracts a series of lightweight features from acceleration/acoustic traces in both the time and frequency domains, and classifies backing materials such as a leather chair, wood desk or glass table.
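As an illustration only (not Sherlock's published feature set), the plain-Java sketch below computes a few lightweight time-domain features over a window of accelerometer magnitudes of the kind such a classifier could consume.

```java
/**
 * Illustrative time-domain features over a window of accelerometer magnitudes.
 */
public final class AccelFeatures {

    /** Mean of the samples. */
    public static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    /** Variance: how strongly the phone vibrates on this surface. */
    public static double variance(double[] samples) {
        double m = mean(samples);
        double sum = 0;
        for (double s : samples) sum += (s - m) * (s - m);
        return sum / samples.length;
    }

    /** Zero-crossing rate around the mean, a rough frequency indicator. */
    public static double zeroCrossingRate(double[] samples) {
        double m = mean(samples);
        int crossings = 0;
        for (int i = 1; i < samples.length; i++) {
            if ((samples[i - 1] - m) * (samples[i] - m) < 0) crossings++;
        }
        return (double) crossings / (samples.length - 1);
    }

    public static void main(String[] args) {
        // A toy window of acceleration magnitudes recorded during a vibration burst.
        double[] window = {9.8, 10.4, 9.1, 10.9, 8.7, 10.2, 9.5, 10.1};
        System.out.printf("mean=%.2f var=%.2f zcr=%.2f%n",
                mean(window), variance(window), zeroCrossingRate(window));
    }
}
```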
5) Gyros or Gyroscope:
The gyroscope's function is to maintain and control position, level or orientation based on the principle of angular momentum. [6] When used along with the accelerometer, it senses motion along six axes, i.e. right, left, up, down, forward and backward.
CONCLUSION
In this paper we present the design, implementation and evaluation of Sherlock, a simple yet practical platform for micro-environment sensing on smartphones via collaboration among built-in sensors. [11] The platform automatically collects sensor hints and characterizes the immediate surroundings of the smartphone at centimeter-level accuracy, providing fine-grained environment information to upper-layer applications.
 
REFERENCES
[1] J. Yang, S. Sdhom, G. Chandrasekaran, T. Vu, H. Liu, N. Cecan, Y. Chen, M. Gruteser and R. Martin, Detecting Driver Phone Use Leveraging Car Speakers. In MOBICOM’11, 2011.
[2] S. Nath. ACE: Exploiting Correlation for Energy-Efficient and Continuous Context Sensing. In MobiSys’12, 2012.
[3] T. Yan, D. Chu, D. Ganesan, A. Kansal, and J. Liu. Fast app launching for mobile devices using predictive user context. In MobiSys’12, 2012.
[4] C. Qin, X. Bao, R. Roy Choudhury, and S. Nelakuditi. Tagsense: a smartphone-based approach to automatic image tagging. In MobiSys’11, 2011.
[5] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell. Soundsense: scalable sound sensing for people-centric applications on mobile phones. In MobiSys’09, 2009.
[6] H. Lu, J. Yang, Z. Liu, N. D. Lane, T. Choudhury, and A. T. Campbell. The jigsaw continuous sensing engine for mobile phone applications. In SenSys’10, 2010.
[7] M. Azizyan, I. Constandache, and R. Choudhury. SurroundSense: Mobile phone localization via ambience fingerprinting. In MOBICOM’ 09, 2009.
[8] A. Rai, K. Chintalapudi, V. Padmanabhan, and R. Sen. Zee: Zero-Effort Crowdsourcing for Indoor Localization. In MOBICOM’12, 2012.
[9] P. Zhou, Y. Zheng, Z. Li, M. Li, and G. Shen. IODetector: A Generic Service for Indoor Outdoor Detection. In SenSys’12, 2012.
[10] X. Zhu, Q. Li, G. Chen. APT: Accurate Outdoor Pedestrian Tracking with Smartphones. In INFOCOM’13, 2013.
[11] P. Mohan, V. Padmanabhan, and R. Ramjee. Rich Monitoring of Roads and Traffic Using Mobile Smartphones. In SenSys’08, 2008.
[12] A. Thiagarajan, L. Ravindranath, K. LaCurts, S. Madden, H. Balakrishnan, S. Toledo, and J. Eriksson. Vtrack: accurate, energyaware road traffic delay estimation using mobile phones. In Sen-Sys’09, 2009.
[13] C. Tacconi, S. Mellone, L. Chiari. Smartphone-based applications for investigating falls and mobility. In PervasiveHealth’11, 2011.
[14] J. Dai, X. Bai, Z. Yang, Z. Shen, D. Xuan. PerFallD: A Pervasive Fall Detection System Using Mobile Phones. In PervasiveHealth’10, 2010.
[15] S. Salvador, P. Chan, Toward accurate dynamic time warping in linear time and space, In Journal Intelligent Data Analysis, 2007.
[16] Web reference: www.digikey.com/
[17] Zheng Yang. Sherlock: Micro-environment Sensing for Smartphones. IEEE.
 

Analyzing Economic Data Using Big Data

N. Rajanikumar, Dr. A. Suresh Babu, Mr. G. Murali

 
Abstract: Big data can help with e-commerce data. For big-picture problems, the economic indicators that many investors, business people and policymakers rely on are simply too outdated by the time they are released. People "pitch to the number," but the world has often moved on since it was measured, and they won't know it until the next report comes out. Take, for example, the case of rising food prices in India and China that are driving up prices for a major percentage of the world's residents: Premise claims to have seen the movement shaping up for weeks. Premise is able to capture economic data in close to real time in some cases, or at least much closer to it in others, thanks to the technology trifecta of e-commerce, cloud computing and smartphones. However, while e-commerce data is helpful for gauging the prices of certain goods in certain economies, it doesn't really reach emerging economies where the vast majority of transactions are still local and cash-based. If grocery prices are rising across Asia, for example, that likely means, among other things, poorer health and less money to spend on non-essential consumer goods. That's where mobile devices come into play, in the form of Premise's Android app. The company has more than 700 contributors in 25 cities, mostly in Asia and Latin America, who go into stores and markets and capture data about the exact items on which Premise wants data. "We use them as a sort of detection agent." The contributors take a picture of the item either on the shelf or in a market stall; it syncs with Premise's servers in the cloud; and Premise's system is then able to extract information from the photos. It can verify information such as the price, brand and quality of the items, and even environmental information such as how clean the store is and how well stocked the shelves are. Interestingly, but not accidentally, the app that contributors use is only available for Android phones.
Keywords: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Linux/Unix, Windows, Eclipse.
1. INTRODUCTION
This paper mainly focuses on how to manage a huge amount of data and how to analyse it. The technology used for this is Hadoop. In this project the data taken is economic data from various e-commerce websites. The data is stored in HDFS (Hadoop Distributed File System) in the form of clusters. After the storage is done, the data can be processed based on the user's requirements, and the processing can be done in many modes. Hadoop contains many ecosystem components which provide different ways of processing or analyzing the data in different environments. The two basic components of Hadoop are HDFS and MapReduce: HDFS is used to store the data and MapReduce is used to process it. In MapReduce we write code in Java to analyze the data in whatever way we want. The ecosystem components of Hadoop are also used for processing and analyzing the data; they include Pig, Hive, Chukwa, HBase, ZooKeeper, Sqoop, etc. Here Pig, Hive and Sqoop have been used. The first ecosystem component used is Pig, a scripting language that can process both structured and unstructured data; Pig scripts are written over the data to get results. Hive is a query language that can handle only structured data; queries are written over the data to analyze it. Finally, Sqoop is actually a support tool for Hadoop rather than an ecosystem component; it is used to transfer data from one database to another. After the processing of the data, the results are displayed.
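As a hedged illustration of the MapReduce step described above, the sketch below computes the average price per product category with the standard Hadoop Java API; the CSV layout (category,product,price) and the class names are assumptions made for the example, not the project's actual dataset or code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Average price per product category from e-commerce records stored in HDFS. */
public class AveragePriceJob {

    public static class PriceMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) return;              // skip malformed rows
            try {
                double price = Double.parseDouble(fields[2].trim());
                context.write(new Text(fields[0].trim()), new DoubleWritable(price));
            } catch (NumberFormatException ignored) {
                // header line or bad value: ignore
            }
        }
    }

    public static class AverageReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text category, Iterable<DoubleWritable> prices, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            long count = 0;
            for (DoubleWritable p : prices) {
                sum += p.get();
                count++;
            }
            context.write(category, new DoubleWritable(sum / count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "average price per category");
        job.setJarByClass(AveragePriceJob.class);
        job.setMapperClass(PriceMapper.class);
        job.setReducerClass(AverageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```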
2. What Is Big Data?
Big Data refers to data sets whose size makes it difficult for commonly used data-capturing software tools to interpret, manage, and process them within a reasonable time frame. Big data sizes are a continually moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. Because of this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.
The Big Data concept refers to datasets that continue to grow so much that it becomes difficult to manage them using existing database management concepts and tools. The difficulty can relate to capturing the data, storage, searching, visualization, etc.
The challenges associated with Big Data are the “4 V’s”:
Volume, Velocity, Variety, and Value.
The volume challenge exists because most businesses generate much more data than their systems were designed to handle.
The velocity challenge exists if a company's data analysis or data storage runs slower than its data generation.
The variety challenge exists because of the need to process different types of data to produce the desired insights.
The value challenge applies to deriving valuable insights from data, which is the most important of all V’s in my view.

Fig. 1: The 4 V's of Big Data
3. What is E-Commerce?
E-commerce is a type of business model, or part of a larger business model, that enables a firm or individual to conduct business over an electronic network, typically the internet. Electronic commerce operates in all four of the major market segments: business to business, business to consumer, consumer to consumer and consumer to business. It can be thought of as a more advanced form of mail-order purchasing through a catalogue. Almost any product or service can be offered via e-commerce, from books and music to financial services and plane tickets.
E-commerce has allowed firms to establish a market presence, or to enhance an existing market position, by providing a cheaper and more efficient distribution chain for their products or services.
4. Why Big Data is a must in ecommerce
The buzz around Big Data is far from unwarranted. Not only does it allow merchants to gain deeper insights into customer behavior and industry trends, but it also lets them make more precise decisions to improve just about every aspect of the business, from sales and advertising to merchandising, operations, and even customer retention.
Below are a few more points that explain in greater depth the impact of Big Data in the e-commerce domain. From improving customer experience to developing better products or marketing campaigns, there is no question that Big Data is the next big thing for online businesses.
5. Characteristics of Big Data
A Big Data proposal can provide a solution that is planned specifically around the needs of the enterprise.
The following are the basic characteristics of Big Data:

Comprehensive – It should offer a broad platform and address all three dimensions: velocity, volume and variety.
Enterprise Ready – It should include the performance, reliability and security features expected of enterprise software.
Integrated – It should enable integration with information supply chain including databases, data warehouses and business intelligence applications.
Open Source Based — It should be open source technology with enterprise class functionality.
Low latency.
Robustness and reliability.
Scalability.
Extensibility.
Allows ad hoc queries.
Minimal Maintenance.

6. BIG DATA OFFERS
There are many vendors offering Big Data analytics, such as IBM, Kognitio, etc. In this paper the IBM platform is discussed.

Fig. 2: IBM Big Data Platform
7. Big Data Challenges
The focal challenges of Big Data are data variety, velocity, volume and analytical workload complexity.
More and more organizations are struggling to deal with the problems posed by large amounts of data. In order to solve this problem, organizations need to reduce the amount of data being stored and develop new storage techniques which can improve storage use.
8. Uses of Big Data for Online Retailers
Most small merchants think that Big Data analysis is only for large companies. In fact, it is essential for small businesses too, as they attempt to compete with the larger ones. This becomes even more important as online retailers interact with their customers in real time. Note, however, that managing large sets of data can increase a site's load time, and a slow site hurts every aspect of the shopping process.
Here are six uses of Big Data for online retailers.
Personalization, dynamic pricing, customer service, managing fraud, supply chain visibility, and predictive analytics.
9. Conclusion
The expansion of information, particularly of unstructured data, poses a special challenge as the volume and diversity of data grow. One of the most promising technologies for dealing with this big data problem is the Apache Hadoop and MapReduce framework.
Big Data is a popular trend in business and in marketing. The concept can mean different things to different businesses. For e-commerce, retailers should seek to use Big Data to collect big information, if you will, that may be used to make better marketing decisions.
 

Cambridge Analytica Data Scandal: Quality Management

Executive Summary
This report is prepared to study the issue Facebook recently faced. It is in the news because of the Cambridge Analytica data scandal, in which the personal information of Facebook users was improperly shared with Cambridge Analytica, a data mining and political strategy firm. When the scandal was exposed, the CEO and Chairman of Facebook, Mark Zuckerberg, had to apologize publicly for the data breach and said that it was a mistake on Facebook's part not to design a process restricting third-party developers working on the Facebook API. He also pledged to make changes to the design and reform the privacy policy. This study gives an understanding of the loopholes in Facebook's quality management system and how the breach could have been prevented if the company had followed the theories of the quality management gurus. By understanding the different theories described by these gurus, a strong quality management system can be put in place from the design stage itself. It also states that customer loyalty is a very important value which can be gained by continuous improvement of the quality management system. A poor system results in loss of company reputation, customers and monetary value.
1.0 Introduction
Facebook is an American social media company providing social networking services to people around the world. It was founded in 2004. Mark Zuckerberg is the Chairman and CEO of the company. It has more than 2.2 billion active users. People use Facebook to stay connected to their friends and family and to share and express their views.
2.0 Issue of data breach
Recently Facebook's data privacy scandal came into the limelight when Facebook members' data was improperly shared with Cambridge Analytica, a data mining and political strategy firm. These data were accessed during Donald Trump's presidential campaign, and Cambridge Analytica had access to the data for more than two years. This is the biggest public relations crisis Facebook has faced.
In April 2010, Facebook launched a platform called Open Graph for third-party apps. This allowed external developers to reach out to Facebook users and request permission to access their personal data (CNBC, 2018).
In 2013, a Cambridge University researcher named Aleksandr Kogan created an app called "thisisyourdigitallife". The app prompted users to answer questions for a psychological profile. About 270,000 people downloaded the app and shared their personal information. This gave Aleksandr Kogan access to the data not only of those Facebook users but also of their friends. This information was shared with Cambridge Analytica and used to profile the personality of the people concerned and to target political advertising at them effectively. Cambridge Analytica obtained the information in total violation of Facebook's rules and did not tell anybody that the data would be used for political campaigning (Casey, 2018).

In the same year Facebook was made aware of this violation: the app had accessed the data not only of those who installed it but also of their friends. Facebook demanded that Cambridge Analytica delete all the data, and they agreed to do so. In reality, Aleksandr Kogan never deleted the data, and Facebook later never investigated whether it had been deleted as promised (Casey, 2018).
In 2014 Facebook changed its rules for external developers and restricted them from accessing users' friends' data without taking permission from those friends.
With the 2016 presidential elections approaching, Cambridge Analytica did not have time to create its own data for the election campaign. It went to Aleksandr Kogan, who created a Facebook app that paid users to take a personality test (The Guardian, 2018).
In 2016 The Guardian reported that Cambridge Analytica was helping Ted Cruz's presidential campaign by sharing psychological data based on its previous research. Facebook waited for more than two years before suspending Cambridge Analytica, even after knowing about the data breach.
In mid-March 2018, this scandal was exposed by The Guardian and The New York Times.
Facebook admitted that it did not read the terms of the app that accessed the data of 87 million people and apologised for the "breach of trust". Facebook's CTO Mike Schroepfer told U.K. lawmakers that Facebook did not notify the U.K.'s data protection watchdog after it learned of the sharing of data with Cambridge Analytica, and that this was their mistake (Ryan, 2018).
According to the U.K.'s data protection law, the sale or use of personal data without the user's consent is banned. In 2011, after a Federal Trade Commission complaint, Facebook agreed to get clear consent from users before sharing their data. The FTC has now started investigating whether Facebook violated the privacy protection of its users, and U.S. and U.K. lawmakers are investigating in their own ways. Mark Zuckerberg apologized on behalf of Facebook by publishing a personal letter in all major newspapers and pledged to make changes and reform the privacy policy to prevent such breaches.
By doing this, Facebook has breached the trust of its users and privacy law. A customer or user shares information with a company trusting that personal details are safe; a company's name and reputation make people trust it. The quality of the brand is very important in building and growing a company.
Facebook is a very well-known networking site and it has a monopoly in the market. People joined Facebook and disclosed their personal details believing that whatever information they share about themselves and their friends would be kept confidential and would not be disclosed anywhere outside without their consent.
3.0 Referring to statements made by Quality management Gurus
It is very important for a company to have a well-defined quality management system in place. For a company like Facebook, where the personal data of people are at risk, there is a continuous risk of the data being hijacked and misused.
How a good quality management system can be put in place has been described by many quality management gurus. Mentioned below are some of the points stated by these gurus:
According to quality management guru W. Edwards Deming, there are seven deadly diseases that act as barriers to understanding the statistical principles underlying a basic quality management system. One of the diseases is that a company runs on visible figures only. Deming argued that apart from the visible figures there are many costs and figures that are not known and cannot be calculated. Customer loyalty gained as a result of continuous quality improvement is one of these unknowable numbers, and management has to take it into account (Deming, 2012).
It is also very important to gain confidence of the customers by building trust. In case of Facebook data breach scandal, it was very important for the company to monitor and improve the system. Once Aleksandr Kogan accessed the data of the Facebook users and their friends, it was important for Mark Zuckerberg to monitor and improve the system putting a barrier for third party developers to access data. This would have resulted in maintaining the confidentiality of the users’ data.
Based on the quality management guru J M Juran’s trilogy, it is very important to plan, improve and control quality.
Quality Planning – A proper quality plan should be in place. This involves creating a process that will be able to meet the goals. Once the process is in place, it will not be difficult to respond to customer needs.
Quality Improvement – It is important to continuously improve the quality and run the process with optimal effectiveness.
Quality Control – To control and maintain good quality it is important to create a process that required minimal monitoring. This will help in running the operations in accordance with quality plan (Juran, 1986).
Facebook should have created a process to maintain the privacy of their users' information. This process should have barred third-party developers from running their apps on the Facebook API.
Quality guru Genichi Taguchi emphasized improving the quality of the product and process at the design stage rather than achieving quality through inspection. Taguchi also developed the concept of quality loss and worked on that rather than just quality. He defined quality loss as both loss to the company, such as the cost of reworking the design, scrapping and maintenance, and loss to the customer through a poor product or service and low reliability (Taguchi, n.d.).
After the scandal was exposed, Facebook faced a huge loss in terms of its reputation, its customers' trust and its monetary value. Many users deleted their Facebook accounts, feeling that their personal information was not secure and could be misused by the company.
4.0 Conclusion
It is very important for a company to develop a quality system at the design stage and to control and improve the quality system with minimal inspection requirements. It is also important to know and understand the unknown costs and figures, like the customer loyalty that can be gained by continuously improving quality. A proper system should be in place with zero defects. A poor quality management system will result in loss of reputation, customers and monetary value.
5.0 References:
(2018, March). Retrieved from The Guardian: https://www.theguardian.com/commentisfree/2018/mar/19/facebook-data-cambridge-analytica-privacy-breach
(2018, April). Retrieved from CNBC: https://www.cnbc.com/2018/04/10/facebook-cambridge-analytica-a-timeline-of-the-data-hijacking-scandal.html
Casey, N. (2018, April). Retrieved from https://www.theverge.com/2018/4/10/17165130/facebook-cambridge-analytica-scandal
Deming, E. (2012, January). Retrieved from https://www.qualitymag.com/articles/88324-quality-management-2-0-deming-s-7-deadly-diseases-of-management
Juran, J. (1986, May). The Quality Trilogy. Retrieved from http://app.ihi.org/FacultyDocuments/Events/Event-2930/Presentation-16071/Document-12762/Tools_Resource_C7_Juran_trilogy1.pdf
Ryan, B. (2018, April). Retrieved from CNBC: https://www.cnbc.com/2018/04/26/facebook-cto-admits-firm-didnt-read-terms-of-aleksandr-kogans-app.html
Taguchi, G. (n.d.). Retrieved from British Library: https://www.bl.uk/people/genichi-taguchi
 

Analysis of Tools for Data Cleaning and Quality Management

Data cleaning is needed in the process of combining heterogeneous data sources with relations or tables in databases. Data cleaning, also called data cleansing or data scrubbing, is defined as detecting and removing errors, along with ambiguities existing in files and log tables. It is done with the aim of improving the quality of data. Data quality and data cleaning are related terms, and the two are directly proportional to each other: if data is cleansed in a timely manner, the quality of the data improves day by day. There are various data cleaning tools that are freely available on the net, including WinPure Clean and Match, OpenRefine, Wrangler, DataCleaner and many more. This thesis presents information about the WinPure Clean and Match data cleaning tool, its benefits and its applications in a running environment thanks to its three-stage filtered mechanism for cleaning data. Its implementation has been carried out on a user-defined database and the results are presented in this chapter.
WinPure Clean and Match
It is one of the easiest and simplest three-phase filtered cleaning tools for performing data cleansing and data de-duplication. It is designed in such a way that running the application saves time and money. The main benefit of this tool is that two tables or lists can be imported at the same time. The software uses a fuzzy matching algorithm to perform powerful data de-duplication. The functions of this tool are as follows:

Removes redundant data from databases in a fast way.
Corrects misspellings and incorrect email addresses. It also converts words to uppercase or lowercase depending on the user's demand.
Removes unwanted punctuation and spelling errors.
Helps to relocate missing data and gives statistics in the form of a 3D chart. This option can prove useful in finding the population percentage of a particular area.
It automatically capitalizes the first letter of every word.

Advantages

Increases the accuracy and utilization of a database (whether a professional, user-defined or consumer database).
Eliminates duplicates from databases using the fuzzy matching de-duplication technique.
Improves industry perspectives by using standard naming conventions, with the facility of removing duplicate data from the original data.
Exports a given file into various formats like Access, Excel (95), Excel (2007), Outlook, etc.

Applications

The software is made for use by everyone from normal users to IT professionals. It is ideal for marketing, banking, universities and various IT organizations.

Working of WinPure Clean and Match
Clean and Match is made of three components: Data, Clean and Match. Data gives us the imported list of tables. The Clean option consists of seven modules, each having a different purpose. The Clean section is basically used to analyze, clean, correct and correctly populate a given table without removing duplicates. It has separate cleansing modules: Statistics Module, Case Converter, Text Cleaner, Column Cleaner, E-mail Cleaner, Column Splitter and Column Merger.
The Match section is used to detect duplicates using the fuzzy matching de-duplication technique. WinPure Clean and Match uses a unique 3-step approach for finding duplications in a given list or database, described below; a simple illustrative sketch of fuzzy matching follows the three steps.
Step 1: The first step is to specify which table/s and columns you would like to use to search for possible duplications.
Step 2: The second step is to specify which matching technique you would like to use: either basic (telephone numbers, emails, etc.) or advanced de-duplication with or without fuzzy matching (names, addresses, etc.).
Step 3: The final step is to specify which viewing screen you would like to use, WinPure Clean & Match offers two unique viewing screens for managing the duplicated records.
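The sketch below illustrates the general idea of fuzzy-matching de-duplication; WinPure's actual algorithm is proprietary, so plain Levenshtein edit distance is used here only as a stand-in similarity measure.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy fuzzy de-duplication: drop records within a small edit distance of one already kept. */
public class FuzzyDedup {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    /** Keep a record only if it is not within 'threshold' edits of one already kept. */
    static List<String> dedupe(List<String> records, int threshold) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            boolean duplicate = false;
            for (String existing : kept) {
                if (editDistance(record.toLowerCase(), existing.toLowerCase()) <= threshold) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) kept.add(record);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = List.of("John Smith", "Jon Smith", "Jane Doe", "J. Smith");
        System.out.println(dedupe(names, 2)); // [John Smith, Jane Doe, J. Smith]
    }
}
```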
Limitations of WinPure Clean and Match
(a) It has nothing to do with connectivity and networking of the dataset. It simply removes redundant words by cleaning and matching data.
(b) It is not derived from expert systems like Simile Longwell CSI and lacks a client-server architecture.
(c) Modifying or updating the dataset is not possible once the data is imported into the tool.
Google Refine
Google Refine overcomes the limitations of WinPure Clean and Match. It is now known as OpenRefine. It is a powerful tool for working with dirty data: it cleans and transforms data and offers various services to link it to databases like Freebase. OpenRefine understands a variety of data file formats. Currently, it tries to guess the format based on the file extension. For example, .xml files are of course in XML. By default, an unknown file extension is assumed to be either tab-separated values (TSV) or comma-separated values (CSV).
Once imported, the data is stored in OpenRefine's own format, and the original data file is left undisturbed.
Google Refine Architecture
OpenRefine is a web application that is intended to be run on one's own machine and used by oneself. It has a server side as well as a client side. The server side maintains the states of the data (undo/redo history, long-running processes, etc.) while the client side maintains the states of the user interface (facets and their selections, view pagination, etc.). The client side makes GET and POST Ajax calls to modify and fetch data-related information from the server side.
The architecture is derived from expert systems like Simile Longwell CSI, a faceted browser for RDF data. It provides a good separation of concerns (data vs. user interface) and also makes it quick and easy to implement user interface features using familiar web technologies.

5.6. Using Data Quality Services in connecting databases
This section describes how to provide high-quality data by introducing Data Quality Services (DQS) in Microsoft SQL Server. The data-quality solution provided by Data Quality Services (DQS) enables an IT professional to maintain the quality of their data and ensure that the data is suited to its business usage. DQS is a knowledge-driven solution that provides both computer-assisted and interactive ways to manage the integrity and quality of your data sources. DQS enables you to discover, build, and manage knowledge about your data. You can then use that knowledge to perform data cleansing, matching, and profiling. It is based on building a knowledge base, or test bed, to identify the quality of data as well as to correct bad-quality data. Data Quality Services is a very important concept in SQL Server.
Utilisation of data cleaning and quality phases
The process of data cleaning starts from the initial phase, when the user chooses data from a random dataset from the internet or from books. A framework showing the utility of these processes is described in the form of the sequential steps listed below:
Step 1) Choose random dataset
Step 2) Shorten it as per user requirements
Step 3) Find whether data contains dirty bits or not.
Step 4) Cleanse data by testing it on application platforms like WinPure Clean and Match and Google Refine.
Step 5) Then the task of creating high quality data is initiated.
Step 6) Connect refined database with SQL server.
Step 7) Install Data Quality Services (DQS).
Step 8) A knowledge base is built through the DQS interface.
Step 9) After building the knowledge base, the knowledge discovery process is started.
Step 10) In the knowledge discovery process, string values are normalized to replace incorrect spellings and errors.
Step 11) This leads to the production of high-quality data by removing the dirty bits of data.
Shortcomings of the existing tools

WinPure Clean and Match simply cleans data by removing redundant words. It does not give information about synonyms and homophones.
This data cleaning tool produces a moderate correctness level. The tool only gives details of incorrect words and matched words instead of removing similar words, which leads to wasted memory and lower accuracy.
Data Quality Services (DQS) is somewhat complex for non-technical users. A normal person cannot use this quality software without knowledge of databases.
DQS improves data quality with human intervention: if the user selects the correct spelling of a given word, DQS approves it, otherwise it rejects it.
There is no automatic system for detection of strings and synonyms. One has to set up SQL Server on the machine to use it.
Both tools work syntactically rather than semantically, which is the reason they are unable to find synonyms.
These tools correct given data according to predefined syntaxes, like spelling errors, omitted commas, etc.

Keeping the above shortcomings in mind, this study proposes a data cleaning algorithm that uses a string detection and matching technique via WordNet.
 

Design of Manchester Serial Data Communications Channel

The Design of a Manchester Serial Data Communications Channel Based on Vivado (SystemVerilog)
Abstract
With the explosive growth of wireless communication systems and the proliferation of laptop and palmtop computers, the requirement for high-quality data communication channels is also growing rapidly. By changing the line voltage frequently, at a rate proportional to the clock rate, Manchester coding makes it possible to recover the clock along with the data. It is now widely used in many domains.
This project studies the functions of the clock divider, the pseudo random bit sequence generator (PRBSG), the shift register and the finite state machine (FSM), and then combines them into a Manchester serial data communications channel. The channel is used for recovering the clock signal from the encoded data.
A further application is setting up a bit error rate (BER) tester to check the condition of the whole system. If the bit error rate (BER) is high, the system is not operating correctly; if it is low, the integrity of the system is good.
 
1.1 Background
In modern life, wireless communication is developing rapidly in many areas, especially in the communication industry, and it has attracted a lot of attention from the media and the public. The development of cellular phones has also been swift. Across the world, cellular phones have experienced geometric growth over the last decade, and the number of cellular phone users is expected to grow to a billion in the foreseeable future. In fact, by replacing outdated wireless systems, cellular phones have become much more widely used; they already play a very important role in the business domain and are an indispensable part of everyday life. Besides, wired networks in many businesses and campuses are now being replaced or supplemented by local area wireless networks so that staff and students can work more conveniently. A number of new applications, such as wireless sensor networks, smart homes and appliances, automated highways and factories, and remote telemedicine, are becoming reality, which is a huge advance in technology. Conditions such as the explosive growth of wireless systems and the proliferation of laptop and palmtop computers point to a bright future for wireless networks, both as independent systems and as part of a larger networking infrastructure. However, in order to support the required performance of emerging applications, it is quite challenging to design, analyse and solve the problems that occur in wireless networks.

With the development of wireless communication systems, Manchester encoding is widely used. Owing to its development at the University of Manchester, it is known as a synchronous clock encoding technique used by the physical layer to encode the clock and data of a synchronous bit stream. At the very beginning, it was used to save data on the magnetic drum of the Manchester Mark 1 computer. In Manchester code, the binary data to be transmitted over the cable is not sent as a plain sequence of logic "0"s and "1"s, which is called Non-Return-to-Zero (NRZ). Instead, the bits are transformed into a different format, which has many advantages over the straight binary (NRZ) form.
However, in digital transmission, noise, interference, distortion and bit synchronization errors are the main factors that affect the number of bit errors. Every time data is transmitted through a data link, there is a possibility of errors being introduced into the system. If errors are introduced into the data, the signal is corrupted and the integrity of the system is compromised. In this situation it is necessary to assess the performance of the system, and the bit error rate (BER) provides an ideal way to meet that requirement. The bit error rate is the number of bit errors that occur per unit time, and the bit error ratio is defined as the number of bit errors divided by the total number of transferred bits during a controlled study period. It is a unitless performance measure, often expressed as a percentage. Bit error rate (BER) assesses the full end-to-end performance of a system, including the transmitter, the receiver and the medium between the two. For this reason, BER makes it possible to test the actual performance of an operating system, which sets it apart from other forms of assessment.
1.2 Objectives

Figure 1 The Diagram of Physical Components Connection
Figure 1 above shows the components used in the system and the connection configuration of the system. First of all, the signal generator outputs data into the T-junction chip (signal emitter), which includes the clock divider, the prbsgen and the logic XOR gate. Secondly, the data is transmitted through the transmission channel, which is made up of two vertical metal bars: one carries an LED for transmitting data, the other a light sensor for receiving data. Thirdly, the data is sent into the signal analysis and recovery part, which includes the shift register, pattdet and fsm components. Finally, both the recovered signal and the original data are sent into the oscilloscope to check the difference and verify whether the result is satisfactory.
The objective of this project is to set up a Manchester serial data communications channel in the Vivado design environment, using the SystemVerilog language to match this physical system. It can be used as a radio channel, a bit error rate tester, etc. In this project, the application of the system is designed as a bit error rate (BER) tester; Figure 2 below shows the design of the bit error rate tester. Over a complete simulation period, once the number of errors that occurred and the total number of bits sent are known, the bit error rate is available.

Figure 2 Bit Error Rate Tester Design
1.3 Theory
a. Bit Error Rate
Bit error rate (BER) is a key parameter used when assessing systems that transmit digital data from one location to another. It is widely used to monitor the state of digital signals in different applications, such as radio data links, fibre optic data systems, Ethernet and others that transmit data through some form of network. Generally, it is affected by noise, interference and phase jitter.
Although these systems work in different ways and the impairments have a disparate impact on the bit error rate, the basics of bit error rate are still the same. Every time data is transmitted through a data link there is a possibility of errors being introduced into the system; if errors are introduced, the signal is corrupted and the integrity of the system is compromised. In this situation it is necessary to assess the performance of the system, and the bit error rate (BER) provides an ideal way to do so.
Bit error rate (BER) assesses the full end-to-end performance of a system, including the transmitter, the receiver and the medium between the two. Because of this, BER makes it possible to test the actual performance of an operating system in a way that other forms of assessment cannot.
Bit error rate (BER) is defined as the rate at which errors occur in a transmission system. It can be translated directly into the number of errors that occur in a string of a stated number of bits. In simple form, the definition of bit error rate is:

BER = (number of bit errors) / (total number of bits transferred)
If the medium between the transmitter and receiver is good and the signal-to-noise ratio (SNR) is high, the bit error rate will be very small, which means the errors have barely any noticeable effect on the overall system and can be ignored. However, if the number of errors is large and the SNR is low, then the bit error rate needs to be considered; in other words, the system has been affected by noise.
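A minimal sketch of this calculation, of the kind the BER tester in Figure 2 would perform, is shown below; the array-based interface is an assumption made for illustration.

```java
/** Count mismatches between the transmitted and received bit streams and divide by the length. */
public class BerCalculator {

    static double bitErrorRate(int[] sent, int[] received) {
        int n = Math.min(sent.length, received.length);
        int errors = 0;
        for (int i = 0; i < n; i++) {
            if (sent[i] != received[i]) errors++;   // one more bit error
        }
        return (double) errors / n;
    }

    public static void main(String[] args) {
        int[] sent     = {1, 0, 1, 1, 0, 0, 1, 0, 1, 1};
        int[] received = {1, 0, 0, 1, 0, 0, 1, 1, 1, 1}; // two bits corrupted
        System.out.println("BER = " + bitErrorRate(sent, received)); // 0.2
    }
}
```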
Noise and changes in the propagation path (where radio signal paths are used) are the two main causes of degradation of the data channel and of the resulting bit error rate (BER). However, the two effects act in different ways: for example, noise follows a Gaussian probability function while the propagation model follows a Rayleigh model. This means that statistical analysis techniques are necessary to undertake the analysis of the channel characteristics.
For fibre optic systems, bit errors are usually caused by imperfections in the components used to make the link, such as the optical driver, receiver, fibre and connectors. They may also be introduced by optical dispersion and attenuation. What's more, the optical receiver may detect noise, which will also interfere with the system. Typically, a fibre optic system will use sensitive photodiodes and amplifiers to respond to very small changes, and there is a possibility that high noise levels will be detected.
The phase jitter present in the system is another possible factor that can alter the sampling of the data.
A number of factors are able to affect the bit error rate (BER). To optimize the system and acquire the required performance levels, it is necessary to manipulate the controllable variables. Normally, this should be undertaken in the design stages of a data transmission system, so that the performance parameters can be adjusted at the initial design concept stages.

Interference: The interference levels in the system are usually set by external factors and cannot be changed by optimizing the system design. However, the bandwidth of the system is a controllable factor: the level of interference will be reduced if the bandwidth is reduced. The disadvantage is that the achievable data throughput will be lower when the bandwidth is reduced.
Increase transmitter power: To increase the power per bit, the power level of the system has to be increased. Factors such as the interference caused to other users and the impact of a higher power output on the size of the power amplifier, the overall power consumption and the battery life must be weighed when using this approach to control the bit error rate (BER).
Lower order modulation: Lower order modulation schemes are a feasible way to reduce the bit error rate, but the achievable data throughput will be reduced.
Reduce bandwidth: Another possible approach is to reduce the bandwidth of the system. As a result, the system will receive lower levels of noise and the signal-to-noise ratio (SNR) will be improved; however, the achievable data throughput will be reduced as well.

However, it is not possible to meet all the requirements at once, and some trade-offs are needed. In order to achieve the required bit error rate (BER), it is necessary to balance all the available factors. When the bit error rate (BER) cannot be lowered as far as expected by these means, further trade-offs are still necessary, such as introducing error correction into the data being transmitted. Even though higher levels of error correction require sending more redundant data, the effect of any bit errors can be masked, and as a result the overall bit error rate (BER) will improve.
For radio and fibre optic systems alike, the bit error rate (BER) is an excellent parameter for indicating the performance of a data link, and it is one of the main parameters of interest because it captures the number of errors that occur. With knowledge of the bit error rate (BER), other features of the link, such as the power and bandwidth, can be adjusted to obtain the required performance.
b. Shift Register
The shift register is another type of sequential logic circuit that can be used to store or transfer data in the form of binary numbers. It loads the data present on its inputs and then moves or "shifts" the data towards its output on every clock cycle.
Basically, a shift register is composed of a number of single-bit "D-type data latches", one for each data bit (either a logic "0" or a "1"), connected in a serial chain so that the output of each data latch becomes the input of the next latch.
In a shift register configuration, the data bits can be moved in several ways, such as fed in or out from either the left or the right one bit at a time, or all together in parallel at the same time.
The most widely used construction of a single shift register is made up of eight individual data latches to match eight bits (one byte) of data; in general, the number of individual data latches is decided by the number of bits that need to be stored. While a shift register may comprise a number of individual data latches, all of them are driven by one common clock (CLK) signal, which makes the latches work synchronously.
Shift registers are normally used in computers or calculators for storing or transferring data. The principle is to convert data from serial to parallel or from parallel to serial format. For example, when saving data inside a computer, shift registers can store binary numbers before they are added together.
In order to set or reset the state of a shift register, it usually contains an additional connection that provides the required function. There are four different operation modes for transferring data through a shift register, listed below; a small software model of the serial-in, parallel-out mode follows the four modes.

Serial-in to Serial-out (SISO) – in either a left or right direction, with the same clock, the data is shifted serially "IN" and "OUT" of the register one bit at a time. Figure 3 below shows an example which transfers data from left to right.

Figure 3 4-bit Serial-in to Serial-out Shift Register

Serial-in to Parallel-out (SIPO) – the data is loaded into the register serially, one bit at a time, and is made available at the outputs together, in parallel. Figure 4 below shows an example with a 4-bit data input and output in which the data is transferred from left to right.

Figure 4 4-bit Serial-in to Parallel-out Shift Register

Parallel-in to Parallel-out (PIPO) – the parallel data is loaded into the register together at the same time, and then transferred to the corresponding outputs together under the same clock pulse. Figure 5 below shows an example with a 4-bit parallel data input and output where the direction of data movement is from left to right.

Figure 5 4-bit Parallel-in to Parallel-out Shift Register

Parallel-in to Serial-out (PISO) – the parallel data is loaded into the register together, and then shifted out serially, one bit at a time, under the control of the clock. Figure 6 below shows an example with a 4-bit data input which transfers data from left to right.

Figure 6 4-bit Parallel-in to Serial-out Shift Register
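For illustration, the following is a small behavioural Java model of the serial-in, parallel-out (SIPO) mode, the mode the receiver effectively uses to collect serial samples into a parallel word; it is a sketch under that assumption, not the project's SystemVerilog source.

```java
/** Behavioural model of a serial-in, parallel-out (SIPO) shift register. */
public class SipoShiftRegister {

    private final int width;
    private int bits;          // parallel contents, newest bit in the LSB

    public SipoShiftRegister(int width) {
        this.width = width;
    }

    /** One clock edge: shift everything left by one and load the new serial bit. */
    public void clock(int serialIn) {
        bits = ((bits << 1) | (serialIn & 1)) & ((1 << width) - 1);
    }

    /** Parallel output of all stored bits. */
    public int parallelOut() {
        return bits;
    }

    public static void main(String[] args) {
        SipoShiftRegister reg = new SipoShiftRegister(4);
        int[] stream = {1, 0, 1, 1};          // serial input, first bit first
        for (int bit : stream) reg.clock(bit);
        // After four clocks the register holds 1011 (binary) = 11.
        System.out.println(Integer.toBinaryString(reg.parallelOut())); // 1011
    }
}
```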
c. Pseudo Random Bit Sequence Generator (PRBSGEN)
A random bit generator is a device or algorithm that outputs a sequence of statistically independent and unbiased binary digits. A pseudo random bit sequence generator (PRBSG), by contrast, is a deterministic algorithm: given a truly random binary sequence of length X, it outputs a binary sequence of length Y >> X that appears random. The input of the pseudo random bit sequence generator (PRBSG) is normally called the seed, while its output is called a pseudo random bit sequence. The output of the PRBSG can be used as random because the value of an element of the sequence is not obviously related to the values of the other elements.
However, the output of a pseudo random bit sequence generator (PRBSG) is not truly random. Of all possible binary sequences of length Y, the number of possible output sequences is at most a small fraction. After N elements the sequence starts to repeat itself, which means it is deterministic. The aim is to take a small truly random sequence and expand it into a sequence of much greater length.
Generally, the implementation of a pseudo random bit sequence generator (PRBSG) is based on a linear feedback shift register (LFSR). The PRBSG produces a sequence of logic "0"s and "1"s with equal probability; for an n-stage register the maximal-length pattern contains 2^n - 1 bits and repeats itself over time.
In the Manchester serial data communications channel, the pseudo random bit sequence generator (PRBSG) is implemented in the SystemVerilog programming language. Two bits of the shift register are sampled and combined through a logic XOR gate, and the result is fed back into the first bit of the register. The output of the pseudo random bit sequence generator (PRBSG) is taken from all nine bits of the shift register; the feedback connections are shown in Appendix A. As a result, the output of the PRBSG cycles between 0 and 511. A behavioural sketch of such a generator follows Figure 7.

Figure 7 Principle of Pseudo Random Bit Sequence Generator (PRBSG)
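For illustration only, the following C sketch models a 9-bit LFSR of the kind described above. The tap positions (the 4th and 8th bits, 1-indexed) follow the Prbsgen description later in the text; the exact bit indexing and seed value used in the original SystemVerilog are assumptions here.

#include <stdio.h>
#include <stdint.h>

/* 9-bit LFSR model: XOR the 4th and 8th bits (1-indexed) and feed the
 * result back into the first bit position while the register shifts. */
static uint16_t prbs_step(uint16_t state)
{
    unsigned feedback = ((state >> 3) ^ (state >> 7)) & 1u;
    return (uint16_t)(((state << 1) | feedback) & 0x1FFu);  /* keep 9 bits */
}

int main(void)
{
    uint16_t state = 0x1FF;                 /* assumed non-zero seed */
    for (int i = 0; i < 16; i++) {          /* print a few output words */
        printf("%3u\n", state);
        state = prbs_step(state);
    }
    return 0;
}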
d. Manchester Coding
Manchester coding is well known because it was developed at the University of Manchester, where it was used to store data on the magnetic drum of the Manchester Mark 1 computer.
Manchester coding is widely used in the signal transmission domain. To achieve the same data rate in less bandwidth, more complex codes such as 8B/10B encoding have been created; their disadvantage is that the transmitter, and the receiver reference clocks, cannot tolerate as much frequency error and jitter. The main drawback of Manchester encoding is that it is not suitable for higher data rates, because it introduces difficult frequency errors into the system. Its advantage is that it helps the receiver recover the clock, since the line voltage changes frequently at a rate directly proportional to the clock rate.
Manchester coding is convenient for transmitting data over media such as Ethernet without a DC component, because the DC component of the encoded signal does not depend on the transmitted data and therefore carries no information. Figure 8 below shows the principles of Manchester coding, which are:

Each bit is transmitted in one fixed period.
Logic “0” is expressed as a low-to-high transition, and logic “1” as a high-to-low transition.
The transition that represents logic “0” or “1” occurs at the midpoint of the period.
A transition at the beginning of a period does not carry data; it only sets up the mid-bit transition.

Figure 8 Principle of Manchester Encoding

Figure 9 The Circuit Design  
Figure 9 above is the complete design of the whole circuit. All the components required to build a Manchester serial data communications channel were designed successfully. The clock is divided in the clock divider (Divclk) component and used to sample the pseudo random bit sequence generator (Prbsgen) component; the resulting data is converted into a Manchester signal by a logic XOR gate. After passing through the transmission channel, the data is shifted into the shift register and combined into a 10-bit DATA signal. This is analysed in the Pattdet component, whose 4 states are sent to the finite state machine (FSM) component and recovered as the RBC and RNRZ signals. In this system the clock frequency is 100 MHz, and reset is held at logic “1” before the system starts working.
The programs of all components used in the system are shown below.

Figure 10 Clock Divider Program
The figure 10 above is the clock divider program. This component divides the clock signal into two different clock signals, div_out and div_out2, which are shown in figure 13. The signal div_out is high for one clock in every 10 clocks, giving a frequency of 10 MHz, and works as the sampling input of the Prbsgen component, while the signal div_out2 is high for 5 clocks and low for 5 clocks in every 10, also giving a frequency of 10 MHz.
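A behavioural C sketch of this divide-by-10 arrangement is shown below for illustration; the exact phase relationship between the two outputs in the original design is an assumption.

#include <stdio.h>

/* divide-by-10 model: div_out pulses high for 1 of every 10 input clocks,
 * div_out2 is high for 5 and low for 5 of every 10 input clocks */
int main(void)
{
    int count = 0;
    for (int clk = 0; clk < 20; clk++) {      /* simulate 20 input clocks */
        int div_out  = (count == 0);
        int div_out2 = (count < 5);
        printf("clk=%2d div_out=%d div_out2=%d\n", clk, div_out, div_out2);
        count = (count + 1) % 10;
    }
    return 0;
}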

Figure 11 Prbsgen Program
The figure 11 above is the Prbsgen program. It works as the pseudo random bit sequence generator, holding 10 bits of data each clock; when the signal div_out pulses high, it samples the 4th and 8th bits into a logic XOR gate and puts the result into the 1st bit position as the feedback of the sampling function. Finally, it outputs the prbs signal (as shown in figure 7), also known as the NRZ signal (figure 13).

Figure 12 Logic Xor Gate Program
The figure 12 above is the logic XOR gate program. It combines the NRZ and Bit_clk signals and outputs the signal T (the Manchester code), which is shown in figure 13 below. When NRZ is high and Bit_clk is low, the Manchester output is high; when NRZ is high and Bit_clk is high, it is low; when NRZ is low and Bit_clk is high, it is high; and when NRZ is low and Bit_clk is low, it is low.
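As a quick software illustration (not the SystemVerilog source itself), this truth table is simply an exclusive-OR of the two inputs:

#include <stdio.h>

/* Manchester line value for one half-bit period: T = NRZ XOR Bit_clk */
static int manchester(int nrz, int bit_clk)
{
    return (nrz ^ bit_clk) & 1;
}

int main(void)
{
    for (int nrz = 0; nrz <= 1; nrz++)          /* print the truth table */
        for (int clk = 0; clk <= 1; clk++)
            printf("NRZ=%d Bit_clk=%d -> T=%d\n", nrz, clk, manchester(nrz, clk));
    return 0;
}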

Figure 13 Manchester Signal
As figure 13 shows, the clock divider, the pseudo random bit sequence generator (PRBSG) and the logic XOR gate all work well: the output signals div_out and div_out2 are both divided as required, the prbs signal (NRZ) is as expected, and the T signal (Manchester code) matches the XOR of div_out2 with the prbs (NRZ) signal.

Figure 14 Transmission Delay Program
The figure 14 above is the transmission delay program. It is used to simulate the data transmission delay found in real life. Normally, errors such as noise, interference and phase jitter are introduced into the data through this part, and the transmission delay depends on the distance between the signal transmitter and receiver. In this system, the time delay parameter is set to 1.5e-6 seconds.

Figure 15 Shift Register Program
The figure 15 above is the shift register program. Its function is to compress and store the Manchester data and then transfer it to the pattdet component. It starts working only when reset is logic “0” and input en is logic “1”.

Figure 16 DATA Signal
The output of the 10-bit data signal (DATA) is as required, which means the shift register program works well.

Figure 17 Pattdet Program
The figure 17 above is the pattdet program. It is used to analyse the DATA signal, and its output follows the principle shown in table 1 below.

Data (binary)    Data (hex)    State
11111 00000      10'h3E0       S1
00000 11111      10'h01F       S2
11111 11111      10'h3FF       S3
00000 00000      10'h000       S4

Table 1 The Working Principle of the Pattdet Component
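As an illustrative C sketch (not the SystemVerilog pattdet itself), the mapping in Table 1 amounts to comparing the 10-bit DATA word against the four patterns; the enumeration below is an assumption.

#include <stdio.h>
#include <stdint.h>

enum state { S_NONE = 0, S1, S2, S3, S4 };

/* map a 10-bit DATA word onto one of the four states in Table 1 */
static enum state pattdet(uint16_t data)
{
    switch (data & 0x3FFu) {
    case 0x3E0: return S1;      /* 11111 00000 */
    case 0x01F: return S2;      /* 00000 11111 */
    case 0x3FF: return S3;      /* 11111 11111 */
    case 0x000: return S4;      /* 00000 00000 */
    default:    return S_NONE;  /* no recognised pattern */
    }
}

int main(void)
{
    printf("%d %d %d %d\n",
           pattdet(0x3E0), pattdet(0x01F), pattdet(0x3FF), pattdet(0x000));
    return 0;
}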

Figure 18 4 States
From figure 18 above, the 4 states s1, s2, s3 and s4 are each output separately and successfully.

Figure 19 Finite State Machine (FSM) Program
The figure 19 above is the finite state machine (FSM) program. Its function is to analyse the 4 states and recover the bit_clk, bit_EN and NRZ signals; its principle is shown in figure 20 below. From figure 20, when the signal NRZ stays at logic “0”, state s1 moves to s2; when NRZ changes from logic “0” to logic “1”, state s1 moves to s4; when NRZ stays at logic “1”, state s2 moves to s1; and when NRZ changes from logic “1” to logic “0”, state s2 moves to s3.

Figure 20 The Principle Of FSM
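A partial C sketch of these transitions is given below for illustration; only the s1 and s2 transitions are described in the text, so the remaining cases are left unchanged, and the encoding of the previous and current NRZ values is an assumption.

#include <stdio.h>

enum fsm_state { FSM_S1 = 1, FSM_S2, FSM_S3, FSM_S4 };

/* advance the FSM one step from the previous and current NRZ values;
 * only the transitions described in the text are modelled */
static enum fsm_state fsm_step(enum fsm_state s, int prev_nrz, int nrz)
{
    if (s == FSM_S1 && prev_nrz == 0 && nrz == 0) return FSM_S2;
    if (s == FSM_S1 && prev_nrz == 0 && nrz == 1) return FSM_S4;
    if (s == FSM_S2 && prev_nrz == 1 && nrz == 1) return FSM_S1;
    if (s == FSM_S2 && prev_nrz == 1 && nrz == 0) return FSM_S3;
    return s;   /* transitions not described in the text: stay in place */
}

int main(void)
{
    enum fsm_state s = FSM_S1;
    s = fsm_step(s, 0, 0);              /* s1 -> s2 */
    s = fsm_step(s, 1, 0);              /* s2 -> s3 */
    printf("final state: %d\n", s);
    return 0;
}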

Figure 21 The Bit Error Rate Tester (BERT) Program
The figure 21 above is the top-level program of the bit error rate tester. It brings together the clock divider, prbsgen, encoder (logic XOR gate), shift register, pattdet and fsm program files.

Figure 22 The Test Bench Program
The figure 22 above is the test bench program. It defines all the signals and parameters of the system, especially the reset and clock periods.

Figure 23 The Implemented Design
This is the implemented design figure, which shows the utilisation of the devices.

Figure 24 The Schematic Design

Figure 25 The Detailed Figure of FSM
The figure 25 above is the schematic design, which shows the actual usage of every component. However, the clock divider and pseudo random bit sequence generator (PRBSG) part is not satisfactory. The problem may be caused by an issue with the Vivado software, or because the definitions of the clock divider and the PRBSG are not recognised by the software.

The Manchester serial data communications channel was built successfully. In figure 26, the signals RBC, RNRZ and RBE are all recovered, just the same as the original signals bit_clk, NRZ and bit_en but with some time delay. The next objective is to develop applications for the Manchester serial data communications channel; the chosen target is a bit error rate tester. This involves setting up a noise component to introduce random noise into the Manchester signal, and an error counter inside the finite state machine (FSM) to count the number of errors that occur and the total number of bits sent. As a result, the bit error rate (BER) can be calculated in the system.
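As a minimal illustration of that calculation (not the planned SystemVerilog counter itself), the bit error rate is simply the ratio of errored bits to total bits sent:

#include <stdio.h>

/* bit error rate = errored bits / total bits transmitted */
static double bit_error_rate(unsigned long errors, unsigned long total_bits)
{
    return total_bits ? (double)errors / (double)total_bits : 0.0;
}

int main(void)
{
    /* hypothetical counts, for illustration only */
    printf("BER = %.2e\n", bit_error_rate(3, 1000000));
    return 0;
}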
Figure 26 Recovered RNRZ and RBC Signal
The figure 26 above is the final simulation result. The recovered non-return-to-zero (RNRZ), recovered bit clock (RBC) and recovered bit_en (RBE) signals are all the same as their original data but with time delays.

Figure 27 The Signal of Input and Recovered
From figure 27 above, the recovered signal is almost the same as the original input data. These two figures prove that the design of the Manchester serial data communications channel is successful. After setting up this communication channel, the next step is to develop applications for further requirements.
The theory of the pseudo random bit sequence generator (PRBSG), Manchester coding, the shift register and the bit error rate has been shown to be feasible. This project is a great opportunity to practise going from research ideas to a concrete system. With the explosive growth of wireless communication systems,


OpenMP Based Fast Data Searching with Multithreading

V.Karthikeyan, Dr. S.Ravi and S.Flora Magdalene
 
Abstract
Multiprocessor cores with multithreading capability continue to gain a significant share of the market and offer high performance. Running OpenMP applications on two parallel architectures can identify architectural bottlenecks, and the high level of resource sharing introduced by multithreading creates performance complications. An adaptive run-time mechanism provides additional but limited performance improvements on multithreading and maximizes the efficiency of OpenMP multithreading within the constraints of the runtime environment and the programming interface. This paper handles the task of data searching efficiently, and a comparative analysis of performance with and without OpenMP is made. Experimental results show accelerated performance over existing methods in terms of various performance criteria.
Keywords: OpenMP (Open Multi Processing), Multithreading, Fast Data Searching, Multicore
Introduction
OpenMP is a widely adopted shared-memory parallel programming interface providing high-level programming constructs that enable the user to easily expose an application's task- and loop-level parallelism. The range of OpenMP applicability is significantly extended by the addition of explicit tasking features. OpenMP is used for enhanced portability of computation, where a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search reported in [5] is statically partitioned into independent subtrees to reduce memory synchronization overhead; this improves performance, but a workload-predictive thread assignment strategy and a false cache-line sharing prevention method are required. OpenMP is a collection of compiler directives and library functions that are used to create parallel programs for shared-memory computers. It combines with C, C++ or Fortran to create a multithreaded program in which the threads share the address space, making it easier for programmers to convert single-threaded code to multithreaded code. It has two key concepts, namely:
Sequential equivalence: the program yields the same results whether it executes using one thread or many threads.
Incremental parallelism: a programming style in which a sequential program evolves incrementally into a parallel program.
OpenMP has an advantage in synchronization over hand-threading, which tends to use system calls that are more expensive than the efficient synchronization primitives provided by OpenMP. As a shared-memory programming paradigm, OpenMP is suitable for parallelizing applications on simultaneous multithreaded and multicore processors, as reported in [11]. It is an API (application program interface) used to explicitly direct multi-threaded, shared-memory parallelism and to standardize programming extensions for shared-memory machines, as shown in Figure 1.

Figure 1: Model for OpenMP program using threading
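For readers unfamiliar with the interface, a minimal illustrative OpenMP program in C (not taken from the paper) looks like this:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* every thread in the team executes this block; memory is shared */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}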
At the high end, microprocessors encompass aggressive multithreading and multicore technologies to form powerful computational building blocks for supercomputers. The evaluation uses detailed performance measurements and information from hardware performance counters to identify architectural bottlenecks of multithreading and multicore processors that hinder the scalability of OpenMP, and to show how OpenMP implementations can be improved to better support execution on multithreading processors. The thread scheduling model with kernel and user space is shown in Figure 2. OpenMP applications can efficiently exploit the execution contexts of multithreading processors. The multi-threading models are:

Master-Slave model,
Worker-Crew model and
Pipeline model

Figure 2: Multithreading processors using kernel and user space
OpenMP Issues with Multithreading Approach
OpenMP specification includes critical, atomic, flush and barrier directives for synchronization purposes as shown in Table 1.
Table 1: OpenMP synchronization specification

Functions               Purposes
User-inserted locks     Analogous to mutex locks in a threads library.
Critical directive      Associates with a name; all unnamed critical sections map to the same name.
Flush directive         Avoid false sharing by placing data contiguously on different cache lines.
Ordered directive       Perform I/O in a sequential order within a parallel loop of threads.

Effects of OpenMP for Multithreading Process
The effects of OpenMP on the multithreading process are listed in Table 2.
Table 2: Effects of OpenMP

Context                                              Utilization/Accomplished Strategy
Scaling of OpenMP on multithreading                  Effortless by the effects of extensive resource sharing.
Multicore architectures with multithreading cores    Gain designs to achieve the balance between energy.
Execute threads within multithreading                Utilized a single level of parallelism.
Optimization criteria                                Energy, die area and performance.

Multithreading requires a solution that is scalable in a number of dimensions and achieves speedups. An efficient parallel program usually limits the number of threads to the number of physical cores rather than creating a large number of concurrent threads. At the low-level Linux kernel interface, programs are invoked by a fork system call, which creates a process, followed by an exec system call, which loads a program and starts its execution. Threads typically end by executing an exit system call, which can terminate one thread or all of them.
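As a minimal sketch of that process-level interface (standard POSIX calls, shown here purely for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                  /* create a new process */
    if (pid == 0) {
        /* child: load and start another program */
        execlp("echo", "echo", "child running", (char *)NULL);
        _exit(1);                        /* only reached if exec fails */
    } else if (pid > 0) {
        wait(NULL);                      /* parent waits for the child to exit */
    } else {
        perror("fork");
        return EXIT_FAILURE;
    }
    return 0;
}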
Related Works
Daniel et al. [2010] presented the compilation of synchronous programs to multi-threaded OpenMP-based C programs via guarded actions, which are a comfortable intermediate language for synchronous languages. J. Brandt and K. Schneider [2009] presented separate compilation of synchronous programs; the target deterministic single-threaded code directly executes synchronous programs on simple micro-controllers. K. Schneider [2009] addressed the problem of generating multi-threaded C code from synchronous guarded actions, a comfortable intermediate format for the compilation of synchronous programs. Pranav and Sumit [2014] studied the performance (speedup) of parallel algorithms on multi-core systems using OpenMP. C.D. Antonopoulos et al. [2005] proposed multigrain parallel Delaunay mesh generation and discussed opportunities for multithreaded architectures. H. Jin et al. [1999] described the OpenMP implementation of the NAS parallel benchmarks and its performance.

M. Lee et al. [2004] presented the peak performance of SPEC OMPL benchmarks using the maximum number of threads and compared it with a traditional SMP. Zaid et al. [2014] implemented the bubble sort algorithm using multithreading (OpenMP) and tested it on two standard data sets (text files) of different sizes. F. Liu and V. Chaudhary [2003] presented a system-on-chip (SOC) design that integrates processors into one chip, with OpenMP selected to deal with the heterogeneity of the CMP. M. Sato et al. [1999] proposed a compiler installed to support OpenMP applications, with GCC acting as the backend compiler. T. Wang et al. [2004] argued that the current flat view of OpenMP threads cannot reflect the new features and needs to be revisited to ensure continuing applicability. Cristiano et al. [2008] proposed reproducible simulation of multi-threaded workloads for architecture design exploration. Vijay Sundaresan et al. [2006] reported experiences with multi-threading and dynamic class loading in a Java just-in-time compiler. Priya et al. [2014] compared and analyzed the parallel computing ability offered by OpenMP, Intel Cilk Plus and MPI (Message Passing Interface). Sanjay and Kusum [2012] analyzed parallel algorithms for computing the solution of dense systems of linear equations using the OpenMP interface. S.N. Tirumala Rao [2010] focused on the performance of memory-mapped files on multi-core processors and explored the potential of multi-core hardware under the OpenMP API and POSIX threads.
Explicit Multithreading Using Multithreads
Explicit multithreading is more complex than OpenMP, and dynamic applications need to be implemented carefully so as to give the user control over performance. Explicit multithreading-based C code is shown in Figure 3.

Figure 3: Explicit multithreading based coding in C
Scheduling for OpenMP
OpenMP supports loop level scheduling that defines how loop iterations are assigned to each participating thread. The scheduling types are listed in Table 3.
Table 3: Scheduling types

Scheduling Type    Description
Static             Each thread is assigned a chunk of iterations in a fixed fashion (round robin).
Dynamic            Each thread is initialized with a chunk of iterations; when a thread completes its iterations, it is assigned the next set.
Runtime            Scheduling is deferred until run time. The schedule type and chunk size can be chosen by setting the environment variable OMP_SCHEDULE.
Guided             Iterations are divided into pieces that successively decrease exponentially, with chunk being the smallest size.
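To make the schedule clause concrete, here is a small illustrative OpenMP loop (not taken from the paper's experiments) that requests dynamic scheduling with a chunk size of 4:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    long sum = 0;

    /* dynamic scheduling: idle threads grab the next chunk of 4 iterations */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:sum)
    for (int i = 0; i < 1000; i++)
        sum += i;

    printf("sum = %ld (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}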

Pseudo code:
#include <stdio.h>
#include <omp.h>

/* placeholder for the per-partition clustering work */
static void do_clustering(int id)
{
    printf("clustering task %d on thread %d\n", id, omp_get_thread_num());
}

int main(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        do_clustering(0);
        #pragma omp section
        do_clustering(1);
        #pragma omp section
        do_clustering(2);
        #pragma omp section
        do_clustering(3);
        #pragma omp section
        do_clustering(4);
    }
    return 0;
}
Optimizing Execution Contexts on Multithreading Process
The selection of the optimal number of execution contexts for the execution of each OpenMP application is not trivial on multithreading-based multiprocessors. Thus, a performance-driven, adaptive mechanism is used which dynamically activates and deactivates the additional execution contexts on multithreading processors, to automatically approximate the execution time of the best static selection of execution contexts per processor. This mechanism is cheaper than an exhaustive search, avoids modifications to the OpenMP compiler and runtime, identifies whether the use of the second execution context of each processor is beneficial for performance, and adapts the number of threads used for the execution of each parallel region. The algorithm targets identification of the best loop scheduling policy and is based on annotating the beginning and end of parallel regions with calls to the runtime. The calls can be inserted automatically by a simple preprocessor. Run-time linking techniques such as dynamic interposition can be used to intercept the calls issued to the native OpenMP runtime at the boundaries of parallel regions and apply dynamic adaptation even to unmodified application binaries. The mechanism modifies the semantics of the OpenMP threads environment variable, using it as a suggestion for the number of processors to be used instead of the number of threads.
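As a small illustration of adjusting the thread count per parallel region at run time (OpenMP runtime calls only; the adaptation policy itself is not shown here):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* suppose an adaptive policy decided to use half the available contexts */
    int chosen = omp_get_max_threads() / 2;
    if (chosen < 1)
        chosen = 1;
    omp_set_num_threads(chosen);

    #pragma omp parallel
    {
        #pragma omp single
        printf("running this region with %d thread(s)\n", omp_get_num_threads());
    }
    return 0;
}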
Results and Discussion
The experimental results of data searching with OpenMP (multithreading) and without OpenMP (no multithreading) are shown in Figure 4 and Figure 5 respectively. In both cases the search time for the data is evaluated, and it is established that the OpenMP-based implementation is fast compared to data searching done without OpenMP.
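A minimal sketch of the kind of multithreaded search being timed is given below for illustration; the actual data set and search routine used in the experiments are not reproduced here.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static int data[N];
    int key = 987654, found_at = -1;

    for (int i = 0; i < N; i++)          /* build a simple test data set */
        data[i] = i;

    /* each thread scans its share of the array for the key */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        if (data[i] == key) {
            #pragma omp critical
            found_at = i;
        }
    }

    printf("key %d found at index %d\n", key, found_at);
    return 0;
}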

Figure 4: Search time with OpenMP (multithreading)

Figure 5: Search time without OpenMP (no multithreading)
The percentage improvement in data searching with OpenMP (multithreading) is given in Table 4, and its graphical representation is shown in Figure 6.
Table 4: Improvement with multithreading

With OpenMP    Without OpenMP    Improvement Ratio (%)
0.00046        0.00049           7
0.00075        0.00089           16
0.00758        0.00949           21

Figure 6: Improvement in data searching with OpenMP (in %)
The time elapsed to write data to file, measured with OpenMP and without OpenMP (search data), is shown in Figure 7 and Figure 8 respectively.

Figure 7: Search data with OpenMP

Figure 8: Search data without OpenMP
Conclusion
Searching for data in a large database has been a profound area for researchers. In this research work, OpenMP tools are used to perform a multithreading-based search. The motive for using OpenMP is that the user can specify a parallelization strategy for a program. Here an experiment of data searching using multithreading is conducted on a database. The experiments are conducted with and without OpenMP and their performance is presented. The results obtained show that the time required for searching data using OpenMP is less than that for data searching without OpenMP. The method presented shows improved performance over existing methods, and further parallelization can be done in future. The main limitation of this research work is that its practical implementation requires the same number of multicore units as the number of threads. Future research shall focus on the use of parallel threads for high-performance systems.
References

Daniel Baudisch, Jens Brandt and Klaus Schneider, 2010, “Multithreaded Code from Synchronous Programs: Extracting Independent Threads for OpenMP”, EDAA.
J. Brandt and K. Schneider, 2009, “Separate compilation of synchronous programs”, in Software and Compilers for Embedded Systems (SCOPES), ACM International Conference Proceeding Series, Vol. 320, pp. 1–10, Nice, France.
K. Schneider, 2009, “The synchronous programming language Quartz”, Internal Report 375, Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany.
Pranav Kulkarni and Sumit Pathare, 2014, “Performance Analysis of Parallel Algorithm over Sequential using OpenMP”, IOSR Journal of Computer Engineering, Vol. 16, No. 2, pp. 58-62.
C. D. Antonopoulos, X. Ding, A. Chernikov, F. Blagojevic, D. S. Nikolopoulos and N. Chrisochoides, 2005, “Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures”, in Proceedings of the 19th ACM International Conference on Supercomputing (ICS’2005), Cambridge, USA.
H. Jin, M. Frumkin and J. Yan, 1999, “The OpenMP Implementation of NAS Parallel Benchmarks and its Performance”, Technical Report NAS-99-011, NASA Ames Research Center.
M. Lee, B. Whitney and N. Copty, 2004, “Performance and Scalability of OpenMP Programs on the Sun FireTM E25K Throughput Computing Server”, WOMPAT 2004, pp. 19-28.
Zaid Abdi Alkareem Alyasseri, Kadhim Al-Attar and Mazin Nasser, 2014, “Parallelize Bubble Sort Algorithm Using OpenMP”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 4, No. 1, pp. 103-110.
F. Liu and V. Chaudhary, 2003, “Extending OpenMP for heterogeneous chip multiprocessors Parallel Processing”, Proceedings of International Conference on Parallel Processing, pp. 161-168.
M. Sato, S. Satoh, K. Kusano and Y. Tanaka, 1999, “Design of OpenMP compiler for an SMP cluster”, Proc. of the 1st European Workshop on OpenMP, pp.32-39.
T. Wang, F. Blagojevic and D. S. Nikolopoulos, 2004, “Runtime Support for Integrating Pre-computation and Thread-Level Parallelism on Simultaneous Multithreaded Processors”, the Seventh Workshop on Languages, Compilers, and Run-time Support for Scalable Systems, Houston, TX.
Cristiano Pereira, Harish Patil and Brad Calder, 2008, “Reproducible simulation of multi-threaded workloads for architecture design exploration”, in Proceedings of the IEEE International Symposium on Workload Characterization, pp. 173-182.
Vijay Sundaresan, Daryl Maier, PramodRamarao and Mark Stoodley, 2006, “Experiences with multi-threading and dynamic class loading in a java just-in-time compiler”, in International Symposium on Code Generation and Optimization, pp. 87–97, San Francisco, USA.
Priya Mehta, Sarvesh Singh, Deepika Roy and M. Manju Sharma, 2014, “Comparative Study of Multi-Threading Libraries to Fully Utilize Multi Processor/Multi Core Systems”, International Journal of Current Engineering and Technology, Vol. 4, No. 4.
Sanjay Kumar Sharma and Kusum Gupta, 2012, “Performance Analysis of Parallel Algorithms on Multi-core System using OpenMP”, International Journal of Computer Science, Engineering and Information Technology, Vol. 2, No. 5.
S.N. Tirumala Rao, E.V. Prasad and N.B. Venkateswarlu, 2010, “A Critical Performance Study of Memory Mapping on Multi-Core Processors: An Experiment with k-means Algorithm with Large Data Mining Data Sets”, International Journal of Computer Applications, Vol. 1, No. 9.

 

Water Meter Data Management System Analysis

SYSTEM ANALYSIS

EXISTING SYSTEM

The conventional billing system for water usage involves a person visiting each residence and reading the meter data manually. The collected data are used for billing purposes. Manual readings can cause errors and can lead to corruption, so the billing system can become inaccurate and inefficient. There are chances of leaks and theft which cannot be identified. A traditional water meter provides only the total consumption of water and gives no information about when the water was consumed at each meter site. Traditional water meters also require back-end billing, which may not provide accurate billing.

PROPOSED SYSTEM

Water Meter Data Management provides several benefits to both utilities and customers. It involves long-term meter data management for the vast quantity of data received from smart meters. The data is validated according to a rule engine and stored in a database for billing purposes.
Water Meter Data Management (WMDM) involves smart meter data collection, planning and management. It fetches and records water meter readings periodically to identify the amount of water being used by the consumer. It also creates awareness among consumers about their water consumption. Water meter readings are collected automatically without human intervention.
After manufacture, each meter has a universally unique ID (UUID) which is printed on the meter and acts as part of the meter's serial number. Under normal operating conditions the Data Concentrator Unit queries a meter periodically to read its meter data. It is the Data Concentrator Unit that always initiates the communication with the meters.
Meter commands are sent over radio frequency to the various meters from the DCU, and responses are sent back by the meters. The DCU periodically communicates with the meters, collects data from them and sends it to the Head End Server (HES), typically over HTTP.
A WMDM system performs accurate data storage and fast management of the vast quantities of data delivered by smart metering systems. This data primarily consists of usage and events imported from the Head End Servers that manage data collection in Automatic Meter Reading (AMR) systems. A WMDM system mainly imports the data, then validates it, cleanses errors, estimates missing values and makes the data available for analysis and billing purposes.
Each meter is integrated with a SIM; WMDM makes use of existing Global System for Mobile communications (GSM) networks for sending requests and receiving data. It promises fast and accurate billing, and the system offers alerts on leaks and suspected theft.
MODULES DESCRIPTION
Project contains the following modules:

Head End System
Data Collection
Validation and Error Estimation
Visualization

Head End System
The HES is used to receive the stream of meter data from the DCU through the Advanced Meter Infrastructure (AMI). The Data Concentrator Unit (DCU) communicates with several meters, collects the data from them and transmits it to the HES.
The data is sent in multiple frame formats, and frames are of a constant size of 20 bytes. The frame consists of 4 bytes of header, 2 bytes of data size, 1 byte of frame id, 2 bytes of flags, 4 bytes of source address, 4 bytes of destination address, 1 byte of checksum and, finally, 2 bytes of CRC (a sketch of this layout is given after this module description).
The HES periodically collects data from the DCU and stores it in different file formats such as CSV, XML and TXT.
The HES pings the DCU to check whether a water meter is responding or not. This is one of the main advantages of WMDM: it raises an alert if a meter is not working while water is being consumed.
The READ command is used to get a particular meter's readings, among several meters, using the meter serial number.
It is the DCU that always initiates the communication with the set of meters.
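As an illustrative sketch only (field names and types are assumptions; the byte layout follows the description above), the 20-byte frame could be represented in C as a packed struct:

#include <stdio.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  header[4];      /* 4 bytes of header              */
    uint16_t data_size;      /* 2 bytes of data size           */
    uint8_t  frame_id;       /* 1 byte of frame id             */
    uint16_t flags;          /* 2 bytes of flags               */
    uint32_t source_addr;    /* 4 bytes of source address      */
    uint32_t dest_addr;      /* 4 bytes of destination address */
    uint8_t  checksum;       /* 1 byte of checksum             */
    uint16_t crc;            /* 2 bytes of CRC                 */
} meter_frame;               /* total: 20 bytes                */
#pragma pack(pop)

int main(void)
{
    /* confirm the packed layout really is 20 bytes */
    printf("frame size: %zu bytes\n", sizeof(meter_frame));
    return 0;
}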
DATA COLLECTION
Data collection allows data to be stored easily and efficiently. It is an easy-to-use data acquisition solution for collecting water usage information and for display and reporting purposes. It mainly concentrates on acquiring various sets of data from the different file formats stored in the database. Rule engines are developed to convert the raw data into the respective formats, after which it is processed and stored in the database.

(Data flow diagram: raw data arrives over radio frequency and HTTP, is converted to native formats, and yields the exact data.)
VALIDATION AND ERROR ESTIMATION
Rule-based algorithms are developed to validate the meter readings stored in the database. The process provides either the actual data or the best possible estimate, and invalid data can be analysed further to identify the root causes of any problem.
Multiple rules can be executed simultaneously and can be prioritized to match business needs. Estimates based on consumption profiles or historical data are automatically calculated as a substitute for missing data.
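A minimal sketch of such a substitution rule is shown below for illustration only; the actual rule engine is not described in this document, so the function and constants here are assumptions.

#include <stdio.h>

#define MISSING (-1.0)

/* substitute a missing reading with the average of the historical readings */
static double validate_reading(double reading, const double *history, int n)
{
    if (reading != MISSING)
        return reading;                 /* actual data passes through */

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += history[i];
    return n > 0 ? sum / n : 0.0;       /* best possible estimate */
}

int main(void)
{
    double history[] = { 10.2, 11.0, 9.8 };
    printf("estimated reading: %.2f\n", validate_reading(MISSING, history, 3));
    return 0;
}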
VISUALIZATION
This module mainly concentrates on interpreting the meter data fetched from the database and visualizing it as hourly, daily and monthly data using graphs. The visualization module is also used to compare the meter data of different customers.
Visualization is user-friendly and also creates awareness by comparing the meter data of different customers.

REQUIREMENT SPECIFICATION

FUNCTIONAL REQUIREMENT
Meters:

The DCU communicates with meters to collect and store meter readings at intervals of 30 minutes or one hour.
Provides a capability to remotely access meter readings to support customer billing, service and system operation.
Provides processing at the meter or within the system as necessary for customer service or system operation applications.
Allows the customer to view meter data using graphs.

Utility Data Processing:

Entry, update and monitoring of data on installation and replacement of meters.
Data stored according to regular intervals are validated in accordance with billing standards and updated to database.
Validated data must be integrated to support customer billing and other system functions.

AMI Network System:

It provides a capability to manage vast meter data collection schedules, alerts in case of meter problems, and all other system maintenance and operations.

NON-FUNCTIONAL REQUIREMENTS
Availability:

Water meter Data Management System is available 24/7.
Customer can view their water usage anytime.

Reliability:

The reliability of the overall application depends on the reliability of the meter data being collected.

Maintainability:

In case of a failure, the meter data can be requested from DCU.
Vast amounts of data can be easily stored and updated.

Extensibility:

New features can be added and system can be upgraded to meet business requirements.

Performance:

Response times – application loading, screen opening, refresh times, etc. are highly responsive.
Processing times – bill calculations and importing and exporting of data are done in a short amount of time.
Query and reporting times – the application's initial load and subsequent loads are fast.

Fraud Tolerance

System identifies the tampering in meters automatically.

 

Healthcare Technology and Big Data

Introduction
As technology advances, medical devices are able to record increasing amounts of information. These devices are also becoming much more accessible to consumers than in the past. In Adam Tanner’s article “Health Entrepreneur Debates Going To Data’s Dark Side,” he discusses the company Safe Heart. Safe Heart is developing medical devices for consumer use. These devices are able to measure values like blood oxygen saturation, heart rate, and perfusion index. Being able to collect these massive amounts of data places these devices in the realm of big data. Although the topic of big data imposes its own issues, the medical nature of the data creates an additional set of important issues.

Safe Heart is not the first organization to develop devices that collect “big” quantities of data. In recent years, many organizations have begun to capture and use large quantities of medical data. Hospitals, credit agencies and researchers have all started to use medical data to the advantage of either the patient or their own corporation. With all the data being captured, there are legal and ethical issues that become apparent.
Main Issues
The most prominent issue related to big healthcare technology data is a legal one. The Health Insurance Portability and Accountability Act (HIPAA), protects health data that is transmitted by a certain groups and organizations [1]. It states that consent must be obtained from the patient to distribute any information to a third party. The organizations included are health plans, health care clearinghouses, and some health care providers [1]. This would mean that non-health organizations transmitting health information would not be subject to HIPAA. The previously mentioned organization Safe Heart, would not be subject to HIPAA because they are not an organization covered by the act. Safe Heart would be able to transmit data in a variety of ways and not be limited by the restrictions of HIPAA. Another act that has the power to govern patient data, but is not optimized for current technologies, is the Privacy Act [1]. The Privacy Act protects data that is distributed by the federal government. To distribute data, the government must remove personally identifying information from the records [1]. After the information is removed, this allows the government to distribute massive amounts of civilian health data publically. As long as explicitly identifying attributes like name and address have been removed, the Privacy Act does not limit how much, or where the data can be distributed. There are few bounds on what the government can do, making this a pressing legal issue.
Big data also imposes several ethical issues on healthcare technology. Even though health agencies may anonymize data in accordance with the Privacy Act, it is still possible to associate the data back to the individual. The Massachusetts Group Insurance Commission released a dataset in the 1990s, and they assured the public that the data had been completely anonymized. A graduate student at the time combined this dataset with voting data and was able to associate medical data back to the correct patient. Shortly after this, it was shown that an American can be identified with only their zip code, birthdate and sex [2]. This imposes a myriad of issues on medical technology companies like Safe Heart. If a released dataset is not properly anonymized, the large amounts of data collected by the devices can be associated back to the patients. This also has powerful ethical implications when considering the results of a study done by the Privacy Rights Clearinghouse. This organization studied a collection of mobile health and fitness applications for both iOS and Android operating systems. The study found that many of the applications transmitted data, without user notification, to third parties. The data transmitted included items like latitude, longitude, and zip code data [3]. Since many of the developers were not medical entities, the data sharing is not limited. The medical data can be used for marketing of products and can be sold to third parties for other uses. This is a large invasion of user privacy and creates one more way to link consumers to their already existing medical data that has been “anonymized.”
Major Stakeholders
The winners here are largely marketing and advertising agencies. After buying a dataset, or using a publicly available one, marketers can use the few remaining pieces of identifying information like location, age and gender to target specific consumers. With improved consumer targeting, marketing and advertising agencies can increase their revenue and further their own product lines. The consumers are also winners, depending on how their data is handled. If the data is handled correctly, the profits from the distribution of the data would allow companies like Safe Heart to subsidize the cost of the medical devices [4]. Subsidized devices would allow medical technology companies to reach a broader demographic, providing increased public benefit. The data gathered by the consumer medical devices can also be used to enhance medical research, providing additional benefit to the consumers [5]. Finally, the collection of data can benefit consumers because it enables improved tracking of diseases across an entire population [6]. If diseases can be detected faster, a large portion of the public would benefit.
Although consumers can reap a large number of benefits from big data in healthcare, they are losers as well. There will be many consumers who do not want their data to be affiliated with marketing or advertising agencies. To these consumers, this is viewed as an extreme invasion of privacy. In addition to the undesired sharing, these users may be subject to the re-identification process. Even though the shared medical data contains few identifying attributes, the remaining information can be used to associate the original consumer with the appropriate medical record [2]. This too is an invasion of the consumer’s privacy, contrary to many of their desires. After consumers, some medical technology entities are also losers. For companies like Safe Heart, the profit from released datasets would reduce costs to the consumer. As a medical company, improving the public’s health is one of their primary missions. The potential that consumers may be re-identified, or targeted by marketing, with the data discourages its release. The apprehension to release data limits the data available to researchers, making them losers as well. If data were released, researchers would be able to expedite research and provide solutions to prevalent health problems [5]. Consumers may resent the release of their data, but those trying to benefit them can produce worthwhile returns.
Summary
Advances in healthcare technology have also given birth to an increase in the amount of big data created by medical devices. Medical big data creates a unique set of legal and ethical issues that companies like Safe Heart must consider, and are considering. Legally, acts like HIPAA and the Privacy Act do not sufficiently protect the data of patients. Data can move fairly freely, and it is not always transferred in a completely anonymous state. It has been shown that organizations are not handling the data in an ethical manner. The release and negligent handling of the data completely invades the privacy of the patient. For marketers, this aids them when trying to increase revenue. Due to many of these issues, companies have started to limit what data they share when medical devices generate it. Without accessible data sets, the progress of researchers is slowed and the standard of care for the public falls. Both the benefits and the risks must be considered when medical big data is involved.
Conclusions
Health devices transmitting big data are already involved in our lives. It is a serious legal issue that HIPAA and the Privacy Act do not govern our health data properly. It is critical that our laws catch up with this rapidly developing technology. A reasonable person may argue that health data should be completely restricted and there should be no transmission, or distribution, at all. It is true that data laws need to be revisited and improved, but complete restriction would be an extreme waste of the potential that medical big data stores. After the laws have been optimized for the technology, the data has the ability to improve health care throughout the nation. Big data can be extremely useful for entities like hospitals. Using patient data, hospitals can monitor a patient’s condition and know more quickly when they are due to worsen [7]. Advanced algorithms can also predict and help to prevent conditions like renal failure, infections, and negative reactions to drugs [7]. When physicians are combined with big data indicators, more patients can be helped and conditions can be monitored more reliably than in the past. In conclusion, I think that big data in healthcare should be embraced, but not before we strengthen the laws governing it.
References
[1] Kalyvas, James R. and Overly, Michael R. Big Data: A Business and Legal Guide. Auerbach Publications. 55-58.
[2] Anderson, Nate. “Anonymized” data really isn’t—and here’s why not. 9/8/09. http://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/
[3] Njie, Craig Michael Lie. Technical Analysis of the Data Practices and Privacy Risks of 43 Popular Mobile Health and Fitness Applications. 7/15/2013 http://www.privacyrights.org/mobile-medical-apps-privacy-technologist-research-report.pdf.
[4] Tanner, Adam. Health Entrepreneur Debates Going To Data’s Dark Side. 9/16/14 http://www.forbes.com/sites/adamtanner/2014/09/16/health-entrepreneur-debates-going-to-datas-dark-side/
[5] Standen, Amy. How Big Data Is Changing Medicine. 9/29/14. http://blogs.kqed.org/science/audio/how-big-data-is-changing-medicine/
[6] Schmarzo, Bill. Big Data Technologies and Advancements in Healthcare. 3/25/14. https://infocus.emc.com/william_schmarzo/big-data-technologies-and-advancements-in-healthcare/
 

Methods Of Data Collection For Primary Data

Once the researcher has determined his research objective, research question, and the corresponding hypothesis for his research project, what he needs to do next is to collect the required data. Data is information from the sample that the researcher will analyse in order to meet his research objective, address his research question, and test his research hypothesis. For example, the data or information about customers includes gender, age, qualification, marital status, number of kids, monthly income, brand of car, type of house, religion, hobby, sports, leisure activities, credit cards, golf membership, etc.
Figure 5.1: Methods of data collection (primary data)
Method for Collecting Primary Data:
Survey Method – Personal Interview (Face-to-Face Interview, Telephone Interview), Self-Administered Questionnaires (Mail Survey), Computerised Questionnaires
Observation Method – Direct Observation, Mechanical Observation, Content Analysis
Actually, what variable to include in the study depends on your research objectives, research questions, and the corresponding research hypotheses. The researcher should always refer to the three elements above when determining what data to collect in order to avoid collecting the unnecessary data, or worse, not collecting the required data.
Based on Figure 5.1, the methods of data collection can be classified into survey methods and observation methods.
A) Survey method
According to Zikmund, Babin, Carr and Griffin (2010), survey is a research technique in which a sample is interviewed in some form or the behaviour of respondents is observed and described in some way.
In survey method, questionnaires are given to respondents to elicit information for the study. Respondents are asked the questions based on the information needed by the study. The questions may be asked in the verbal forms (interview), writing (mail questionnaire), or through computer (internet or e-mail).
Several advantages are
Quick
Efficient
Inexpensive
Accurate means of assessing information about a population
B) Types of survey methods
i) Personal interview: face-to-face communication in which an interviewer asks respondents to answer questions (Zikmund, Babin, Carr and Griffin, 2010).
Face to face interview
In the face-to-face interview (sometimes called a personal interview), the researcher prepares the questions to be asked during the interview with respondents. Each question represents a variable whose data the researcher wants to obtain, and the questionnaire covers all variables required from a respondent. Before the interview begins, the researcher explains the objective of the research, asks for cooperation, and gives an assurance that the responses are used only for research purposes and that the information is treated as confidential.

This is important since personal data is confidential, and no one will reveal his personal information if confidentiality is not assured. The interviewer should possess a good personality so that the interview session proceeds smoothly and in a friendly atmosphere. During the interview, the interviewer reads the question and records the response. Personal interviews may be conducted at the respondents' homes, offices, or anywhere else. Below are common examples of places which are typically used to conduct interviews.
Door-to-door Interview
Door-to-door interview refers to the interview in which the respondents are interviewed face-to-face in their homes. The major advantage of this interview is high participation rate, but the disadvantage is high cost.
Mall Intercept Interview
Mall intercept interview refers to the personal interviews conducted in shopping malls. Interviewers typically intercept shoppers at a central point normally at the entrance to the mall. The advantage of this method is low cost since no travel required to the respondent’s home.
Mall intercept interview is appropriate when the respondents need to see, touch, or taste the product before they can provide meaningful information.
Computer-Assisted Personal Interview (CAPI)
This interview uses a computer to get the information from the respondents using several user-friendly electronic packages to design questions easier for the respondent to understand. However, this method is classified as personal interview technique because an interviewer is usually present to serve as a host and to guide the respondents. CAPI is normally used to collect data at shopping malls, product clinics, conferences, and trade shows.
Advantages of face-to-face interview
Higher response rate. With proper plan and approach, the respondents will not turn down the request for an interview. The interviewer must be friendly and creative in getting cooperation from respondents. The interview method normally achieves a response rate of more than 70%.
Data more accurate. The face-to-face meeting allows the interviewer to clarify terms or anything which might confuse the respondents. Once the respondent understands the question, he will provide an accurate response.
The interviewer can note specific reactions by respondents during interview. The physical reaction and facial expression by respondent can tell whether he is providing accurate responses or not. The interviewer can also note the physical environment surrounding the interview such as the respondents’ office, the house, the dress etc that should tally with his response.
People will usually respond with good gesture and provide accurate responses when approached in person. The Malay culture of not saying “no” still holds true when someone comes to the door with polite and peaceful manner.
An experienced interviewer can sense out if the respondent is trying to hide some information. In this case, he will use his creativity to clarify the intention or terminate the interview.
Disadvantages of face-to-face interview
High cost. Interviewers must be given specific training on methods such as the art of making personal approach, the art of asking questions, the art of requesting cooperation etc, which is expensive and time consuming. The interviewer must be confident enough to work on his own. At the same time, the daily allowance for travelling, food, lodging etc is very high.
Incomplete response. Any small mistake by interviewer can cause error in the response. The mistake such as using a wrong approach, bad facial expression, coming at the wrong time, not sensitive to certain issue in the conversation sometimes could hinder respondents from giving truthful response.
Error in recording. This happens especially when the interview session is rushed due to time constraints on the part of respondents. The interviewer, who needs to read the question and record the response quickly and simultaneously, is prone to making mistakes.
Require close supervision. The interviewers' work should be supervised closely to avoid interviewer cheating. Sometimes, for reasons such as time constraints, a respondent being difficult to contact, or the respondent not being available at the appointment time, the interviewer will fill in the response on his own.
Telephone interview
Sometimes it is possible to collect data through telephone conversation. This method is possible if the researcher has a complete directory of telephone numbers of the population under study. If the respondents are government officers, employees of private firms, or professional people such as doctors, lawyers, accountants, etc., then the researcher has the option of using this method.
Traditional Telephone Interview
In traditional telephone interview, respondents are called through the telephone and the interviewer will ask a series of questions and record the responses. Respondents are more willing to provide detailed and reliable information on a variety of personal topics over the telephone than with personal interviews.
Computer Assisted Telephone Interview (CATI)
CATI uses a computerized questionnaire administered to respondents over the telephone. The interviewer would contact respondents over the telephone, read questions posed on the computer screen, and record the respondent’s answers directly into the computer memory bank. The computer systematically guides the interviewer and checks the responses for appropriateness and consistency.
Advantages of telephone interview
Less expensive compared to the face-to-face interview. The financial cost for travelling, lodging, and outstation allowance is not involved.
Less time consuming. The number of respondents interviewed through telephone in one day is much higher than the number interviewed through face to face.
Easy monitoring. The researcher can monitor the interviewers’ work more easily since he can check the telephone numbers of respondents and the time called.
Disadvantages of telephone interview
Lower response rate. The rate of response is lower since the respondents can just “hang-up” the call when he realised it is time consuming, or unsure of the confidentiality of the conversation.
Fewer questions could be asked. Usually the conversation through the telephone cannot take long especially when the topic of discussion is not interesting, especially for the respondent since he has no particular interest in it.
Difficult to get good cooperation. Conversation over the phone is not convincing enough, especially when trying to get cooperation from the respondent. It is difficult to convince someone when they cannot see you in person, your facial expression, your body gestures, etc.
2) Self-administered questionnaire
In this technique, the researcher distributes questionnaires to respondents personally, through the mail service, by inserting them in newspapers, or by sending them to email addresses. The difference between a self-administered questionnaire and a personal interview is that in the self-administered questionnaire the respondents themselves make the effort to read and respond to the questions. The effectiveness of a self-administered questionnaire therefore depends on the efficiency of the written words rather than the soft skills of interviewers.
Mail Survey
A mail survey is a self-administered questionnaire sent to pre-selected respondents through the mail. Basically, a mail interview package consists of the outgoing envelope, cover letter, questionnaire, return envelope, and possibly an incentive.
Mail Panel
A mail panel consists of a large, nationally representative sample of households that have agreed to participate in periodic mail questionnaires and product tests. The data on the panel members is updated every year and households are compensated with various incentives. Mail panel is appropriate for longitudinal design studies which allow obtaining information from the same respondents repeatedly.
Advantages of mail survey
Low cost.
No interviewer bias
Disadvantages of mail survey
Low response rate
Slow data collection speed
Structured questionnaires
One of the most popular methods of collecting research data is through the structured questionnaire. These questionnaires are self-explained and self-administered. In using this method, the researcher designs carefully a series of questions that cover the variables of interest in the study such as the respondents’ demographic background, their opinions concerning certain issues, their perception concerning certain service performance, and their intentions to do in the future etc. Structured refers to the degree of standardization imposed in the process of data collection (questionnaires). In other words, the researcher arranges the questions properly on a paper together with the cover letter to explain the purpose of data collection, the instruction to the respondents on how to respond to the questions, and the assurance of confidentiality of information provided.
Advantages of a questionnaire
Lowest cost incurred. The cost is low due to no interview, no training of interviewers, no travelling, no lodging allowance involved.
No monitoring cost incurred. Usually the sending of questionnaires and receipt of responses from respondents are done by the researcher himself.
More respondents and wider area can be covered. The researchers can send his questionnaires to as many respondents as he likes since the cost for each respondent is very small.
Response more accurate. Since there is no influence, no gesture, no facial expression, and no interruption from the interviewer, the respondents can answer the questions at his own convenience. This manner will assure the accuracy of responses.
Disadvantages of a questionnaire
Poor response rate. Since the researcher has no gestures or face-to-face contact with respondents, they can simply throw the questionnaires away. With this method, the researcher should make regular follow-up contact with respondents by letter or telephone. Usually the researcher calls the respondents beforehand to inform them that he is sending a questionnaire to obtain data for a specific objective; the call is a gesture intended to secure good cooperation from respondents.
Once a respondent has difficulty with certain terms or questions in the questionnaire, he may not bother completing the remaining questions, since nobody is available to explain them. Sometimes respondents simply send back incomplete responses.
There is no assurance that the person who responds to the questionnaire is the intended or legitimate respondent. This can result in the sample not being representative of the population.
Private agencies that conduct surveys have found that people are more likely to respond to a mail questionnaire that has a professional appearance and comes with attractive prizes, such as lucky draws for respondents who return the completed questionnaire on time. Some questionnaires obtain good responses, especially those that accompany the warranty cards customers receive when purchasing certain products: customers have to answer a series of questions on the warranty card before sending it back to the manufacturer for the product warranty.
Computerised questionnaire
1. Internet survey
Lately, the internet poll has become one of the popular methods of obtaining information from the public, especially their opinions concerning certain issues of public interest. In an internet survey, the researcher brings an issue to attention and requests opinions from the public. The public can respond by voting for the statement that most closely resembles their opinion, and at the same time they can view the current standing in terms of the most popular opinion and the corresponding votes obtained.
An internet survey takes place when a computer user is asked to go to a particular Web site and answer a series of questions displayed there. In this technique, respondents are not selected using any specific sampling technique; instead, visitors to the Web site where the survey is posted are invited to participate.
2. E-mail survey
In an e-mail survey, questionnaires are sent to respondents directly at their e-mail addresses. Respondents reply to the e-mail by providing their response to each item on the questionnaire. Professional market research groups use the internet to send their questionnaires to respondents' e-mail addresses; respondents complete them and return them over the internet to the researcher's e-mail address. Normally, these agencies provide rewards such as discount coupons to encourage respondents to participate in the study.
Advantages of computerised questionnaire
Low cost.
Very high data collection speed.
No interviewer bias
Disadvantages of computerised questionnaire
Very low response rate
Low control of data collection method
B) The observation methods
According to Zikmund, Babin, Carr and Griffin (2010), observation is the systematic process of recording the behavioural patterns of people, objects and occurrences as they are witnessed. Several types of observation methods are:
Direct observation
In this method, the researcher identifies his respondents and records the required data based on what he observes. The method is suitable for research that studies the behaviour of respondents. For example, research may be carried out to identify how car drivers behave on the road during a traffic jam. In his observation, the researcher records which drivers follow the traffic rules properly and which choose to ignore them, for example by queue jumping or overtaking using emergency lanes. The researcher may also be interested in the types of vehicles (motorcycles, cars, buses, and lorries) that most often ignore traffic rules during traffic jams. Another area where this method is suitable is observing customers' behaviour in the supermarket. In a market research study, for example, the researcher is interested in how customers decide which shampoo to buy. The researcher stands in the area where hundreds of shampoos of different brands are placed on the rack and records how customers choose a shampoo. Most probably some customers have decided earlier which brand to buy; others will compare prices, packaging, and even the smell before buying. The researcher records the specific characteristics of customers who prefer certain brands of shampoo, and so on.
Advantages of direct observation
The data obtained reflect the actual behaviour of respondents, who tend to hide their true behaviour when approached in person or when answering questionnaires.
The researcher gets a clearer picture and a better feeling of the situations for his study. Hence, he will be in a better position to make a proper recommendation regarding the underlying phenomena in the study.
Disadvantages of direct observation
Respondents will not behave normally if they know they are being observed.
The data collection process is cumbersome and tedious.
Mechanical observation
Sometimes mechanical devices, such as video cameras, are used rather than human observers to observe and record customer behaviour. These devices do not require the respondents' direct participation in the study, but they capture the respondents' behaviour for analysis. An early application of this technique was a study to determine the level of comfort among train passengers by recording how they sat and moved in their seats.
Advantages of mechanical observation
It offers a high degree of disguise when a hidden camera is used. However, other mechanical devices, such as psychogalvanometers, are very difficult to disguise.
Low observation bias, since mechanical observation involves a non-human observer.
Disadvantages of mechanical observation
Low ability to observe in a natural setting, although this depends on the mechanical tool used: the ability is low with a psychogalvanometer but high with turnstiles.
Less flexible.
Content analysis
Content analysis is usually used to study communication rather than behaviour or physical objects. It is defined as the objective, systematic, and quantitative description of the manifest content of a communication. Content analysis obtains data by observing and analyzing the contents or messages of advertisements, newspaper articles, television programs, and so on.
It involves observation as well as analysis, systematically analyzing people's communication to identify specific information content and other characteristics such as words, characters (individuals or objects), themes (propositions), space and time measures (length or duration of the message), or topics (subject of the message).
Advantages of content analysis
High degree of disguise, since the data are collected from existing communications rather than by directly observing the people involved.
High degree of observation specification and measurement.
Disadvantages of content analysis
Low ability to observe in a natural setting, because observation takes place after the behaviour has occurred.
Potential for observation bias, because a human observer is involved in the data collection process.
5.3 Factors determining choice of survey methods.
Selecting the type of survey you are going to use is one of the most critical decisions in many social research contexts. You will see that there are very few simple rules that will make the decision for you; you have to use your judgment to balance the advantages and disadvantages of different survey types. Several factors need to be considered:
Population issues
The first set of considerations has to do with the population and its accessibility.
Can the population be specified?
For some populations, you have a complete listing of the units that will be sampled. For others, such a list is difficult or impossible to compile. For instance, there are complete listings of registered voters or persons with active drivers' licenses. But no one keeps a complete list of homeless people. If you are doing a study that requires input from homeless persons, you are very likely going to need to find the respondents personally. In such contexts, you can pretty much rule out the idea of mail surveys or telephone interviews.
Is the population literate?
Questionnaires require that your respondents can read. While this might initially seem like a reasonable assumption for many adult populations, we know from recent research that the incidence of adult illiteracy is alarmingly high. And, even if your respondents can read to some degree, your questionnaire may contain difficult or technical vocabulary. Clearly, there are some populations that you would expect to be illiterate; young children, for example, would not be good targets for questionnaires.
Are there language issues?
We live in a multilingual world. Virtually every society has members who speak other than the predominant language. Can you produce multiple versions of your questionnaire? For mail instruments, can you know in advance the language your respondent speaks, or do you send multiple translations of your instrument? Can you be confident that important connotations in your instrument are not culturally specific? Could some of the important nuances get lost in the process of translating your questions?
Will the population cooperate?
People who do research on immigration issues have a difficult methodological problem. They often need to speak with undocumented immigrants or people who may be able to identify others who are. Why would we expect those respondents to cooperate? Although the researcher may mean no harm, the respondents are at considerable legal risk if information they divulge should get into the hands of the authorities. The same can be said for any target group that is engaging in illegal or unpopular activities.
What are the geographic restrictions?
Is your population of interest dispersed over too broad a geographic range for you to study feasibly with a personal interview? It may be possible for you to send a mail instrument to a nationwide sample, and you may be able to conduct phone interviews with them. But it will almost certainly be less feasible to do research that requires interviewers to visit respondents directly if they are widely dispersed.
Sampling issues
The sample is the actual group you will have to contact in some way. There are several important sampling issues you need to consider when doing survey research.
What data is available?
What information do you have about your sample? Do you know their current addresses? Do you have their current phone numbers? Do you have up-to-date contact lists?
Can these respondents be found?
Can your respondents be located? Some people are very busy. Some travel a lot. Some work the night shift. Even if you have an accurate phone or address, you may not be able to locate or make contact with your sample.
Who is the respondent?
Who is the respondent in your study? Let’s say you draw a sample of households in a small city. A household is not a respondent. Do you want to interview a specific individual? Do you want to talk only to the “head of household” (and how is that person defined)? Are you willing to talk to any member of the household? Do you state that you will speak to the first adult member of the household who opens the door? What if that person is unwilling to be interviewed but someone else in the house is willing?
Can all members of population be sampled?
If you have an incomplete list of the population (i.e., sampling frame) you may not be able to sample every member of the population. Lists of various groups are extremely hard to keep up to date. People move or change their names. Even though they are on your sampling frame listing, you may not be able to get to them. And, it’s possible they are not even on the list.
Are response rates likely to be a problem?
Even if you are able to solve all of the other population and sampling problems, you still have to deal with the issue of response rates. Some members of your sample will simply refuse to respond. Others have the best of intentions but cannot seem to find the time to send in your questionnaire by the due date. Still others misplace the instrument or forget about the appointment for an interview. Low response rates are among the most difficult problems in survey research; they can ruin an otherwise well-designed survey effort.
Question issues
Sometimes the nature of what you want to ask respondents will determine the type of survey you select.
What types of questions can be asked?
Are you going to be asking personal questions? Are you going to need to get lots of detail in the responses? Can you anticipate the most frequent or important types of responses and develop reasonable closed-ended questions?
How complex will the questions be?
Sometimes you are dealing with a complex subject or topic. The questions you want to ask are going to have multiple parts. You may need to branch to sub-questions.
Will screening questions be needed?
A screening question may be needed to determine whether the respondent is qualified to answer your question of interest. For instance, you wouldn’t want to ask someone their opinions about a specific computer program without first “screening” them to find out whether they have any experience using the program. Sometimes you have to screen on several variables (e.g., age, gender, experience). The more complicated the screening, the less likely it is that you can rely on paper-and-pencil instruments without confusing the respondent.
Can question sequence be controlled?
Is your survey one where you can construct in advance a reasonable sequence of questions? Or, are you doing an initial exploratory study where you may need to ask lots of follow-up questions that you can’t easily anticipate?
Will lengthy questions be asked?
If your subject matter is complicated, you may need to give the respondent some detailed background for a question. Can you reasonably expect your respondent to sit through a phone interview long enough for you to ask the question?
Will long response scales be used?
If you are asking people about the different computer equipment they use, you may have to have a lengthy response list (CD-ROM drive, floppy drive, mouse, touch pad, modem, network connection, external speakers, etc.). Clearly, it may be difficult to ask about each of these in a short phone interview.
Content issues
The content of your study can also pose challenges for the different survey types you might utilize.
Can the respondents be expected to know about the issue?
If the respondent does not keep up with the news (e.g., by reading the newspaper, watching television news, or talking with others), they may not even know about the news issue you want to ask them about. Or, if you want to do a study of family finances and you are talking to the spouse who doesn’t pay the bills on a regular basis, they may not have the information to answer your questions.
Will respondent need to consult records?
Even if the respondent understands what you’re asking about, you may need to allow them to consult their records in order to get an accurate answer. For instance, if you ask them how much money they spent on food in the past month, they may need to look up their personal check and credit card records. In this case, you don’t want to be involved in an interview where they would have to go look things up while they keep you waiting (they wouldn’t be comfortable with that).
Bias issues
People come to the research endeavor with their own sets of biases and prejudices. Sometimes, these biases will be less of a problem with certain types of survey approaches.
Can social desirability be avoided?
Respondents generally want to "look good" in the eyes of others. None of us likes to look like we don't know an answer, and we don't want to say anything that would be embarrassing. If you ask people about information that may put them in this kind of position, they may not tell you the truth, or they may "spin" the response so that it makes them look better. This may be more of a problem in an interview situation where they are face-to-face or on the phone with a live interviewer.
Can interviewer distortion and subversion be controlled?
Interviewers may distort an interview as well. They may not ask questions that make them uncomfortable. They may not listen carefully to respondents on topics for which they have strong opinions. They may make the judgment that they already know what the respondent would say to a question based on their prior responses, even though that may not be true.
Can false respondents be avoided?
With mail surveys it may be difficult to know who actually responded. Did the head of household complete the survey or someone else? Did the CEO actually give the responses or instead pass the task off to a subordinate? Is the person you’re speaking with on the phone actually who they say they are? At least with personal interviews, you have a reasonable chance of knowing who you are speaking with. In mail surveys or phone interviews, this may not be the case.
Administrative issues
Last, but certainly not least, you have to consider the feasibility of the survey method for your study.
Costs
Cost is often the major determining factor in selecting the survey type. You might prefer to do personal interviews, but the cost of training, paying, and transporting interviewers may rule them out.
 


SECURE DATA RETRIEVAL BASED ON HYBRID ENCRYPTION FOR DISRUPTION-TOLERANT NETWORK
Kollipara Durgesh, Dr.P. Sriramya
 
I. ABSTRACT
A military network is one of the most important networks in any country, but it often suffers from intermittent connectivity because of hostile regions and battlefield conditions. To address this problem we use disruption-tolerant network (DTN) technologies, which are becoming a widely successful solution. DTNs allow people to communicate and access confidential data even under the worst network conditions by storing the data in storage nodes. Some of the most challenging issues in this scenario are the enforcement of authorization policies and the updating of those policies for secure data retrieval. Two encryption algorithms are used for security: the Advanced Encryption Standard (AES) and Java Simplified Encryption (Jasypt). Combining these two algorithms makes it considerably harder for unauthorized people to decrypt the confidential data. In this paper, we propose a secure data retrieval scheme that generates a new secret key each time a user sends secure data to a destination, which further enhances the security of the confidential data. We demonstrate how to apply the proposed mechanism to securely and efficiently manage confidential data distributed in a disruption-tolerant network.
Keywords: Disruption-tolerant network (DTN), Advanced Encryption Standard (AES), Java Simplified Encryption (Jasypt), secure data retrieval
II. INTRODUCTION
In most military networks it is very difficult for soldiers and commanders to communicate with each other because of the difficult network environment and the frequent lack of a proper end-to-end connection between sender and receiver. Disruption-tolerant networks (DTNs) are widely used where no such end-to-end connection exists, so in this paper we choose a DTN for communication between soldiers and others. Without an end-to-end connection between the source and destination pair, data from the source node must wait in an intermediate node until the network recovers, where it can easily be accessed by a third party. To solve this critical problem we use a storage node, introduced in the disruption-tolerant network, in which only authorized users can access the respective data.

Most military data are highly confidential, so we use confidential access control methods that are cryptographically enforced. Here we provide different access services for different users; that is, the admin decides who can access the data based on each user's designation. A user's registration is complete only when the admin accepts and verifies the account as valid; a user who is not authorized will not be allowed to access the data in spite of registering. For example, if "user 1" sends data to "user 2", the data is encrypted by combining the two algorithms, AES and Jasypt, and the resulting ciphertext is stored in the storage node. Even if there is no end-to-end connection between the source and destination pair, the data remains secure in storage, where it can be accessed only by the authorized recipient.
III. ALGORITHM
A. Advanced Encryption Standard (AES)
The Advanced Encryption Standard (AES) algorithm is used in this paper to provide the secure data retrieval scheme. AES was chosen because it is regarded as highly secure and supports most secure retrieval schemes: it is widely used by the U.S. government to protect classified information and is implemented in hardware and software throughout the world to encrypt secure and confidential data.
AES comprises three block ciphers: AES-128, AES-192, and AES-256. Each cipher encrypts and decrypts data in blocks of 128 bits using cryptographic keys of 128, 192, and 256 bits, respectively. (Rijndael was designed to handle additional block sizes and key lengths, but that functionality was not adopted in AES.) Symmetric, or secret-key, ciphers use the same key for encrypting and decrypting, so both the sender and the receiver must know and use the same secret key. All key lengths are deemed sufficient to protect classified information up to the "Secret" level, with "Top Secret" information requiring 192- or 256-bit keys. There are 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys; a round consists of several processing steps, including substitution, transposition, and mixing, that transform the input plaintext into the final ciphertext.
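To make the symmetric key usage above concrete, the following minimal Java sketch encrypts and decrypts a short message with a single 256-bit AES key using the standard javax.crypto API. The mode (GCM), key size, and sample plaintext are illustrative choices for this sketch, not details prescribed by the paper.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AesExample {
    public static void main(String[] args) throws Exception {
        // Generate a 256-bit AES key (128- and 192-bit keys work the same way).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // AES is symmetric: the same key encrypts and decrypts.
        byte[] iv = new byte[12];                       // 96-bit nonce for GCM
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal("mission report".getBytes(StandardCharsets.UTF_8));

        // Decryption with the same key and nonce recovers the plaintext.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        String plain = new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
        System.out.println(Base64.getEncoder().encodeToString(cipherText));
        System.out.println(plain);
    }
}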
Various researchers have published attacks against reduced-round versions of the Advanced Encryption Standard, and a research paper published in 2011 demonstrated that a technique called a biclique attack can recover AES keys faster than a brute-force attack by a factor of three to five, depending on the cipher version. Even this attack, though, does not threaten the practical use of AES, owing to its high computational complexity.
In this paper AES is used together with DTN technologies, which raise many security and privacy challenges. Since some users may change their associated attributes at some point (for example, moving to a different region), or some private keys might be compromised, key revocation (or update) for each attribute is necessary to keep the system secure. For example, if a user joins or leaves an attribute group, the associated attribute key should be changed and redistributed to all the other members of the group.
B. Java Simplified Encryption (Jasypt)
The other algorithm used in this paper is Java Simplified Encryption (Jasypt); it is combined with AES to form the hybrid encryption that provides fully secured retrieval of confidential data. The final challenge addressed in this paper is generating a new secret key each time a user sends secret data to the receiver. Each generated key is unique, which makes the data retrieval even more secure. The admin plays a vital role in managing the overall source and destination pairs, but the admin is not authorized to access the information, because the secret key is generated automatically and sent to the receiver's personal account, which the admin does not manage.
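As a rough illustration of the Jasypt layer, the sketch below uses Jasypt's BasicTextEncryptor utility, which performs password-based symmetric encryption with a simple encrypt/decrypt interface. The password value, and the idea of feeding it an already AES-encrypted (Base64-encoded) string to obtain a second layer of encryption, are assumptions made for this example rather than the paper's exact construction.

import org.jasypt.util.text.BasicTextEncryptor;

public class JasyptExample {
    public static void main(String[] args) {
        // Jasypt's BasicTextEncryptor offers password-based symmetric
        // encryption behind a simple encrypt/decrypt API.
        BasicTextEncryptor textEncryptor = new BasicTextEncryptor();
        textEncryptor.setPassword("per-message-secret-key");   // hypothetical one-time secret

        // In a hybrid scheme, the input here could itself be an AES ciphertext
        // (Base64-encoded), giving a second layer of encryption.
        String encrypted = textEncryptor.encrypt("AES ciphertext or plain message");
        String decrypted = textEncryptor.decrypt(encrypted);

        System.out.println(encrypted);
        System.out.println(decrypted);
    }
}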

Fig 1. Architecture of secure data retrieval in Disruption Tolerant Network (DTN)
IV. EXISTING SYSTEM
The existing system is based on attribute-based encryption (ABE), a promising approach that fulfills the requirements for secure data retrieval in DTNs. ABE features a mechanism that enables access control over encrypted data using access policies and ascribed attributes among private keys and ciphertexts. In particular, ciphertext-policy ABE (CP-ABE) provides a scalable way of encrypting data such that the encryptor defines the attribute set that the decryptor must possess in order to decrypt the ciphertext. Thus, different users are allowed to decrypt different pieces of data according to the security policy.
The problem of applying the ABE to DTNs introduces several security and privacy challenges. Since some users may change their associated attributes at some point (for example, moving their region), or some private keys might be compromised, key revocation (or update) for each attribute is necessary in order to make systems secure. However, this issue is even more difficult, especially in ABE systems, since each attribute is conceivably shared by multiple users (henceforth, we refer to such a collection of users as an attribute group).
V. PROPOSED SYSTEM
In the proposed system we use hybrid encryption, combining two algorithms to enhance the security of confidential data. Here the admin keeps track of every user account, so even if a particular user's attributes change, the admin records the change; thus, the disadvantages of the existing system are addressed. Unauthorized users who do not have enough credentials to satisfy the access policy should be deterred from accessing the plain data in the storage node. In addition, unauthorized access from the storage node or the key authorities should also be prevented. Finally, if multiple users collude, they may be able to decrypt a ciphertext by combining their attributes even though none of them can decrypt it alone; such collusion must also be prevented.
VI. MODULES

Key Authorities module

The key generation module generates the secret key, where the hybrid encryption using the AES and Jasypt algorithms takes place. This key generation is very efficient because it combines the two encryptions to produce the secret code. To increase the security of the military network, the secret key generated by the hybrid encryption is sent to the user's personal e-mail address, so that even the admin who manages the entire network cannot access the confidential data.
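A minimal sketch of per-message key generation is shown below, assuming a java.security.SecureRandom source and Base64 encoding of the key for e-mail delivery; the class and method names are hypothetical and only illustrate the idea of issuing a fresh secret for every transfer.

import java.security.SecureRandom;
import java.util.Base64;

public class OneTimeKeyGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a fresh random secret for every message, so no two transfers
    // share the same key. Delivering it to the receiver's e-mail address is
    // outside this sketch.
    public static String newSecretKey() {
        byte[] keyBytes = new byte[32];          // 256 bits of randomness
        RANDOM.nextBytes(keyBytes);
        return Base64.getEncoder().encodeToString(keyBytes);
    }

    public static void main(String[] args) {
        System.out.println(newSecretKey());
        System.out.println(newSecretKey());      // different on every call
    }
}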

Storage node module

In the storage node module, the data from the sender is stored even when there is no stable network between the sender and the receiver, since we use a Disruption Tolerant Network (DTN). The storage node holds the encrypted data, and only the corresponding receiver can access it. To retrieve the data from the storage node, the receiver must supply the secret code that is generated by the hybrid encryption and secretly mailed to the receiver.

Sender module

The sender is the party who holds the confidential data and wishes to store it in the external storage node for ease of sharing or for reliable delivery to users in extreme networking environments. A sender is responsible for defining an (attribute-based) access policy and enforcing it on its own data by encrypting the data under that policy before storing it in the storage node.
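The sketch below illustrates the encrypt-before-store flow described for the sender. For brevity it uses only the Jasypt layer and a HashMap as a stand-in for the DTN storage node; the class names, the per-message secret, and the in-memory storage are all assumptions made for illustration, not the paper's implementation.

import org.jasypt.util.text.BasicTextEncryptor;
import java.util.HashMap;
import java.util.Map;

public class SenderSketch {
    // Hypothetical in-memory stand-in for the DTN storage node.
    private static final Map<String, String> STORAGE_NODE = new HashMap<>();

    public static void send(String receiverId, String message, String oneTimeSecret) {
        // Encrypt under the per-message secret before handing the data to the
        // storage node; only the holder of the secret can later decrypt it.
        BasicTextEncryptor encryptor = new BasicTextEncryptor();
        encryptor.setPassword(oneTimeSecret);
        STORAGE_NODE.put(receiverId, encryptor.encrypt(message));
    }

    public static String retrieve(String receiverId, String oneTimeSecret) {
        BasicTextEncryptor encryptor = new BasicTextEncryptor();
        encryptor.setPassword(oneTimeSecret);
        return encryptor.decrypt(STORAGE_NODE.get(receiverId));
    }

    public static void main(String[] args) {
        send("user2", "rendezvous at dawn", "one-time-secret");
        System.out.println(retrieve("user2", "one-time-secret"));
    }
}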

Fig 2. Hybrid Encryption of secret message

User Module

This is the last module; it accesses the confidential data from the sender that is stored in the storage node. The receiver has to provide the correct secret key, which is sent to his corresponding e-mail address. If a user possesses a set of attributes satisfying the access policy of the encrypted data defined by the sender, and is not revoked in any of those attributes, he will be able to decrypt the ciphertext and obtain the data.
VII. CONCLUSION
DTN technologies are becoming successful because they allow communication between devices that do not have a stable network, and hence they can be used efficiently in military networks. AES and Jasypt provide a scalable cryptographic solution for access control and secure data retrieval. In this paper we proposed an efficient data retrieval method using hybrid encryption that combines the two algorithms. The encrypted data is stored in the storage node and can be accessed only by the corresponding user on providing the respective secret key. In addition, the admin monitors all user attributes, which allows fine-grained key revocation for each attribute group. We demonstrate how to apply the proposed mechanism to securely and efficiently manage confidential data distributed in a disruption-tolerant military network.