Massive volumes of data are now stored routinely, and the data itself typically contains non-trivial but useful information. Data mining techniques can discover this information and support decision-making in companies. In real-life applications, however, data is massive and stored across distributed sites; one of my major research topics is protecting privacy over such data. Previous work has explored the important characteristics, issues, and challenges of managing large amounts of data, discussed various open-source frameworks that handle large-scale data analytics workloads, and proposed a comparative study of these frameworks and their suitability. The digital universe is flooded with huge amounts of data generated by users worldwide. These data are diverse in nature and arrive from many sources in many forms. To keep pace with the desire to store and analyze ever larger volumes of complex data, relational database vendors have delivered specialized analytical platforms in many shapes and sizes, from software-only products to analytical services that run in third-party hosted environments. In addition, new technologies have emerged to address exploding volumes of complex data, including web traffic, social media content, and machine-generated data such as sensor readings and global positioning system (GPS) data.

Big data is defined as data so large that new technologies and architectures are required to extract value from it through capture and analysis. Because of this size, effective analysis with existing traditional techniques becomes very difficult. Big data has become a prominent research field, especially for decision making and data analysis. However, its various properties, such as volume, velocity, variety, variability, value, and complexity, pose many challenges. Since big data is a recent technology that can bring huge benefits to business organizations, the various challenges and issues involved in adopting it must be brought to light. Another challenge is that collected data may not be sufficiently accurate, leading to inconsistent analysis that can critically affect decisions based on that analysis. Moreover, it is clearly apparent that organizations need to employ data-driven decision making to gain competitive advantage. Processing, integrating, and interacting with more data should make it better data, providing both more panoramic and more granular views to aid strategic decision making. Big data makes this possible by exploiting affordable and usable computational and storage resources. Many offerings are based on the MapReduce and Hadoop paradigms, and most focus solely on the analytical side. Nonetheless, in many respects it remains unclear what big data actually is; current offerings appear as isolated silos that are difficult to integrate and that make it hard to better utilize existing data and systems. Data is growing at such speed, at exabyte scale, that handling it is difficult: the main difficulty is that the volume is increasing rapidly in comparison to the available computing resources.
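To make the MapReduce paradigm mentioned above concrete, the following is a minimal, single-process sketch of its map, shuffle, and reduce phases applied to word counting; the function names and the in-memory `shuffle` step are illustrative simplifications, not part of any particular framework's API, since a real Hadoop deployment distributes these phases across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values, here by summing counts.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data is big", "data analytics"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
result = reduce_phase(shuffle(pairs))
```

The appeal of the paradigm is that the map and reduce functions are independent per key, so the framework can parallelize them across many machines without the programmer managing distribution explicitly.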
The term "big data", as commonly used today, is something of a misnomer, since it points only to the size of the data while paying too little attention to its other properties.

If data is to be used to make accurate and timely decisions, it must be available in an accurate, complete, and timely manner. This makes the data management and governance process more complex, adding the need to open data and make it available to government agencies in a standardized manner, with standardized APIs, metadata, and formats, thereby leading to better decision making, business intelligence, and productivity improvements.

This paper presents a discussion and evaluation of the most prominent techniques used in data collection and analysis, in order to identify the privacy defects in them that affect the accuracy of big data. Based on the results of this analysis, recommendations are provided for improving data collection and analysis techniques, which will help avoid most, if not all, of the problems facing the use of big data in decision making.

Keywords: Big Data, Big Data Challenges, Big Data Accuracy, Big Data Collection, Big Data Analytics.

