Big Data and Data Mining


What is big data?

Big data is a term for large volumes of data, whether structured or unstructured.

Where is all this data coming from?

Every day, 2.5 quintillion bytes of data are created, and 90% of the data in the world today has been created in the last two years alone (https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html). We generate massive amounts of data from many sources, including ourselves and our daily lives: when we shop with credit cards, use online systems, use social networks, share videos, capture traffic flow, measure pollution, use security cameras, and so on. Most of this data is produced by governments, NGOs, research institutions or companies. Every single thing we do leaves a digital trace.

Why are we so interested in Big Data?

First of all, data without analysis has little value for us. Organisations, businesses, healthcare providers, governments and others have all recognized the importance of data analysis. Big data analytics is used to identify trends, detect patterns and glean other valuable findings from the massive amount of information that is available.
Big data presents both opportunities and challenges for businesses. As I mentioned before, to extract value from big data, it first needs to be processed and analyzed. The goal is to use analytics to improve the efficiency and effectiveness of every decision and/or action.

How is Big Data analytics changing our lives?

Let’s look at our healthcare system first. The big data revolution has changed the way healthcare institutions analyze electronic medical records and extract information to determine future patterns and trends. A vast amount of information and sources is available that can also be useful for predicting and preventing health problems. Organizations are gathering data from social media, surveys and mobile apps, and are able to predict, for example, outbreaks of diseases such as flu or other epidemics in real time, or cancer rates by population and location.
I decided to search the internet to see whether there was any way to find out the chances of a flu outbreak where I live. I came across a website called Flu Near You. And there it was… Fortunately, I don’t have to worry much about flu at the moment.
Flu Near You is also available as a mobile app. Users report how they feel; the reports are collected and analyzed, and the results are mapped.

[Image: Flu Near You]
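The collect, analyze and map process described above can be sketched in a few lines of Python. This is not how Flu Near You actually works: the report format, the area names and the 50% alert threshold are all hypothetical, chosen only to show the idea of aggregating self-reports by location.

```python
from collections import defaultdict

# Hypothetical self-reports: (area, has_flu_like_symptoms).
# The format and area names are illustrative assumptions,
# not the Flu Near You data format.
reports = [
    ("area-A", True),
    ("area-A", False),
    ("area-A", False),
    ("area-B", True),
    ("area-B", True),
    ("area-B", False),
    ("area-C", False),
]


def symptom_rate_by_area(reports):
    """Return {area: share of reports that mention flu-like symptoms}."""
    totals = defaultdict(int)
    sick = defaultdict(int)
    for area, has_symptoms in reports:
        totals[area] += 1
        if has_symptoms:
            sick[area] += 1
    return {area: sick[area] / totals[area] for area in totals}


def flag_areas(rates, threshold=0.5):
    """Areas whose symptom rate meets a (hypothetical) alert threshold."""
    return sorted(area for area, rate in rates.items() if rate >= threshold)


rates = symptom_rate_by_area(reports)
for area in sorted(rates):
    print(f"{area}: {rates[area]:.0%} of reports mention flu-like symptoms")
print("Flagged:", flag_areas(rates))
```

In a real service the per-area rates would feed a map rather than a printout, and far more reports would be needed before a rate means anything.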
Big data can also be used to forecast weather or to provide information about climate change. This information can be used by scientists and governments to take action to protect populations from disasters. Advertising companies also analyze consumer behaviors and trends to better tailor their messages and target potential customers. Today there is basically no area of life that can’t be measured or analyzed; potentially, the sky’s the limit.

What is Data Mining?

“An analytic process designed to explore data (usually large amounts of data – typically business or market related – also known as ‘big data’) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data” (Usman, Muhammad, 3 Aug 2015).
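The two halves of this definition, searching for patterns and then validating them on new subsets of data, can be illustrated with a minimal Python sketch. It assumes made-up shopping baskets and uses a simple association rule (“customers who buy A also buy B”) with a hypothetical 0.7 confidence cut-off; real data mining tools use far more sophisticated methods.

```python
import random
from itertools import permutations

random.seed(0)  # fixed seed so the made-up data is reproducible


def make_basket():
    """Hypothetical basket: bread and butter often co-occur, milk is random."""
    basket = set()
    if random.random() < 0.6:
        basket.add("bread")
        if random.random() < 0.8:
            basket.add("butter")  # butter only ever appears with bread
    if random.random() < 0.5:
        basket.add("milk")
    return basket


baskets = [make_basket() for _ in range(1000)]
explore, validate = baskets[:500], baskets[500:]  # two subsets of the data


def confidence(data, lhs, rhs):
    """Share of baskets containing `lhs` that also contain `rhs`."""
    with_lhs = [b for b in data if lhs in b]
    if not with_lhs:
        return 0.0
    return sum(rhs in b for b in with_lhs) / len(with_lhs)


items = ["bread", "butter", "milk"]
# Step 1: explore one subset for consistent patterns (rules above the cut-off).
rules = [(a, b) for a, b in permutations(items, 2)
         if confidence(explore, a, b) >= 0.7]
# Step 2: validate each detected pattern on a new subset of the data.
for a, b in rules:
    print(f"{a} -> {b}: explore {confidence(explore, a, b):.2f}, "
          f"validate {confidence(validate, a, b):.2f}")
```

A rule whose confidence collapses on the validation subset was probably a fluke of the first subset rather than a consistent pattern.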

Stages of Data Mining.

•Pre-processing

•Model building or pattern identification

•Model evaluation 

•Deployment

Preprocessing is basically transforming raw, unstructured data into a format that is usable for the user’s purpose. The tasks in data preprocessing are data cleaning, data integration, data transformation, data reduction and data discretization.
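Here is a minimal sketch of three of these tasks (cleaning, transformation and discretization) on a handful of made-up customer records. The record layout, the choice of min-max scaling and the age bands are illustrative assumptions; integration and reduction are not shown.

```python
# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value; one record is a verbatim duplicate.
raw = [
    ("c1", 34, 120.0),
    ("c2", None, 80.0),
    ("c3", 51, 300.0),
    ("c3", 51, 300.0),
    ("c4", 19, 45.0),
    ("c5", 42, None),
]

# Data cleaning: drop exact duplicates and records with missing fields.
seen = set()
clean = []
for rec in raw:
    if rec in seen or None in rec:
        continue
    seen.add(rec)
    clean.append(rec)

# Data transformation: min-max scale spend into the range [0, 1].
spends = [spend for _, _, spend in clean]
lo, hi = min(spends), max(spends)
scaled = [(cid, age, (spend - lo) / (hi - lo)) for cid, age, spend in clean]


# Data discretization: replace exact ages with coarse age bands.
def age_band(age):
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


prepared = [(cid, age_band(age), round(s, 2)) for cid, age, s in scaled]
print(prepared)
```

Dropping incomplete records is the simplest cleaning choice; filling in missing values instead is often preferable when data is scarce.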
Data Modelling
Model building is the technique of building a model in a situation where you already know the answer. Once the model is built, it can then be used in similar situations where you don’t know the answer.
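As a minimal illustration, assuming a simple straight-line relationship and invented numbers, the Python sketch below fits a least-squares line on data where the answer is known and then applies it to inputs where it is not.

```python
# Hypothetical labelled history: advertising spend (x) and the sales
# that followed (y). Here we "know the answer" for every x.
known_x = [1.0, 2.0, 3.0, 4.0, 5.0]
known_y = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(known_x, known_y)

# Apply the model to a new situation where the answer is unknown.
new_x = [6.0, 7.5]
predictions = [slope * x + intercept for x in new_x]
print(f"y = {slope:.2f}x + {intercept:.2f}")
print([round(p, 2) for p in predictions])
```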
Data modelling in the database-design sense follows a series of structured steps:
•Establish the scope of the data model.

•Identify the ‘things of interest’ that are within the scope.

•Determine the relationships between them.

(A good example of a data modelling tool is the entity-relationship diagram, or ERD.)
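The three steps can be sketched in Python, with classes standing in for the entities of an ERD and references between them standing in for its relationships. The online-shop scope and every entity and field name here are hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import List

# Scope (assumed for illustration): sales in a small online shop.
# Things of interest (entities): Customer, Product, Order.


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Product:
    product_id: int
    title: str
    price: float


@dataclass
class OrderLine:
    # Resolves the many-to-many "Order contains Product" relationship.
    product: Product
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: Customer  # relationship: many Orders belong to one Customer
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        return sum(line.product.price * line.quantity for line in self.lines)


alice = Customer(1, "Alice")
pen = Product(10, "Pen", 1.50)
book = Product(11, "Notebook", 4.00)
order = Order(100, alice, [OrderLine(pen, 2), OrderLine(book, 1)])
print(order.customer.name, order.total())
```

In an ERD the same model would be drawn as three entity boxes plus the OrderLine link, with one-to-many and many-to-many lines between them.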

Model evaluation 
It is part of the model development process. It helps to determine which model best represents our data and to estimate how well the chosen model will work in the future.
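One common way to do this, sketched below with assumed data, is to hold back part of the data, fit each candidate model on the rest, and compare their errors on the held-back part. The two candidate models (a constant average and a straight line) and the mean-squared-error measure are illustrative choices, not the only options.

```python
# Hypothetical data split into a training part and a held-out test part.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]
test_x = [6.0, 7.0]
test_y = [13.1, 14.8]


def fit_mean(xs, ys):
    """Baseline model: always predict the training average."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on data it was not trained on."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: mse(fit(train_x, train_y), test_x, test_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Scoring on held-out data matters because a model can look excellent on the data it was fitted to and still predict new data poorly.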
Deployment
Deployment refers to applying a model to new data in order to make predictions.
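Here is a minimal sketch of deployment, assuming the model from the building step is a straight line whose parameters are stored as JSON. The parameter values, the record fields, and the use of an in-memory JSON string in place of a file or model store are all hypothetical.

```python
import json

# Parameters produced earlier by model building (hypothetical values).
model = {"name": "sales_line", "version": 1, "slope": 1.97, "intercept": 1.09}

# Persist the model so a separate process can use it
# (a JSON string stands in for a file here).
serialized = json.dumps(model)


def load_model(text):
    """Rebuild a prediction function from stored parameters."""
    params = json.loads(text)
    return lambda x: params["slope"] * x + params["intercept"]


predict = load_model(serialized)

# Deployment: score new, unlabelled records as they arrive.
new_records = [{"id": "r1", "ad_spend": 6.0}, {"id": "r2", "ad_spend": 3.0}]
for rec in new_records:
    rec["predicted_sales"] = round(predict(rec["ad_spend"]), 2)
print(new_records)
```

Storing a version number with the parameters makes it easier to tell which model produced a given prediction once models are retrained.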

Data Mining Tools

There are many data analytics tools to choose from. If you are working with big data, you have to think about which tool is right for your project, with the right amount of storage space and management. Here are a few example questions to ask before you choose a tool:
Is it open source, or is a free trial offered?
How much does it cost?
Does it work with big data?
Is it compatible with other products?
How hard or easy is it to use?
Is any technical support provided?
Can it work in batch mode?
Is it platform-independent?
Does it support programming languages such as C++, Python, Java or Perl, rather than an internal, ad-hoc language?
Is it easy to upgrade?
How fast are the computations, and how efficiently is memory used? (Vincent Granville, May 18, 2013)
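One possible way to turn these questions into a decision, not taken from Granville’s article, is a weighted score: rate each candidate tool on the criteria that matter to your project and weight them by importance. The criteria subset, weights, ratings and tool names below are made up for illustration.

```python
# Hypothetical weights (how much each criterion matters to *your*
# project) and 1-5 ratings for two made-up tools. None of these
# numbers describe real products.
weights = {
    "open_source_or_trial": 2,
    "cost": 3,
    "handles_big_data": 5,
    "ease_of_use": 3,
    "technical_support": 2,
    "batch_mode": 4,
}

ratings = {
    "Tool A": {"open_source_or_trial": 5, "cost": 5, "handles_big_data": 3,
               "ease_of_use": 2, "technical_support": 2, "batch_mode": 4},
    "Tool B": {"open_source_or_trial": 2, "cost": 2, "handles_big_data": 5,
               "ease_of_use": 4, "technical_support": 5, "batch_mode": 5},
}


def weighted_score(tool_ratings, weights):
    """Weighted average rating, still on the 1-5 scale."""
    total_weight = sum(weights.values())
    return sum(weights[c] * tool_ratings[c] for c in weights) / total_weight


ranking = sorted(ratings, key=lambda t: weighted_score(ratings[t], weights),
                 reverse=True)
for tool in ranking:
    print(f"{tool}: {weighted_score(ratings[tool], weights):.2f}")
```

The ranking is only as good as the weights, so it is worth checking whether small changes to them flip the result.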
A list of data mining and learning analytics tools can be found here: Tools.

References
Usman, Muhammad (3 Aug 2015). Improving Knowledge Discovery through the Integration of Data Mining Techniques. Pakistan: IGI Global. 306.
Granville, Vincent (18 May 2013). 27 criteria to choose analytic tools. Available: http://www.datasciencecentral.com/forum/topics/how-to-choose-an-analytic-tool. Last accessed 27/03/2017.
