Big Data – Part 1 – Introduction


The term “big data” was first used in the mid-1990s by John Mashey, then Chief Scientist at Silicon Graphics, to refer to the analysis of massive data sets.

Here are the 9 major characteristics of big data:

  1. Volume: Big data is huge in volume. This mammoth amount of data is generated by mobile devices, instruments, traffic-signal devices, medical devices, e-commerce companies, government departments, businesses, and so on, which together can produce petabytes (PB) of data every hour. 1 PB = 10^15 bytes; the closely related binary unit, 1 pebibyte (PiB), is 2^50 bytes.
  2. Variety: Data of different types is being generated, processed, and analysed. It may be structured or unstructured, and includes text, pictures, video, audio, location (spatial) data, temporal (time-dependent) data, and more.
  3. Velocity: Data is being created in or near real time. The speed at which data arrives matters for making timely decisions. Examples include data collected by IoT devices, traffic cameras, and medical devices, social media data (tweets, posts, YouTube videos, etc.), weather data, and stock market data.
  4. Value: Data per se is not useful; what matters is the use to which it can be put. The amount of valuable and reliable data that has to be stored, processed, and analyzed to find insights is important.
  5. Relational: Big data has common fields that enable joining of different data sets.
  6. Flexible: New fields can be added easily to capture additional traits or properties. For example, in e-commerce, data about past purchases, clicks, and other sites visited (collected from cookies stored on users’ devices) can all be collected and used to sell a product.
  7. Scalable: Big data can be rapidly expanded in size. More data can be added when it improves the quality of the data.
  8. Exhaustive: Big data is exhaustive in scope. It captures data on entire systems or populations, which is far more than traditional sampling methods would permit.
  9. Resolution: Big data is fine-grained and allows as much detail as possible to be accessed.
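
The "Relational" and "Flexible" traits above can be illustrated with a small sketch. It uses plain Python rather than any big data framework, and the record layout, the field names (user_id, item, page, device), and the helper join_on are all hypothetical, chosen only for illustration:

```python
# Hypothetical e-commerce records; field names are illustrative assumptions.
purchases = [
    {"user_id": 1, "item": "laptop"},
    {"user_id": 2, "item": "phone"},
    {"user_id": 1, "item": "mouse"},
]
clicks = [
    {"user_id": 1, "page": "/deals"},
    {"user_id": 3, "page": "/home"},
]


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared field (hypothetical helper)."""
    # Index the right-hand records by the common field.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Pair each left-hand record with every right-hand record sharing the key.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Relational: the common field "user_id" lets the two data sets be joined.
joined = join_on(purchases, clicks, "user_id")

# Flexible: a new field can be added to existing records without a fixed schema.
for row in joined:
    row["device"] = "mobile"  # placeholder value, assumed for illustration

for row in joined:
    print(row)
```

Only user 1 appears in both data sets, so the join keeps that user's two purchases and drops the rest. Real systems would do the same thing at far larger scale with a distributed engine, but the idea of joining on a shared field is the same.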

In another post we will look at other aspects of big data.

Categories: Blog, Computer Science
