The Big Data Problem and Its Solution

Vanshita Mittal
6 min read · Sep 16, 2020

In this article, I explain the core concept of big data and how the problems it creates can be solved.

The huge data storage problem:

  • Billions of people use the internet every day, so a huge amount of data has to be stored.
  • We are currently in a data-driven economy where no organization can survive without analyzing the current and future trends. Whether it is a manufacturing firm or a retail chain, wrangling data has become a crucial job to be done before taking a single step further.
  • On social media, chats, messages, and photos all need to be stored for the user's convenience. But how can this be a problem?
  • Netflix, YouTube, Amazon Prime, and other video streaming platforms use user data to recommend videos that match each viewer's tastes. They also use it to show ads related to each user's activity.

What Is Big Data?

Gartner’s definition, circa 2001 (which is still the go-to definition): Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity. This is known as the three Vs.

The Three Vs of Big Data:

Volume:

The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

Velocity:

Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
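To make the idea of acting on data as it arrives a bit more concrete, here is a minimal Python sketch that keeps only the most recent readings in memory and updates an average for each new value, instead of writing everything to disk first. The function name `rolling_average`, the window size, and the sensor readings are all hypothetical and chosen only for illustration.

```python
from collections import deque


def rolling_average(stream, window_size):
    """Yield the average of the last `window_size` readings as each one arrives."""
    window = deque(maxlen=window_size)  # old readings fall out automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)


# Hypothetical sensor readings; in practice these would arrive over a network.
sensor_readings = [20.0, 22.0, 21.0, 30.0, 29.0]
averages = list(rolling_average(sensor_readings, window_size=3))
print(averages)
```

Because the window has a fixed size, memory use stays constant no matter how long the stream runs.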

Variety:

Variety refers to the many types of data that are available. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.

Big data is not a technology. It is an umbrella term for the problems that arise when data is very large and arrives in many different formats. Data with those properties is what we call big data.

Types of Big Data:

Big data is generally categorized into three types:

  • Structured Data
  • Semi-Structured Data
  • Unstructured Data
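The three types above can be illustrated with a small Python sketch. The sample records are made up for this example: a CSV table stands in for structured data, a JSON document for semi-structured data, and a free-text review for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema (hypothetical sample).
structured_csv = "user_id,country,watch_minutes\n1,IN,42\n2,US,17\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields can differ from record to record.
semi_structured = json.loads(
    '{"user_id": 1, "tags": ["drama", "comedy"], "device": {"os": "android"}}'
)

# Unstructured: free text with no schema; meaning has to be extracted by processing.
unstructured = "Loved the new series, but the app kept buffering on my phone."
word_count = len(unstructured.split())

print(rows[0]["country"])               # look up a column by name
print(semi_structured["device"]["os"])  # walk a nested key
print(word_count)                       # even a word count needs preprocessing
```

Notice that the structured and semi-structured records can be queried by name right away, while the text needs some preprocessing before anything can be measured.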

How Big Data Works

Big data gives you new insights that open up new opportunities and business models. Getting started involves three key actions:

1. Integrate

Big data brings together data from many disparate sources and applications. Traditional data integration mechanisms, such as ETL (extract, transform, and load) generally aren’t up to the task. It requires new strategies and technologies to analyze big data sets at terabyte, or even petabyte, scale.
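For readers who have not met ETL before, here is a toy Python sketch of the three steps: it extracts records from two differently shaped sources, transforms them into one common schema, and loads them into a single list. The source formats, the field names (`user_id`, `uid`, `amount`, `total`), and the `warehouse` list are assumptions made for this example. It only shows what integration means at a tiny scale, not how it is done at terabyte scale.

```python
import csv
import io
import json

# Two hypothetical sources that describe the same kind of event differently.
web_csv = "user_id,amount\n1,9.99\n2,4.50\n"
app_json = '[{"uid": 1, "total": 20.0}]'


def from_web(record):
    """Transform a web CSV row into the common schema."""
    return {"user_id": int(record["user_id"]), "amount": float(record["amount"]), "source": "web"}


def from_app(record):
    """Transform an app JSON record into the common schema."""
    return {"user_id": int(record["uid"]), "amount": float(record["total"]), "source": "app"}


warehouse = []  # the load target; a plain list stands in for a real data store
warehouse.extend(from_web(r) for r in csv.DictReader(io.StringIO(web_csv)))  # extract + transform + load
warehouse.extend(from_app(r) for r in json.loads(app_json))

# Once the data is integrated, one query can span both sources.
totals = {}
for rec in warehouse:
    totals[rec["user_id"]] = totals.get(rec["user_id"], 0.0) + rec["amount"]
print(totals)
```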

2. Manage

Big data requires storage. Your storage solution can be in the cloud, on premises, or both. You can store your data in any form you want and bring your desired processing requirements and necessary process engines to those data sets on an on-demand basis. Many people choose their storage solution according to where their data is currently residing. The cloud is gradually gaining popularity because it supports your current compute requirements and enables you to spin up resources as needed.

3. Analyze

Your investment in big data pays off when you analyze and act on your data. Get new clarity with a visual analysis of your varied data sets. Explore the data further to make new discoveries. Share your findings with others. Build data models with machine learning and artificial intelligence. Put your data to work.

Examples of how some multinational companies use big data analytics:

1. Netflix

The entertainment streaming service has a wealth of data and analytics, giving it insight into the viewing habits of its many customers around the world. Netflix uses this data to commission original programming that appeals globally, as well as to acquire the rights to films and series box sets that it knows will perform well with specific audiences.

For instance, Adam Sandler has proven unpopular in the US and UK markets in recent years, yet Netflix green-lit four new films with the actor in 2015, armed with the knowledge that his previous work had been successful in Latin America.

2. Amazon

The online retail giant has access to a massive amount of data on its customers: names, addresses, payments, and search histories are all stored in its data bank. While this information is put to use in advertising algorithms, Amazon also uses it to improve customer relations, an area that many big data users overlook.

The next time you contact the Amazon help desk with a query, don't be surprised when the employee on the other end already has most of the relevant information about you on hand. This makes for a faster, more efficient customer service experience that doesn't involve spelling out your name multiple times.

Solution: Distributed Storage.

Distributed Storage System:

A distributed storage system is an infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.

Example:

Suppose a machine has about 100 GB of storage and receives about 400 GB of data. The data does not fit, so we would have to buy a storage device of about 400 GB. The problem is that pushing all of that data through a single device makes I/O operations slow.

Instead, we can divide the data into multiple chunks, distribute them to different nodes, and store them over the network. This topology largely solves the I/O problem, and the volume problem too, because the chunks can be written to the nodes in the cluster in parallel.
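The chunking idea can be sketched in a few lines of Python. This is a scaled-down illustration, not how any particular system such as Hadoop implements it: 400 bytes stand in for the 400 GB of data, each chunk is 100 bytes to match the 100 GB machine, and the node names and the round-robin placement rule are assumptions made for this example.

```python
def split_into_chunks(data, chunk_size):
    """Cut the data into consecutive pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distribute(chunks, node_names):
    """Assign chunks to nodes in round-robin order, remembering each chunk's position."""
    placement = {name: [] for name in node_names}
    for index, chunk in enumerate(chunks):
        node = node_names[index % len(node_names)]
        placement[node].append((index, chunk))
    return placement


def reassemble(placement):
    """Collect the chunks from every node and join them back in their original order."""
    indexed = [item for stored in placement.values() for item in stored]
    indexed.sort(key=lambda item: item[0])
    return b"".join(chunk for _, chunk in indexed)


data = bytes(range(200)) * 2                     # 400 bytes, standing in for 400 GB
chunks = split_into_chunks(data, chunk_size=100)  # 100 bytes, standing in for 100 GB
nodes = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster
placement = distribute(chunks, nodes)

for name, stored in placement.items():
    print(name, [index for index, _ in stored])
print(reassemble(placement) == data)
```

Each node ends up holding only a quarter of the data, so no single machine needs 400 GB of storage, and in a real cluster the four writes could happen at the same time.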

Top 10 Open Source Big Data Tools in 2020:

  1. Hadoop
  2. Apache Spark
  3. Apache Storm
  4. Cassandra
  5. RapidMiner
  6. MongoDB
  7. R Programming Tool
  8. Neo4j
  9. Apache SAMOA
  10. HPCC

Thanks for reading!
