What Is Big Data? Definition, How It Works, and Uses

Investopedia / Ellen Lindner

What Is Big Data?

Big data refers to large, diverse sets of information that grow at ever-increasing rates. The term encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered (commonly known as the "Three V's" of big data). Big data provides the raw material used in data mining.

Key Takeaways

  • Big data involves a great quantity of diverse information that arrives in increasing volumes and with ever-higher velocity.
  • Big data can be either structured (often numeric, easily formatted, and stored) or unstructured (more free-form, less quantifiable).
  • Nearly every department in a company can utilize findings from big data analysis, but handling its clutter and noise can pose problems.
  • Big data can be collected from social networks and websites, from personal electronics, through questionnaires, product purchases, and electronic check-ins, among many other sources. It is sometimes collected with the user's consent, and sometimes not, often raising privacy concerns.
  • Big data is typically stored electronically and analyzed using software specifically designed to handle large, complex data sets.

How Big Data Works

Big data is often categorized as either structured or unstructured. Structured data typically consists of information held by the organization in easily accessed databases and spreadsheets; it is frequently numeric.

Unstructured data can be more qualitative in nature and is not as readily organized. According to IBM, examples of unstructured data may include "text, mobile activity, social media posts, Internet of Things (IoT) sensor data, among others."

There is also a third, in-between category of semi-structured data, which has some of the characteristics of each.
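As a rough illustration of the three categories, the sketch below represents one purchase in three ways and extracts the amount from each. The sample records, field names, and the regular expression are made up for this example and do not come from any particular system.

```python
import csv
import io
import json
import re

# Structured: fixed columns, as in a database table or spreadsheet export.
structured = "customer_id,amount,store\n1001,59.90,Boston\n"
row = next(csv.DictReader(io.StringIO(structured)))
structured_amount = float(row["amount"])

# Semi-structured: self-describing keys, but fields can vary between records.
semi_structured = '{"customer_id": 1001, "purchase": {"amount": 59.90}, "tags": ["gift"]}'
record = json.loads(semi_structured)
semi_amount = record["purchase"]["amount"]

# Unstructured: free text; the amount has to be pulled out with a pattern
# (hypothetical pattern, it only handles simple "$12.34" style amounts).
unstructured = "Thanks! I picked up the headphones in Boston for $59.90 yesterday."
match = re.search(r"\$(\d+(?:\.\d+)?)", unstructured)
unstructured_amount = float(match.group(1)) if match else None

print(structured_amount, semi_amount, unstructured_amount)
```

The structured and semi-structured records can be read by field name, while the free-text message needs a pattern that may miss amounts written in other ways, which is one reason unstructured data usually takes more work to analyze.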

Whether structured, unstructured, or semi-structured, big data is collected in numerous ways. It can be obtained through questionnaires, product purchases on websites or at point-of-sale (POS) terminals, electronic check-ins, and users' personal electronics and apps, to name just a few.

Big data is typically stored electronically in what are sometimes referred to as data warehouses or data lakes. It is analyzed using software specifically designed to handle large, complex data sets. Many software-as-a-service (SaaS) companies specialize in managing this type of complex data.

Note

Many major tech companies, such as Alphabet (the parent company of Google) and Meta (formerly Facebook), use big data to generate advertising revenue by delivering targeted ads to users on social media platforms and websites.

The Uses of Big Data

Data analysts look at the relationship between different types of data, such as demographic data and purchase history, to determine whether a correlation exists.
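One common way to check for such a relationship is a correlation coefficient. The sketch below computes a Pearson correlation between customer age and annual spending; the `pearson` helper and the sample figures are invented for illustration, and a real analysis would use far more records.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical demographic data (age) and purchase history (annual spend).
ages = [22, 35, 41, 53, 64]
annual_spend = [310.0, 480.0, 520.0, 700.0, 790.0]

r = pearson(ages, annual_spend)
print(f"correlation between age and spend: {r:.3f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. A correlation alone does not show that one factor causes the other.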

Such assessments may be done in-house or by a third party that specializes in processing big data into digestible formats. Businesses often rely on these experts' assessments to turn big data into actionable information.

Nearly every department in a company can utilize findings from data analysis, from human resources to production to marketing and sales.

The goals of big data analysis can include increasing the speed at which products get to market, reducing the time and resources required to gain market adoption, targeting the right audiences, and keeping customers coming back for more.

With the amount of personal data available on individuals today, it is crucial that companies take effective steps to safeguard it. This has become a topic of hot debate in recent years, particularly given the many highly publicized data breaches that companies (and their customers) have experienced.

Advantages and Disadvantages of Big Data

The increasing amount of data available today presents both opportunities and problems. In general, having more data on customers (and potential customers) should allow companies to better tailor their products and marketing efforts to deliver what customers want. This should benefit both producers and consumers.

While better analysis is a positive, big data can also create overload and noise, reducing its usefulness. Companies must handle ever-larger volumes of data and determine which data represents signals as opposed to noise. Determining at the outset what data may be relevant can be a key factor in deciding what data to analyze.

Furthermore, the nature and format of the data can require special handling before it is ready to be acted upon. Structured data, often consisting of numeric values, can be easily stored and sorted.

Unstructured data, which might come in the form of emails, videos, and text documents, may require more sophisticated techniques before it becomes useful.

What Is Predictive Analytics?

Predictive analytics refers to the collection and analysis of current and historical data to develop and refine models for forecasting future outcomes. Predictive analytics is widely used in business and finance as well as in fields like weather forecasting, and it relies heavily on big data.
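As a minimal sketch of the idea, the example below fits a straight line to historical monthly sales using ordinary least squares and projects the next month. The `fit_line` helper and the sales figures are hypothetical, and production models are typically far more elaborate than a single trend line.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 108.0, 121.0, 130.0, 139.0, 152.0]

slope, intercept = fit_line(months, sales)

# Use the fitted model to forecast month 7.
forecast_month_7 = slope * 7 + intercept
print(f"forecast for month 7: {forecast_month_7:.1f}")
```

As new months of data arrive, the line can be refitted, which mirrors the "develop and refine" cycle described above.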

What Is Data Mining?

Data mining can be defined as the process through which big data is turned into useful information, by looking for relevant patterns and trends.
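One simple pattern-finding technique is counting which items are frequently bought together. The sketch below does this for a handful of made-up shopping baskets; the basket contents and the `min_support` threshold are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer's basket.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"coffee", "milk"},
    {"bread", "butter", "coffee"},
    {"milk", "bread"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only pairs seen in at least min_support baskets.
min_support = 3
frequent_pairs = {pair: count for pair, count in pair_counts.items() if count >= min_support}

print(frequent_pairs)
```

Here the only pair that clears the threshold is bread and butter, the kind of trend a retailer might use for product placement or promotions.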

What Is a Data Warehouse vs. a Data Lake?

A data warehouse is a repository where a business or other organization stores its big data for analysis. It can reside on the owner's in-house servers, with an outside specialist company, or in the cloud, and it is most commonly associated with structured data. A data lake is a newer term for a repository that can accommodate structured, semi-structured, and unstructured data.

What Is the Cloud?

The cloud refers to networks of data servers where organizations or individuals can rent space to store large volumes of data. Cloud services have become a big business with the rise of big data, and major players in the field today include Amazon's Amazon Web Services, Microsoft's Azure, and Alphabet's Google Cloud, among others.

What Is the Role of Artificial Intelligence in Big Data?

Artificial intelligence can be useful in the analysis of big data. At the same time, big data is being used to train artificial intelligence to make it more effective.

The Bottom Line

Big data is only getting bigger. While it has demonstrated its usefulness in many fields, it has also raised serious privacy concerns over how it is collected and used, as well as over its potential vulnerability to cyberattacks and data breaches.

Article Sources
  1. IBM. "Structured vs. Unstructured Data."
  2. SAS Institute Inc. "Big Data, What It Is and Why It Matters."