Skip to main content
Version: 1.0

Data Milky Way: A Brief History (Part 1) - OLTP vs. OLAP

History belongs in the past; but understanding it is the duty of the present

  • Shashi Tharoor

Basic Definitions

data-milky-way.png

Databases, NoSQL, Data Warehouses, Data Lakes, Spark…it’s too much!

You've probably come across at least a few of these terms, but you might be wondering what these technologies do and how do they fit together with one another in the vast landscape of Data Engineering and AI.

To answer that question, we must first understand if the data is OLTP or OLAP in nature.

OLTP vs. OLAP

OLTP (Online Transactional Processing)

Makes sure that the systems can keep up with high volumes of transactions but often very small and fast in nature (e.g. online banking, FinTech application)

Some use cases:

  • application databases
  • SQL and NoSQL ("Not Only SQL")
  • Hint: lots of diff tech in OLTP

OLAP (Online Analytical Processing)

Makes sure that you can crunch through millions or billions of rows of data for your complex and large theories, where they need to run some fancy aggregation or calcuations for data Analytics purposes

Some use cases:

  • Data Warehouse = a way to implement data models in a data base, to cater to OLAP style workloads and even within OLAP, the story doesn't end with warehouses.
  • Data Lakes = within the past 5-10 years, data lakes have strongly emerged as another predominent tech in the space

Watch: Understand your workload first: OLTP vs. OLAP

Read: The Difference Between OLTP and OLAP

Application Databases: NoSQL vs SQL

Database vs. Data Warehouse

Watch: Database vs. Data Warehouse

Read: The Difference Between a Database and a Data Warehouse

Predominant OLAP Technologies

Tech Choice is driven by Business Needs

  • When do I need this data?
    • Primary driver = Latency. How fast or frequent should my queries be?
  • How many people need this data?
    • Who are the users of my data? Data analysts? Data Scientists?
    • Effects costs (scaling, hosting)
    • Concurrency
    • HINT: Costs + Concurrency = why data lakes emerged
  • How many different ways do I want to visualize the data?
    • Dashboards? Machine Learning algorithms?
  • How much does it cost me to maintain?

Focus of this course - OLAP workloads

Why are we focusing more on OLAP than OLTP?

  • OLTP databases (sometimes interchangeably referred to as application databases) are often an advanced topic as they concern stability and performance of your operations / live applications
  • OLTP data models and tech stacks for different problems/businesses vary a lot more than those for OLAP workloads!
  • When people/customers talk about developing Data & AI, Analytics, Data Science, Machine Learning, they’re most likely referring to OLAP-style workloads.

However, feel free to check out some prominent NoSQL technologies: