This section takes about 24 minutes to complete.
Data Milky Way: A Brief History (Part 1) - OLTP vs. OLAP
We will walk through the history of data processing via a video lecture. Check out Part 1 below:
History belongs in the past; but understanding it is the duty of the present
- Shashi Tharoor
Basic Definitions
Databases, NoSQL, Data Warehouses, Data Lakes, Spark…and many other terms in this domain space!
You've probably come across at least a few of these terms, but you might be wondering what these technologies do and how they fit together in the vast landscape of Data Engineering and AI.
We'll start by understanding whether a workload is OLTP or OLAP in nature.
OLTP vs. OLAP
OLTP (Online Transactional Processing)
Makes sure that systems can keep up with high volumes of transactions, each of which is typically very small and fast (e.g. online banking, FinTech applications).
Typical use cases and implementations:
- Application databases
- Caches
- SQL and NoSQL ("Not Only SQL") - lots of diverse technologies in the OLTP space.
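To make the "small and fast transaction" idea concrete, here is a minimal sketch of an OLTP-style operation, assuming a toy banking schema. The `accounts` table and the `transfer` helper are hypothetical names made up for illustration, and SQLite's in-memory mode merely stands in for a real application database.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one small, atomic transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown destination account")


# Hypothetical toy schema: two accounts with integer balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

Each call touches only a couple of rows, looked up by primary key, and either fully succeeds or leaves the data unchanged. That is the typical shape of an OLTP workload.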
Bonus Content: Application Databases: NoSQL vs SQL
If you do not know the key differences between SQL and NoSQL, have a look at the following overview (~10m detour): NoSQL vs SQL
OLAP (Online Analytical Processing)
Makes sure that you can crunch through millions or billions of rows of data for complex and large queries that need to run aggregations or calculations for data analytics purposes.
Typical use cases:
- Data Warehouses: A way to implement data models in a database to cater to OLAP style workloads.
- Data Lakes: Have emerged strongly within the last 5-10 years to cater more specifically to the requirements of large data workloads.
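As a rough illustration of an OLAP-style query, the sketch below scans a whole table and aggregates it, instead of reading or updating a few rows by key. The `sales` table and its columns are hypothetical, and SQLite is used only so the example runs anywhere; real analytical workloads would usually run on a data warehouse or data lake engine over far more rows.

```python
import sqlite3

# Hypothetical toy fact table; a real one could hold millions or billions of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "book", 10),
        ("EU", "pen", 5),
        ("US", "book", 20),
        ("US", "book", 15),
        ("US", "pen", 5),
    ],
)

# OLAP-style query: read every row, group, and aggregate.
revenue_by_region = conn.execute(
    "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, orders, revenue in revenue_by_region:
    print(region, orders, revenue)
```

The query touches every row but returns only a small summary, which is the opposite access pattern from the many tiny reads and writes of an OLTP workload.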
Watch: Understand your workload first: OLTP vs. OLAP
From Database to Data Lake
Step 1: Database to Data Warehouse
Please have a look at either the video or the article.
Watch: Database vs. Data Warehouse
Read: The Difference Between a Database and a Data Warehouse
Step 2: Data Warehouse to Data Lake
Let's take it one step further and have a look at the more commonly found implementation today, the Data Lake.
Focus of this course: OLAP workloads
Why are we focusing more on OLAP than OLTP?
- OLTP databases (sometimes interchangeably referred to as application databases) are primarily about the stability and performance of your operations / live applications.
- OLTP data models and tech stacks for different problems/businesses vary a lot more than those for OLAP workloads!
- When people/customers talk about developing Data & AI, Analytics, Data Science, Machine Learning, they’re most likely referring to OLAP-style workloads.