Data Milky Way: Distributed Data Systems (NoSQL & CAP)
We talked a little about NoSQL earlier. Let's just skim the surface and talk about some important concepts driving it.
NoSQL Technologies
Watch: "Introduction to NoSQL" by Martin Fowler (watch until 10:35; the rest is optional): https://www.youtube.com/watch?v=qI_g07C_Q5I
CAP Theorem
Watch: "CAP Theorem Illustrated" by Mark Richards
Summary: pick two of the following three properties:
- C (consistency)
- A (availability)
- P (partition tolerance)
But does this hold in the real world?
- What unrealistic assumption are we making here?
- Can we really assume that network communications won’t fail?
- Is there really such a thing as a distributed system that won't have partitions?
- In reality: it is not a binary choice between C and A
- Several NoSQL solutions offer a tunable trade-off between C and A
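The tunable trade-off above is often exposed as a quorum setting (for example, Cassandra lets clients choose consistency levels such as ONE, QUORUM or ALL). With N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read set shares at least one replica with every write set, so in this simplified model a read sees the latest acknowledged write. Smaller R or W answers faster and keeps working with more replicas unreachable, but reads may be stale. The following is a minimal sketch, assuming versioned values and ignoring failures, conflict resolution and repair; the function names are hypothetical.

```python
from itertools import combinations

# Each replica is modelled as a (version, value) pair; a higher version means a newer write.
Store = list[tuple[int, str]]


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every read set of r replicas share at least one
    replica with every write set of w replicas, out of n replicas in total?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum_read(store: Store, read_set: tuple[int, ...]) -> str:
    """Consult only the replicas in read_set and return the newest value among them."""
    _version, value = max(store[i] for i in read_set)
    return value


N = 3
for r, w in [(2, 2), (1, 2), (1, 3)]:
    print(f"N={N} R={r} W={w}: overlap guaranteed = {quorums_always_overlap(N, r, w)}")

# A write of version 2 reached only W=2 of the 3 replicas (replicas 0 and 1).
store: Store = [(2, "new"), (2, "new"), (1, "old")]
print(quorum_read(store, (1, 2)))  # R=2 overlaps the write at replica 1, prints "new"
print(quorum_read(store, (2,)))    # R=1 sees only the stale replica, prints "old"
```

Raising R and W towards N moves the system towards C; lowering them moves it towards A.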
Further Reading (optional): What is the CAP Theorem?
Summary
- A CP system will say "sorry, I can't be sure yet" to the client, in order to avoid giving an out-of-date answer
- An AP system tries to give an answer even if it might not be the most up-to-date one
From Wikipedia
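To make the CP/AP distinction concrete, here is a minimal Python sketch of a hypothetical two-replica store (the class and function names are invented for illustration). During a simulated partition, the CP variant refuses reads and writes, while the AP variant keeps answering from its local copy, which may be out of date. It deliberately ignores majorities, leader election and partition healing; a real CP system with more replicas typically keeps serving on the majority side of a partition.

```python
class Unavailable(Exception):
    """Raised by a CP replica when it cannot be sure its answer is current."""


class Replica:
    """Hypothetical replica in a two-node cluster; mode is "CP" or "AP"."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = "v1"
        self.peer = None       # set by make_pair()
        self.link_up = True    # False simulates a network partition

    def write(self, value: str) -> None:
        if self.link_up:
            # Connected: apply locally and replicate synchronously to the peer.
            self.value = value
            self.peer.value = value
        elif self.mode == "CP":
            # Cannot reach the peer, so refuse instead of letting the copies diverge.
            raise Unavailable("cannot replicate the write during a partition")
        else:
            # AP: accept locally; the peer stays stale while partitioned.
            self.value = value

    def read(self) -> str:
        if not self.link_up and self.mode == "CP":
            raise Unavailable("sorry, I can't be sure yet")
        # A partitioned AP replica may return stale data here.
        return self.value


def make_pair(mode: str) -> tuple[Replica, Replica]:
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica) -> None:
    a.link_up = b.link_up = False


a, b = make_pair("AP")
partition(a, b)
a.write("v2")
print(b.read())        # prints "v1": b stays available but answers with stale data

c, d = make_pair("CP")
partition(c, d)
try:
    d.read()
except Unavailable as err:
    print(err)         # prints "sorry, I can't be sure yet"
```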