Version: 1.0

Data Milky Way: Distributed Data Systems (NoSQL & CAP)

We touched on NoSQL earlier. Here we only skim the surface and look at some of the key concepts behind it.

NoSQL Technologies

Watch: "Introduction to NoSQL" by Martin Fowler (watch until 10:35; the rest is optional). (Direct link: https://www.youtube.com/watch?v=qI_g07C_Q5I)

CAP Theorem

Watch: "CAP Theorem Illustrated" by Mark Richards

Summary: a distributed data system can guarantee at most two of these three properties:

  • C (consistency)
  • A (availability)
  • P (partition tolerance)

But is it really true in the real world?

  • What unrealistic assumption are we making here?
    • Can we really assume that network communications won’t fail?
    • Is there really such a thing as a distributed system that never experiences partitions?
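  • In reality, partitions are unavoidable, so P is not really optional; the choice between C and A only arises during a partition, and even then it is not a binary one
  • Several NoSQL solutions offer a tunable tradeoff between C and A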
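One common way this tradeoff is made tunable, used by Dynamo-style stores such as Cassandra, is quorum sizing: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read quorum overlaps every write quorum, so a read always reaches at least one replica holding the latest acknowledged write. The sketch below is a toy in-memory model under that assumption; `ToyQuorumStore`, `Replica`, and `quorums_overlap` are hypothetical names for illustration, not any real database's API, and failures are simulated by deliberately choosing the least-overlapping replica sets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """One copy of the data: a value tagged with a version number."""
    value: Optional[str] = None
    version: int = 0


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r must intersect every write quorum of size w."""
    return r + w > n


class ToyQuorumStore:
    """Hypothetical in-memory store with n replicas and tunable r/w quorum sizes."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("quorum sizes must be between 1 and n")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        self.next_version = 1

    def write(self, value: str) -> None:
        # Only the FIRST w replicas receive the write (the others are treated
        # as slow or unreachable); the write counts as acknowledged at that point.
        for replica in self.replicas[: self.w]:
            replica.value, replica.version = value, self.next_version
        self.next_version += 1

    def read(self) -> Optional[str]:
        # Poll the LAST r replicas (the worst case for overlapping the write set)
        # and return the value carrying the highest version among them.
        polled = self.replicas[-self.r:]
        newest = max(polled, key=lambda rep: rep.version)
        return newest.value


if __name__ == "__main__":
    strong = ToyQuorumStore(n=3, r=2, w=2)  # R + W = 4 > 3: quorums overlap
    strong.write("v1")
    print(strong.read())  # v1

    fast = ToyQuorumStore(n=3, r=1, w=1)    # R + W = 2 <= 3: no overlap guaranteed
    fast.write("v1")
    print(fast.read())    # None: the one replica polled never saw the write
```

Smaller quorums answer faster and keep working when more replicas are unreachable (leaning towards A), while larger overlapping quorums trade some of that availability for up-to-date reads (leaning towards C).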

Further Reading (optional): What is the CAP Theorem?

Summary

A CP system will say “sorry, I can’t be sure yet” to the client rather than risk giving an out-of-date answer.


An AP system tries to return an answer even if it might not be the most up-to-date one.
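To make the two behaviours concrete, here is a minimal sketch of a single replica that, when it cannot reach its peers, either refuses to answer (CP) or returns its possibly stale local copy (AP). `Mode`, `Unavailable`, and `ToyReplica` are hypothetical names, and a network partition is modelled simply as a boolean flag; real systems detect partitions through timeouts and failed messages.

```python
from enum import Enum


class Mode(Enum):
    CP = "consistency over availability"
    AP = "availability over consistency"


class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it has the latest data."""


class ToyReplica:
    """Hypothetical replica illustrating CP vs AP behaviour during a partition."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.local_value = "v1"   # last value this replica has seen
        self.partitioned = False  # True means it cannot reach the other replicas

    def read(self) -> str:
        if not self.partitioned:
            # Peers reachable: in this toy model we assume replicas are in sync.
            return self.local_value
        if self.mode is Mode.CP:
            # "Sorry, I can't be sure yet": refuse rather than risk a stale answer.
            raise Unavailable("cannot reach peers to confirm the latest value")
        # AP: answer anyway with whatever we have, which may be out of date.
        return self.local_value


if __name__ == "__main__":
    for mode in Mode:
        replica = ToyReplica(mode)
        replica.partitioned = True
        try:
            print(mode.name, "->", replica.read())
        except Unavailable as err:
            print(mode.name, "-> error:", err)
```

Running it prints an error for the CP replica and `v1` for the AP replica: the CP side gives up availability during the partition, while the AP side stays available but cannot promise the value is current.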