|
3ddd56d…
|
wyoung
|
1 |
# Fossil and the CAP Theorem |
|
3ddd56d…
|
wyoung
|
2 |
|
|
3ddd56d…
|
wyoung
|
3 |
[The CAP theorem][cap] is a fundamental mathematical proof about |
|
3ddd56d…
|
wyoung
|
4 |
distributed systems. A software system can no more get around it than a |
|
3ddd56d…
|
wyoung
|
5 |
physical system can get past *c*, the [speed of light][sol] constant. |
|
3ddd56d…
|
wyoung
|
6 |
|
|
3ddd56d…
|
wyoung
|
7 |
Fossil is a distributed system, so it can be useful to think about it in |
|
3ddd56d…
|
wyoung
|
8 |
terms of the CAP theorem. We won’t discuss the theorem itself or how you |
|
3ddd56d…
|
wyoung
|
9 |
reason using its results here. For that, we recommend [this article][tut]. |
|
3ddd56d…
|
wyoung
|
10 |
|
|
3ddd56d…
|
wyoung
|
11 |
[cap]: https://en.wikipedia.org/wiki/CAP_theorem |
|
3ddd56d…
|
wyoung
|
12 |
[sol]: https://en.wikipedia.org/wiki/Speed_of_light |
|
3ddd56d…
|
wyoung
|
13 |
[tut]: https://www.ibm.com/cloud/learn/cap-theorem |
|
3ddd56d…
|
wyoung
|
14 |
|
|
3ddd56d…
|
wyoung
|
15 |
|
|
991b089…
|
wyoung
|
16 |
<a id="ap"></a> |
|
3ddd56d…
|
wyoung
|
17 |
## Fossil Is an AP-Mode System |
|
3ddd56d…
|
wyoung
|
18 |
|
|
3ddd56d…
|
wyoung
|
19 |
As with all common [DVCSes][dvcs], Fossil is an AP-mode system, meaning |
|
3ddd56d…
|
wyoung
|
20 |
that your local clone isn’t necessarily consistent with all other clones |
|
3ddd56d…
|
wyoung
|
21 |
(C), but the system is always available for use (A) and |
|
3ddd56d…
|
wyoung
|
22 |
partition-tolerant (P). This is what allows you to turn off Fossil’s |
|
3ddd56d…
|
wyoung
|
23 |
autosync mode, go off-network, and continue working with Fossil, even |
|
3ddd56d…
|
wyoung
|
24 |
though only a single node (your local repo clone) is accessible at the |
|
3ddd56d…
|
wyoung
|
25 |
time. |
|
3ddd56d…
|
wyoung
|
26 |
|
|
3ddd56d…
|
wyoung
|
27 |
You may consider that going back online restores “C”, because upon sync, |
|
3ddd56d…
|
wyoung
|
28 |
you’re now consistent with the repo you cloned from. But, if another |
|
3ddd56d…
|
wyoung
|
29 |
user has gone offline in the meantime, and they’ve made commits to their |
|
67a4c1d…
|
wyoung
|
30 |
disconnected repo, *you* aren’t consistent with *them.* Besides which, |
|
67a4c1d…
|
wyoung
|
31 |
if another user commits to the central repo, that doesn’t push the |
|
67a4c1d…
|
wyoung
|
32 |
change down to you automatically: even if all users of a Fossil system |
|
67a4c1d…
|
wyoung
|
33 |
are online at the same instant, and they’re all using autosync, Fossil |
|
67a4c1d…
|
wyoung
|
34 |
doesn’t guarantee consistency across the network. |
|
3ddd56d…
|
wyoung
|
35 |
|
|
3ddd56d…
|
wyoung
|
36 |
There’s no getting around the CAP theorem! |
|
3ddd56d…
|
wyoung
|
37 |
|
|
3ddd56d…
|
wyoung
|
38 |
[dvcs]: https://en.wikipedia.org/wiki/Distributed_version_control |
|
3ddd56d…
|
wyoung
|
39 |
|
|
3ddd56d…
|
wyoung
|
40 |
|
|
991b089…
|
wyoung
|
41 |
<a id="ca"></a> |
|
3ddd56d…
|
wyoung
|
42 |
## CA-Mode Fossil |
|
3ddd56d…
|
wyoung
|
43 |
|
|
3ddd56d…
|
wyoung
|
44 |
What would it mean to redesign Fossil to be CA-mode? |
|
3ddd56d…
|
wyoung
|
45 |
|
|
3ddd56d…
|
wyoung
|
46 |
It means we get a system that is always consistent (C) and available (A) |
|
3ddd56d…
|
wyoung
|
47 |
as long as there are no partitions (P). |
|
3ddd56d…
|
wyoung
|
48 |
|
|
3ddd56d…
|
wyoung
|
49 |
That’s basically [CVS] and [Subversion][svn]: you can only continue |
|
67a4c1d…
|
wyoung
|
50 |
working with the repository itself as long as your connection to the central repo server functions. |
|
3ddd56d…
|
wyoung
|
51 |
|
|
67a4c1d…
|
wyoung
|
52 |
It’s rather trivial to talk about single-point-of-failure systems like |
|
67a4c1d…
|
wyoung
|
53 |
CVS or Subversion as |
|
3ddd56d…
|
wyoung
|
54 |
CA-mode. Another common example used this way is a classical RDBMS, but |
|
3ddd56d…
|
wyoung
|
55 |
aren’t we here to talk about distributed systems? What’s a good example |
|
3ddd56d…
|
wyoung
|
56 |
of a *distributed* CA-mode system? |
|
3ddd56d…
|
wyoung
|
57 |
|
|
3ddd56d…
|
wyoung
|
58 |
A better example is [Kafka], which in its default configuration assumes |
|
3ddd56d…
|
wyoung
|
59 |
it being run on a corporate LAN in a single data center, so network |
|
67a4c1d…
|
wyoung
|
60 |
partitions are exceedingly rare. It therefore sacrifices partition |
|
67a4c1d…
|
wyoung
|
61 |
tolerance to get the advantages of CA-mode operation. In its particular application of |
|
67a4c1d…
|
wyoung
|
62 |
this mode, a |
|
3ddd56d…
|
wyoung
|
63 |
message isn’t “committed” until all running brokers have a copy of it, |
|
67a4c1d…
|
wyoung
|
64 |
at which point the message becomes visible to the client(s). In that |
|
67a4c1d…
|
wyoung
|
65 |
way, all clients always see the same message store as long as all of the |
|
67a4c1d…
|
wyoung
|
66 |
Kafka servers are up and communicating. |
|
3ddd56d…
|
wyoung
|
67 |
|
|
3ddd56d…
|
wyoung
|
68 |
How would that work in Fossil terms? |
|
3ddd56d…
|
wyoung
|
69 |
|
|
3ddd56d…
|
wyoung
|
70 |
If there is only one central server and I clone it on my local laptop, |
|
3ddd56d…
|
wyoung
|
71 |
then CA mode means I can only commit if the remote Fossil is available, |
|
3ddd56d…
|
wyoung
|
72 |
so in that sense, it devolves to the old CVS model. |
|
3ddd56d…
|
wyoung
|
73 |
|
|
3ddd56d…
|
wyoung
|
74 |
What if there are three clones? Perhaps there is a central server *A*, |
|
3ddd56d…
|
wyoung
|
75 |
the clone *B* on my laptop, and the clone *C* on your laptop. Doesn’t CA |
|
3ddd56d…
|
wyoung
|
76 |
mode now mean that my commit on *B* doesn’t exist after I commit it to |
|
3ddd56d…
|
wyoung
|
77 |
the central repo *A* until you, my coworker, *also* pull down the copy |
|
3ddd56d…
|
wyoung
|
78 |
of that commit to your laptop *C*, validating the commit through the |
|
3ddd56d…
|
wyoung
|
79 |
network? |
|
3ddd56d…
|
wyoung
|
80 |
|
|
3ddd56d…
|
wyoung
|
81 |
That’s one way to design the system, but another way would be to scope |
|
3ddd56d…
|
wyoung
|
82 |
the system to only talk about proper *servers*, not about the clients. |
|
3ddd56d…
|
wyoung
|
83 |
In that model, a CA-mode Fossil alternative might require 2+ servers to |
|
3ddd56d…
|
wyoung
|
84 |
be running for proper replication. When I make a commit, if all of the |
|
3ddd56d…
|
wyoung
|
85 |
configured servers aren’t online, I can’t commit. This is basically CVS |
|
3ddd56d…
|
wyoung
|
86 |
with replication, but without any useful amount of failover. |
|
3ddd56d…
|
wyoung
|
87 |
|
|
3ddd56d…
|
wyoung
|
88 |
[CVS]: https://en.wikipedia.org/wiki/Concurrent_Versions_System |
|
3ddd56d…
|
wyoung
|
89 |
[Kafka]: https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka |
|
3ddd56d…
|
wyoung
|
90 |
[svn]: https://en.wikipedia.org/wiki/Apache_Subversion |
|
3ddd56d…
|
wyoung
|
91 |
|
|
3ddd56d…
|
wyoung
|
92 |
|
|
991b089…
|
wyoung
|
93 |
<a id="cp"></a> |
|
3ddd56d…
|
wyoung
|
94 |
## CP-Mode Fossil |
|
3ddd56d…
|
wyoung
|
95 |
|
|
3ddd56d…
|
wyoung
|
96 |
What if we modify our CA-mode system above with “warm spares”? We can |
|
3ddd56d…
|
wyoung
|
97 |
say that commits must go to all of the spares as well as the active |
|
3ddd56d…
|
wyoung
|
98 |
servers, but a loss of one active server requires that one warm spare |
|
3ddd56d…
|
wyoung
|
99 |
come into active state, and all of the clients learn that the spare is |
|
3ddd56d…
|
wyoung
|
100 |
now considered “active.” At this point, you have a CP-mode system, not a |
|
3ddd56d…
|
wyoung
|
101 |
CA-mode system, because it’s now partition-tolerant (P) but it becomes |
|
67a4c1d…
|
wyoung
|
102 |
unavailable when there aren’t enough active servers or warm |
|
3ddd56d…
|
wyoung
|
103 |
spares to promote to active status. |
|
3ddd56d…
|
wyoung
|
104 |
|
|
3ddd56d…
|
wyoung
|
105 |
CP is your classical [BFT] style distributed consensus system, where the |
|
3ddd56d…
|
wyoung
|
106 |
system is available only if the client can contact a *majority* of the |
|
3ddd56d…
|
wyoung
|
107 |
servers. This is a formalization of the warm spare concept above: with |
|
3ddd56d…
|
wyoung
|
108 |
*N* server nodes, you need at least ⌊*N* / 2⌋ + 1 of them to be online |
|
3ddd56d…
|
wyoung
|
109 |
for a commit to succeed. |
|
3ddd56d…
|
wyoung
|
110 |
|
|
8a9facb…
|
wyoung
|
111 |
Many distributed database systems run in CP mode because consistency (C) and |
|
3ddd56d…
|
wyoung
|
112 |
partition-tolerance (P) is a useful combination. What you lose is |
|
3ddd56d…
|
wyoung
|
113 |
always-available (A) operation: with a suitably bad partition, the |
|
3ddd56d…
|
wyoung
|
114 |
system goes down for users on the small side of that partition. |
|
3ddd56d…
|
wyoung
|
115 |
|
|
3ddd56d…
|
wyoung
|
116 |
An optional CP mode for Fossil would be attractive in some ways since in |
|
3ddd56d…
|
wyoung
|
117 |
some sense Fossil is a distributed DBMS, but in practical terms, it |
|
3ddd56d…
|
wyoung
|
118 |
means Fossil would then not be a [DVCS] in the most useful sense, being |
|
3ddd56d…
|
wyoung
|
119 |
that you could work while your client is disconnected from the remote |
|
3ddd56d…
|
wyoung
|
120 |
Fossil it cloned from. |
|
3ddd56d…
|
wyoung
|
121 |
|
|
3ddd56d…
|
wyoung
|
122 |
A fraught question is whether the non-server Fossil clones count as |
|
3ddd56d…
|
wyoung
|
123 |
“nodes” in this sense. |
|
3ddd56d…
|
wyoung
|
124 |
|
|
3ddd56d…
|
wyoung
|
125 |
If they do count, then if there are only two systems, the central server |
|
3ddd56d…
|
wyoung
|
126 |
and the clone on my laptop, then it stands to reason from the formula |
|
3ddd56d…
|
wyoung
|
127 |
above that I can only commit if the central server is available. In that |
|
3ddd56d…
|
wyoung
|
128 |
scheme, a CP-mode Fossil is basically like CVS. |
|
3ddd56d…
|
wyoung
|
129 |
|
|
3ddd56d…
|
wyoung
|
130 |
But what happens if my company hires a coworker to help me with the |
|
3ddd56d…
|
wyoung
|
131 |
project, and this person makes their own clone of the central repo? The |
|
3ddd56d…
|
wyoung
|
132 |
equation says I still need 2 nodes to be available for a commit, so if |
|
3ddd56d…
|
wyoung
|
133 |
my new coworker goes off-network, that doesn’t affect whether I can make |
|
3ddd56d…
|
wyoung
|
134 |
commits. Likewise, if I go off-network, my coworker can make commits to |
|
3ddd56d…
|
wyoung
|
135 |
the central server. |
|
3ddd56d…
|
wyoung
|
136 |
|
|
3ddd56d…
|
wyoung
|
137 |
But what happens if the central server goes down? The equation says we |
|
3ddd56d…
|
wyoung
|
138 |
still have 2 nodes, so we should be able to commit, right? Sure, but |
|
3ddd56d…
|
wyoung
|
139 |
only if my laptop and communicate directly to my coworker’s laptop! If |
|
3ddd56d…
|
wyoung
|
140 |
it can’t, that’s also a network partition, so *N=1* on both sides in |
|
3ddd56d…
|
wyoung
|
141 |
that case. The implication is that for a true CP-mode Fossil, we’d need |
|
3ddd56d…
|
wyoung
|
142 |
some kind of peer-to-peer networking layer so that our laptops can |
|
3ddd56d…
|
wyoung
|
143 |
accept commits from the other, so that when the central server comes |
|
3ddd56d…
|
wyoung
|
144 |
online, one of us can send the results up to it to get it caught up. |
|
3ddd56d…
|
wyoung
|
145 |
|
|
3ddd56d…
|
wyoung
|
146 |
But doesn’t that then mean there is no security? How does [Fossil’s RBAC |
|
3ddd56d…
|
wyoung
|
147 |
system][caps] work if peer-to-peer commits are allowed? |
|
3ddd56d…
|
wyoung
|
148 |
|
|
3ddd56d…
|
wyoung
|
149 |
You can instead reconceptualize the system as “node” meaning only server |
|
3ddd56d…
|
wyoung
|
150 |
nodes, so that client-only systems don’t count. This allows you to have |
|
3ddd56d…
|
wyoung
|
151 |
an RBAC system again. |
|
3ddd56d…
|
wyoung
|
152 |
|
|
3ddd56d…
|
wyoung
|
153 |
With just one central server, ⌊1/2⌋+1=1, so you get CVS-like behavior: |
|
3ddd56d…
|
wyoung
|
154 |
if the server’s up, you can commit. |
|
3ddd56d…
|
wyoung
|
155 |
|
|
3ddd56d…
|
wyoung
|
156 |
If you set up 2 servers for redundancy, both must be up for commits to |
|
3ddd56d…
|
wyoung
|
157 |
be allowed, since otherwise you could end up with half the commits going |
|
3ddd56d…
|
wyoung
|
158 |
to the server on one side of a network partition, half going to the |
|
3ddd56d…
|
wyoung
|
159 |
other, and no way to arbitrate among the two once the partition is |
|
3ddd56d…
|
wyoung
|
160 |
lifted. |
|
3ddd56d…
|
wyoung
|
161 |
|
|
3ddd56d…
|
wyoung
|
162 |
(Today’s AP-mode Fossil has this capability, but the necessary cost is |
|
3ddd56d…
|
wyoung
|
163 |
“C”, consistency! Once again, you can’t get around the CAP theorem.) |
|
3ddd56d…
|
wyoung
|
164 |
|
|
3ddd56d…
|
wyoung
|
165 |
3 servers is more sensible: any client that can see at least 2 of them |
|
3ddd56d…
|
wyoung
|
166 |
can commit. |
|
3ddd56d…
|
wyoung
|
167 |
|
|
3ddd56d…
|
wyoung
|
168 |
Will there ever be a CP-mode Fossil? This author doubts it, but as I’ve |
|
3ddd56d…
|
wyoung
|
169 |
shown, it would be useful in contexts where you’d rather have a |
|
3ddd56d…
|
wyoung
|
170 |
guarantee of consistency than availability. |
|
3ddd56d…
|
wyoung
|
171 |
|
|
3ddd56d…
|
wyoung
|
172 |
[BFT]: https://en.wikipedia.org/wiki/Byzantine_fault |
|
3ddd56d…
|
wyoung
|
173 |
[caps]: ./caps/ |