Fossil SCM

fossil-scm / www / cap-theorem.md
Source Blame History 173 lines
3ddd56d… wyoung 1 # Fossil and the CAP Theorem
3ddd56d… wyoung 2
3ddd56d… wyoung 3 [The CAP theorem][cap] is a fundamental mathematical proof about
3ddd56d… wyoung 4 distributed systems. A software system can no more get around it than a
3ddd56d… wyoung 5 physical system can get past *c*, the [speed of light][sol] constant.
3ddd56d… wyoung 6
3ddd56d… wyoung 7 Fossil is a distributed system, so it can be useful to think about it in
3ddd56d… wyoung 8 terms of the CAP theorem. We won’t discuss the theorem itself or how you
3ddd56d… wyoung 9 reason using its results here. For that, we recommend [this article][tut].
3ddd56d… wyoung 10
3ddd56d… wyoung 11 [cap]: https://en.wikipedia.org/wiki/CAP_theorem
3ddd56d… wyoung 12 [sol]: https://en.wikipedia.org/wiki/Speed_of_light
3ddd56d… wyoung 13 [tut]: https://www.ibm.com/cloud/learn/cap-theorem
3ddd56d… wyoung 14
3ddd56d… wyoung 15
991b089… wyoung 16 <a id="ap"></a>
3ddd56d… wyoung 17 ## Fossil Is an AP-Mode System
3ddd56d… wyoung 18
3ddd56d… wyoung 19 As with all common [DVCSes][dvcs], Fossil is an AP-mode system, meaning
3ddd56d… wyoung 20 that your local clone isn’t necessarily consistent with all other clones
3ddd56d… wyoung 21 (C), but the system is always available for use (A) and
3ddd56d… wyoung 22 partition-tolerant (P). This is what allows you to turn off Fossil’s
3ddd56d… wyoung 23 autosync mode, go off-network, and continue working with Fossil, even
3ddd56d… wyoung 24 though only a single node (your local repo clone) is accessible at the
3ddd56d… wyoung 25 time.
3ddd56d… wyoung 26
3ddd56d… wyoung 27 You may consider that going back online restores “C”, because upon sync,
3ddd56d… wyoung 28 you’re now consistent with the repo you cloned from. But, if another
3ddd56d… wyoung 29 user has gone offline in the meantime, and they’ve made commits to their
67a4c1d… wyoung 30 disconnected repo, *you* aren’t consistent with *them.* Besides which,
67a4c1d… wyoung 31 if another user commits to the central repo, that doesn’t push the
67a4c1d… wyoung 32 change down to you automatically: even if all users of a Fossil system
67a4c1d… wyoung 33 are online at the same instant, and they’re all using autosync, Fossil
67a4c1d… wyoung 34 doesn’t guarantee consistency across the network.
3ddd56d… wyoung 35
3ddd56d… wyoung 36 There’s no getting around the CAP theorem!
3ddd56d… wyoung 37
3ddd56d… wyoung 38 [dvcs]: https://en.wikipedia.org/wiki/Distributed_version_control
3ddd56d… wyoung 39
3ddd56d… wyoung 40
991b089… wyoung 41 <a id="ca"></a>
3ddd56d… wyoung 42 ## CA-Mode Fossil
3ddd56d… wyoung 43
3ddd56d… wyoung 44 What would it mean to redesign Fossil to be CA-mode?
3ddd56d… wyoung 45
3ddd56d… wyoung 46 It means we get a system that is always consistent (C) and available (A)
3ddd56d… wyoung 47 as long as there are no partitions (P).
3ddd56d… wyoung 48
3ddd56d… wyoung 49 That’s basically [CVS] and [Subversion][svn]: you can only continue
67a4c1d… wyoung 50 working with the repository itself as long as your connection to the central repo server functions.
3ddd56d… wyoung 51
67a4c1d… wyoung 52 It’s rather trivial to talk about single-point-of-failure systems like
67a4c1d… wyoung 53 CVS or Subversion as
3ddd56d… wyoung 54 CA-mode. Another common example used this way is a classical RDBMS, but
3ddd56d… wyoung 55 aren’t we here to talk about distributed systems? What’s a good example
3ddd56d… wyoung 56 of a *distributed* CA-mode system?
3ddd56d… wyoung 57
3ddd56d… wyoung 58 A better example is [Kafka], which in its default configuration assumes
3ddd56d… wyoung 59 it being run on a corporate LAN in a single data center, so network
67a4c1d… wyoung 60 partitions are exceedingly rare. It therefore sacrifices partition
67a4c1d… wyoung 61 tolerance to get the advantages of CA-mode operation. In its particular application of
67a4c1d… wyoung 62 this mode, a
3ddd56d… wyoung 63 message isn’t “committed” until all running brokers have a copy of it,
67a4c1d… wyoung 64 at which point the message becomes visible to the client(s). In that
67a4c1d… wyoung 65 way, all clients always see the same message store as long as all of the
67a4c1d… wyoung 66 Kafka servers are up and communicating.
3ddd56d… wyoung 67
3ddd56d… wyoung 68 How would that work in Fossil terms?
3ddd56d… wyoung 69
3ddd56d… wyoung 70 If there is only one central server and I clone it on my local laptop,
3ddd56d… wyoung 71 then CA mode means I can only commit if the remote Fossil is available,
3ddd56d… wyoung 72 so in that sense, it devolves to the old CVS model.
3ddd56d… wyoung 73
3ddd56d… wyoung 74 What if there are three clones? Perhaps there is a central server *A*,
3ddd56d… wyoung 75 the clone *B* on my laptop, and the clone *C* on your laptop. Doesn’t CA
3ddd56d… wyoung 76 mode now mean that my commit on *B* doesn’t exist after I commit it to
3ddd56d… wyoung 77 the central repo *A* until you, my coworker, *also* pull down the copy
3ddd56d… wyoung 78 of that commit to your laptop *C*, validating the commit through the
3ddd56d… wyoung 79 network?
3ddd56d… wyoung 80
3ddd56d… wyoung 81 That’s one way to design the system, but another way would be to scope
3ddd56d… wyoung 82 the system to only talk about proper *servers*, not about the clients.
3ddd56d… wyoung 83 In that model, a CA-mode Fossil alternative might require 2+ servers to
3ddd56d… wyoung 84 be running for proper replication. When I make a commit, if all of the
3ddd56d… wyoung 85 configured servers aren’t online, I can’t commit. This is basically CVS
3ddd56d… wyoung 86 with replication, but without any useful amount of failover.
3ddd56d… wyoung 87
3ddd56d… wyoung 88 [CVS]: https://en.wikipedia.org/wiki/Concurrent_Versions_System
3ddd56d… wyoung 89 [Kafka]: https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka
3ddd56d… wyoung 90 [svn]: https://en.wikipedia.org/wiki/Apache_Subversion
3ddd56d… wyoung 91
3ddd56d… wyoung 92
991b089… wyoung 93 <a id="cp"></a>
3ddd56d… wyoung 94 ## CP-Mode Fossil
3ddd56d… wyoung 95
3ddd56d… wyoung 96 What if we modify our CA-mode system above with “warm spares”? We can
3ddd56d… wyoung 97 say that commits must go to all of the spares as well as the active
3ddd56d… wyoung 98 servers, but a loss of one active server requires that one warm spare
3ddd56d… wyoung 99 come into active state, and all of the clients learn that the spare is
3ddd56d… wyoung 100 now considered “active.” At this point, you have a CP-mode system, not a
3ddd56d… wyoung 101 CA-mode system, because it’s now partition-tolerant (P) but it becomes
67a4c1d… wyoung 102 unavailable when there aren’t enough active servers or warm
3ddd56d… wyoung 103 spares to promote to active status.
3ddd56d… wyoung 104
3ddd56d… wyoung 105 CP is your classical [BFT] style distributed consensus system, where the
3ddd56d… wyoung 106 system is available only if the client can contact a *majority* of the
3ddd56d… wyoung 107 servers. This is a formalization of the warm spare concept above: with
3ddd56d… wyoung 108 *N* server nodes, you need at least ⌊*N* / 2⌋ + 1 of them to be online
3ddd56d… wyoung 109 for a commit to succeed.
3ddd56d… wyoung 110
8a9facb… wyoung 111 Many distributed database systems run in CP mode because consistency (C) and
3ddd56d… wyoung 112 partition-tolerance (P) is a useful combination. What you lose is
3ddd56d… wyoung 113 always-available (A) operation: with a suitably bad partition, the
3ddd56d… wyoung 114 system goes down for users on the small side of that partition.
3ddd56d… wyoung 115
3ddd56d… wyoung 116 An optional CP mode for Fossil would be attractive in some ways since in
3ddd56d… wyoung 117 some sense Fossil is a distributed DBMS, but in practical terms, it
3ddd56d… wyoung 118 means Fossil would then not be a [DVCS] in the most useful sense, being
3ddd56d… wyoung 119 that you could work while your client is disconnected from the remote
3ddd56d… wyoung 120 Fossil it cloned from.
3ddd56d… wyoung 121
3ddd56d… wyoung 122 A fraught question is whether the non-server Fossil clones count as
3ddd56d… wyoung 123 “nodes” in this sense.
3ddd56d… wyoung 124
3ddd56d… wyoung 125 If they do count, then if there are only two systems, the central server
3ddd56d… wyoung 126 and the clone on my laptop, then it stands to reason from the formula
3ddd56d… wyoung 127 above that I can only commit if the central server is available. In that
3ddd56d… wyoung 128 scheme, a CP-mode Fossil is basically like CVS.
3ddd56d… wyoung 129
3ddd56d… wyoung 130 But what happens if my company hires a coworker to help me with the
3ddd56d… wyoung 131 project, and this person makes their own clone of the central repo? The
3ddd56d… wyoung 132 equation says I still need 2 nodes to be available for a commit, so if
3ddd56d… wyoung 133 my new coworker goes off-network, that doesn’t affect whether I can make
3ddd56d… wyoung 134 commits. Likewise, if I go off-network, my coworker can make commits to
3ddd56d… wyoung 135 the central server.
3ddd56d… wyoung 136
3ddd56d… wyoung 137 But what happens if the central server goes down? The equation says we
3ddd56d… wyoung 138 still have 2 nodes, so we should be able to commit, right? Sure, but
3ddd56d… wyoung 139 only if my laptop and communicate directly to my coworker’s laptop! If
3ddd56d… wyoung 140 it can’t, that’s also a network partition, so *N=1* on both sides in
3ddd56d… wyoung 141 that case. The implication is that for a true CP-mode Fossil, we’d need
3ddd56d… wyoung 142 some kind of peer-to-peer networking layer so that our laptops can
3ddd56d… wyoung 143 accept commits from the other, so that when the central server comes
3ddd56d… wyoung 144 online, one of us can send the results up to it to get it caught up.
3ddd56d… wyoung 145
3ddd56d… wyoung 146 But doesn’t that then mean there is no security? How does [Fossil’s RBAC
3ddd56d… wyoung 147 system][caps] work if peer-to-peer commits are allowed?
3ddd56d… wyoung 148
3ddd56d… wyoung 149 You can instead reconceptualize the system as “node” meaning only server
3ddd56d… wyoung 150 nodes, so that client-only systems don’t count. This allows you to have
3ddd56d… wyoung 151 an RBAC system again.
3ddd56d… wyoung 152
3ddd56d… wyoung 153 With just one central server, ⌊1/2⌋+1=1, so you get CVS-like behavior:
3ddd56d… wyoung 154 if the server’s up, you can commit.
3ddd56d… wyoung 155
3ddd56d… wyoung 156 If you set up 2 servers for redundancy, both must be up for commits to
3ddd56d… wyoung 157 be allowed, since otherwise you could end up with half the commits going
3ddd56d… wyoung 158 to the server on one side of a network partition, half going to the
3ddd56d… wyoung 159 other, and no way to arbitrate among the two once the partition is
3ddd56d… wyoung 160 lifted.
3ddd56d… wyoung 161
3ddd56d… wyoung 162 (Today’s AP-mode Fossil has this capability, but the necessary cost is
3ddd56d… wyoung 163 “C”, consistency! Once again, you can’t get around the CAP theorem.)
3ddd56d… wyoung 164
3ddd56d… wyoung 165 3 servers is more sensible: any client that can see at least 2 of them
3ddd56d… wyoung 166 can commit.
3ddd56d… wyoung 167
3ddd56d… wyoung 168 Will there ever be a CP-mode Fossil? This author doubts it, but as I’ve
3ddd56d… wyoung 169 shown, it would be useful in contexts where you’d rather have a
3ddd56d… wyoung 170 guarantee of consistency than availability.
3ddd56d… wyoung 171
3ddd56d… wyoung 172 [BFT]: https://en.wikipedia.org/wiki/Byzantine_fault
3ddd56d… wyoung 173 [caps]: ./caps/

Keyboard Shortcuts

Open search /
Next entry (timeline) j
Previous entry (timeline) k
Open focused entry Enter
Show this help ?
Toggle theme Top nav button