Fossil SCM

1	`# Rebase Considered Harmful`
2
3	`Fossil deliberately omits a "rebase" command because the original`
4	`designer of Fossil (and [original author][vhist] of this article) considers rebase to be`
5	`an anti-pattern to be avoided. This article attempts to`
6	`explain that point of view.`
7
8	`[vhist]: /finfo?name=www/rebaseharm.md&ubg`
9
10	`## 1.0 Rebasing is dangerous`
11
12	`Most people, even strident advocates of rebase, agree that rebase can`
13	`cause problems when misused. The Git rebase documentation talks about the`
14	`[golden rule of rebasing][golden]: never rebase on a public`
15	`branch. Horror stories of misused rebase abound, and the rebase`
16	`documentation devotes considerable space to explaining how to`
17	`recover from rebase errors and/or misuse.`
18
19	`## <a id="cap-loss"></a>2.0 Rebase provides no new capabilities`
20
21	`Sometimes sharp and dangerous tools are justified,`
22	`because they accomplish things that cannot be`
23	`done otherwise, or at least cannot be done easily.`
24	`Rebase does not fall into that category,`
25	`because it provides no new capabilities.`
26
27	`### <a id="orphaning"></a>2.1 A rebase is just a merge with historical references omitted`
28
29	`A rebase is really nothing more than a merge (or a series of merges)`
30	`that deliberately forgets one of the parents of each merge step.`
31	`To help illustrate this fact,`
32	`consider the first rebase example from the`
33	`[Git documentation][gitrebase]. The merge looks like this:`
34
35	`~~~ pikchr toggle center`
36	`scale = 0.8`
37	`circle "C0" fit`
38	`arrow right 50%`
39	`circle same "C1"`
40	`arrow same`
41	`circle same "C2"`
42	`arrow same`
43	`circle same "C3"`
44	`arrow same`
45	`circle same "C5"`
46	`circle same "C4" at 1cm above C3`
47	`arrow from C2 to C4 chop`
48	`arrow from C4 to C5 chop`
49	`~~~`
50
51	`And the rebase looks like this:`
52
53	`~~~ pikchr toggle center`
54	`scale = 0.8`
55	`circle "C0" fit`
56	`arrow right 50%`
57	`circle same "C1"`
58	`arrow same`
59	`circle same "C2"`
60	`arrow same`
61	`circle same "C3"`
62	`arrow same`
63	`circle same "C4'"`
64	`circle same "C4" at 1cm above C3`
65	`arrow from C2 to C4 chop`
66	`~~~`
67
68	`As the [Git documentation][gitrebase] points out, check-ins C4\' and C5`
69	`are identical. The only difference between C4\' and C5 is that C5`
70	`records the fact that C4 is its merge parent but C4\' does not.`
71
72	`Thus, a rebase is just a merge that forgets where it came from.`
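
The point above can be made concrete with a toy model. The following is a minimal sketch, not Git's or Fossil's actual data structures: the `CheckIn` class, the `merge` and `rebase` helpers, and the change labels are all hypothetical names invented for illustration. It models a check-in as a set of applied changes plus a list of parents, and shows that C5 and C4' differ only in the parent that the rebase forgets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    name: str
    tree: frozenset   # content, modeled as the set of changes applied so far
    parents: tuple    # names of the parent check-ins


# Hypothetical change labels for the two lines of development after C2.
c2 = CheckIn("C2", frozenset({"base"}), ("C1",))
c3 = CheckIn("C3", c2.tree | {"mainline-work"}, ("C2",))
c4 = CheckIn("C4", c2.tree | {"branch-work"}, ("C2",))


def merge(name, onto, other):
    """Combine both trees and record BOTH parents."""
    return CheckIn(name, onto.tree | other.tree, (onto.name, other.name))


def rebase(name, onto, other):
    """Combine both trees but record ONLY the new base as parent."""
    return CheckIn(name, onto.tree | other.tree, (onto.name,))


c5 = merge("C5", c3, c4)
c4_prime = rebase("C4'", c3, c4)

same_content = c5.tree == c4_prime.tree
forgotten = set(c5.parents) - set(c4_prime.parents)
print(same_content, forgotten)   # prints: True {'C4'}
```

In this model the two operations produce the same content; the only thing `rebase` does differently is drop the second entry from `parents`.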
73
74	`The Git documentation acknowledges this fact (in so many words) and`
75	`justifies it by saying "rebasing makes for a cleaner history." I read`
76	`that sentence as a tacit admission that the Git history display`
77	`capabilities are weak and need active assistance from the user to`
78	`keep things manageable.`
79	`Surely a better approach is to record`
80	`the complete ancestry of every check-in but then fix the tool to show`
81	`a "clean" history in those instances where a simplified display is`
82	`desirable and edifying, but retain the option to show the real,`
83	`complete, messy history for cases where detail and accuracy are more`
84	`important.`
85
86	`So, another way of thinking about rebase is that it is a kind of`
87	`merge that intentionally forgets some details in order to`
88	`not overwhelm the weak history display mechanisms available in Git.`
89	`Wouldn't it be better, less error-prone, and easier on users`
90	`to enhance the history display mechanisms in Git so that rebasing`
91	`for a clean, linear history became unnecessary?`
92
93	`### <a id="clean-diffs"></a>2.2 Rebase does not actually provide better feature-branch diffs`
94
95	`Another argument, often cited, is that rebasing a feature branch`
96	`allows one to see just the changes in the feature branch without`
97	`the concurrent changes in the main line of development.`
98	`Consider a hypothetical case:`
99
100	`~~~ pikchr toggle center`
101	`scale = 0.8`
102	`circle "C0" fit fill white`
103	`arrow right 50%`
104	`circle same "C1"`
105	`arrow same`
106	`circle same "C2"`
107	`arrow same`
108	`circle same "C4"`
109	`arrow same`
110	`circle same "C6"`
111	`circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2`
112	`arrow right 50%`
113	`circle same "C5"`
114	`arrow from C2 to C3 chop`
115	`box ht C3.y-C2.y wid C6.e.x-C0.w.x+1.5*C1.rad at C2 behind C0 fill 0xc6e2ff color 0xaac5df`
116	`box ht previous.ht wid previous.wid*0.55 with .se at previous.ne \`
117	`behind C0 fill 0x9accfc color 0xaac5df`
118	`text "feature" with .s at previous.n`
119	`text "main" with .n at first box.s`
120	`~~~`
121
122	`In the above, a feature branch consisting of check-ins C3 and C5 is`
123	`run concurrently with the main line in check-ins C4 and C6. Advocates`
124	`for rebase say that you should rebase the feature branch to the tip`
125	`of main in order to remove main-line development differences from`
126	`the feature branch's history:`
127
128	`~~~ pikchr toggle center`
129	`# Duplicated below in section 5.0`
130	`scale = 0.8`
131	`circle "C0" fit fill white`
132	`arrow right 50%`
133	`circle same "C1"`
134	`arrow same`
135	`circle same "C2"`
136	`arrow same`
137	`circle same "C4"`
138	`arrow same`
139	`circle same "C6"`
140	`circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2`
141	`arrow right 50%`
142	`circle same "C5"`
143	`arrow from C2 to C3 chop`
144	`C3P: circle same "C3'" at first arrow.width + C0.rad*2 heading 30 from C6`
145	`arrow right 50% from C3P.e`
146	`C5P: circle same "C5'"`
147	`arrow from C6 to C3P chop`
148
149	`box ht C3.y-C2.y wid C5P.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \`
150	`behind C0 fill 0xc6e2ff color 0xaac5df`
151	`box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \`
152	`behind C0 fill 0x9accfc color 0xaac5df`
153	`~~~`
154
155
156	`You could choose to collapse C3\' and C5\' into a single check-in`
157	`as part of this rebase, but that's a side issue we'll deal with`
158	`[separately](#collapsing).`
159
160	`Because Fossil purposefully lacks rebase, the closest you can get to this same check-in`
161	`history is the following merge:`
162
163	`~~~ pikchr toggle center`
164	`scale = 0.8`
165	`circle "C0" fit fill white`
166	`arrow right 50%`
167	`circle same "C1"`
168	`arrow same`
169	`circle same "C2"`
170	`arrow same`
171	`circle same "C4"`
172	`arrow same`
173	`circle same "C6"`
174	`circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2`
175	`arrow right 50%`
176	`circle same "C5"`
177	`arrow same`
178	`circle same "C7"`
179	`arrow from C2 to C3 chop`
180	`arrow from C6 to C7 chop`
181
182	`box ht C3.y-C2.y wid C7.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \`
183	`behind C0 fill 0xc6e2ff color 0xaac5df`
184	`box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \`
185	`behind C0 fill 0x9accfc color 0xaac5df`
186	`~~~`
187
188
189	`Check-ins C5\' and C7 hold identical code. The only`
190	`difference is in their history.`
191
192	`The argument from rebase advocates`
193	`is that with merge it is difficult to see only the changes associated`
194	`with the feature branch without the commingled mainline changes.`
195	`In other words, diff(C2,C7) shows changes from both the feature`
196	`branch and from the mainline, whereas in the rebase case`
197	`diff(C6,C5\') shows only the feature branch changes.`
198
199	`But that argument is comparing apples to oranges, since the two diffs`
200	`do not have the same baseline. The correct way to see only the feature`
201	`branch changes in the merge case is not diff(C2,C7) but rather diff(C6,C7).`
202
203	`<div align=center>`
204
205	`| Rebase        | Merge       | What You See                           |`
206	`|---------------|-------------|----------------------------------------|`
207	`| diff(C2,C5\') | diff(C2,C7) | Commingled branch and mainline changes |`
208	`| diff(C6,C5\') | diff(C6,C7) | Branch changes only                    |`
209
210	`</div>`
211
212	`Remember: C7 and C5\' are bit-for-bit identical, so the output of the`
213	`diff is not determined by whether you select C7 or C5\' as the target`
214	`of the diff, but rather by your choice of the diff source, C2 or C6.`
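
The baseline argument can be checked with a small model. This is a sketch under simplifying assumptions: each check-in is reduced to the set of changes it contains, `diff` is a hypothetical helper that returns the symmetric difference of two such sets, and the change labels (`f1`, `m1`, and so on) are invented for illustration. Real diffs operate on file content, not on sets of labels.

```python
def diff(source, target):
    """Changes that separate `source` from `target` (symmetric difference)."""
    return source ^ target


c2 = frozenset({"base"})
feature = frozenset({"f1", "f2"})    # work done in C3 and C5
mainline = frozenset({"m1", "m2"})   # work done in C4 and C6

c6 = c2 | mainline          # tip of main
c7 = c6 | feature           # merge result
c5_prime = c6 | feature     # rebase result: same content as C7

# The diff target does not matter, because C7 and C5' are identical.
assert c7 == c5_prime

commingled = diff(c2, c7)    # baseline C2: branch and mainline changes
branch_only = diff(c6, c7)   # baseline C6: branch changes only

print(sorted(commingled))    # prints: ['f1', 'f2', 'm1', 'm2']
print(sorted(branch_only))   # prints: ['f1', 'f2']
```

Swapping `c7` for `c5_prime` in either call gives the same result, so in this model the choice of baseline alone decides what the diff shows.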
215
216	`So, to help with the problem of viewing changes associated with a feature`
217	`branch, perhaps what is needed is not rebase but rather better tools to`
218	`help users identify an appropriate baseline for their diffs.`
219
220	`## <a id="siloing"></a>3.0 Rebase encourages siloed development`
221
222	`The [golden rule of rebasing][golden] is that you should never do it`
223	`on public branches, so if you are using rebase as intended, that means`
224	`you are keeping private branches. Or, to put it another way, you are`
225	`doing siloed development. You are not sharing your intermediate work`
226	`with collaborators. This is not good for product quality.`
227
228	`[Nagappan, et al.][nagappan] studied bugs in Windows Vista and found`
229	`that the best predictor of bugs is the distance on the org-chart between`
230	`the stake-holders. The bug rate is inversely related to the`
231	`amount of communication among the engineers.`
232	`Similar findings arise in other disciplines. Keeping`
233	`private branches does not prove that developers are communicating`
234	`insufficiently, but it is a key symptom of that problem.`
235
236	`[Weinberg][weinberg] argues programming should be "egoless." That`
237	`is to say, programmers should avoid linking their code with their sense of`
238	`self, as that makes it more difficult for them to find and respond`
239	`to bugs, and hence makes them less productive. Many developers are`
240	`drawn to private branches out of a sense of ego. "I want to get the`
241	`code right before I publish it." I sympathize with this sentiment,`
242	`and am frequently guilty of it myself. It is humbling to display`
243	`your stupid mistake to the whole world on an Internet that`
244	`never forgets. And yet, humble programmers generate better code.`
245
246	`What is the fastest path to solid code? Is it to continue staring at`
247	`your private branch to seek out every last bug, or is it to publish it`
248	`as-is, whereupon the many eyeballs will immediately see that last stupid`
249	`error in the code? Testing and development are often done by separate`
250	`groups within a larger software development organization, because`
251	`developers get too close to their own code to see every problem in it.`
252
253	`Given that, is it better for those many eyeballs to find your problems`
254	`while they're still isolated on a feature branch, or should that vetting`
255	`wait until you finally push a collapsed version of a private working`
256	`branch to the parent repo? Will the many eyeballs even see those errors`
257	`when they’re intermingled with code implementing some compelling new feature?`
258
259	`## <a id="timestamps"></a>4.0 Rebase causes timestamp confusion`
260
261	`Consider the earlier example of rebasing a feature branch:`
262
263	`~~~ pikchr toggle center`
264	`# Copy of second diagram in section 2.2 above`
265	`scale = 0.8`
266	`circle "C0" fit fill white`
267	`arrow right 50%`
268	`circle same "C1"`
269	`arrow same`
270	`circle same "C2"`
271	`arrow same`
272	`circle same "C4"`
273	`arrow same`
274	`circle same "C6"`
275	`circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2`
276	`arrow right 50%`
277	`circle same "C5"`
278	`arrow from C2 to C3 chop`
279	`C3P: circle same "C3'" at first arrow.width + C0.rad*2 heading 30 from C6`
280	`arrow right 50% from C3P.e`
281	`C5P: circle same "C5'"`
282	`arrow from C6 to C3P chop`
283
284	`box ht C3.y-C2.y wid C5P.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \`
285	`behind C0 fill 0xc6e2ff color 0xaac5df`
286	`box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \`
287	`behind C0 fill 0x9accfc color 0xaac5df`
288	`~~~`
289
290	`What timestamps go on the C3\' and C5\' check-ins? If you choose`
291	`the same timestamps as the original C3 and C5, then you have the`
292	`odd situation that C3' is older than its parent C6. We call that a`
293	`"timewarp" in Fossil. Timewarps can also happen due to misconfigured`
294	`system clocks, so they are not unique to rebase, but they are very`
295	`confusing and so best avoided. The other option is to provide new`
296	`unique timestamps for C3' and C5', but then you lose the information`
297	`about when those check-ins were originally created, which can make`
298	`historical analysis of changes more difficult. It might also`
299	`complicate the legal defense of prior art claims.`
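
To illustrate the first option, here is a minimal sketch of how a timewarp could be detected. It is not Fossil's implementation: the `timewarps` function, the `parent_of` mapping, and the dates are all hypothetical, chosen only so that the rebased copies reuse the original timestamps of C3 and C5.

```python
from datetime import datetime

# Hypothetical timestamps for the check-ins in the diagram above.
stamps = {
    "C2": datetime(2024, 1, 1),
    "C3": datetime(2024, 1, 2),
    "C4": datetime(2024, 1, 3),
    "C5": datetime(2024, 1, 4),
    "C6": datetime(2024, 1, 5),
}
# The rebased copies keep the timestamps of the originals.
stamps["C3'"] = stamps["C3"]
stamps["C5'"] = stamps["C5"]

# Primary parent of each check-in, following the diagram.
parent_of = {
    "C3": "C2", "C5": "C3",
    "C4": "C2", "C6": "C4",
    "C3'": "C6", "C5'": "C3'",
}


def timewarps(parent_of, stamps):
    """Return the check-ins that are older than their own parent."""
    return sorted(
        child for child, parent in parent_of.items()
        if stamps[child] < stamps[parent]
    )


warps = timewarps(parent_of, stamps)
print(warps)   # prints: ["C3'"]
```

With these assumed dates, C3' is the only check-in that predates its parent, which matches the situation described above.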
300
301	`## <a id="lying"></a>5.0 Rebase misrepresents the project history`
302
303	`By discarding parentage information, rebase attempts to deceive the`
304	`reader about how the code actually came together.`
305
306	`Git’s rebase feature is more than just an`
307	`alternative to merging: it also provides mechanisms for changing the`
308	`project history in order to make editorial changes. Fossil shows that`
309	`you can get similar effects without modifying historical records,`
310	`allowing users to:`
311
312	`1. Edit check-in comments to fix typos or enhance clarity`
313	`2. Attach supplemental notes to check-ins or whole branches`
314	`3. Hide ill-conceived or now-unused branches from routine display`
315	`4. Fix faulty check-in date/times resulting from misconfigured`
316	`system clocks`
317	`5. Cross-reference check-ins with each other, or with`
318	`wiki, tickets, forum posts, and/or embedded documentation`
319
320	`…and so forth.`
321
322	`Fossil allows all of this not by removing or modifying existing`
323	`repository entries, but rather by adding new supplemental records.`
324	`Fossil keeps the original incorrect or unclear inputs and makes them`
325	`readily accessible, preserving the original historical record. Fossil`
326	`doesn’t make the user tell counter-factual “stories”; it only lets the`
327	`user add annotations that give a more readable edited`
328	`presentation for routine display purposes.`
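
The idea of editing by adding supplemental records can be sketched as an append-only log. This is an illustration only and does not reflect Fossil's actual artifact format: the `Repo` class, its method names, and the check-in ID `abc123` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Original check-in comments; never modified after commit.
    comments: dict = field(default_factory=dict)
    # Append-only list of (checkin, replacement_comment) records.
    amendments: list = field(default_factory=list)

    def commit(self, checkin, comment):
        self.comments[checkin] = comment

    def amend_comment(self, checkin, new_comment):
        # Adds a supplemental record instead of rewriting the original.
        self.amendments.append((checkin, new_comment))

    def display_comment(self, checkin):
        # The most recent amendment wins for routine display.
        shown = self.comments[checkin]
        for target, text in self.amendments:
            if target == checkin:
                shown = text
        return shown

    def original_comment(self, checkin):
        return self.comments[checkin]


repo = Repo()
repo.commit("abc123", "Fix tpyo in parser")
repo.amend_comment("abc123", "Fix typo in parser")

print(repo.display_comment("abc123"))    # prints: Fix typo in parser
print(repo.original_comment("abc123"))   # prints: Fix tpyo in parser
```

The design choice this sketch highlights is that the corrected comment is what readers normally see, while the original input remains in the record and can still be retrieved.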
329
330	`Git needs rebase because it lacks these annotation facilities. Rather`
331	`than consider rebase a desirable feature missing in Fossil, ask instead`
332	`why Git lacks support for making editorial changes to check-ins without`
333	`modifying history. Wouldn't it be better to fix the version control`
334	`tool rather than requiring users to fabricate a fictitious project`
335	`history?`
336
337	`## <a id="collapsing"></a>6.0 Collapsing check-ins throws away valuable information`
338
339	`One of the oft-cited advantages of rebasing in Git is that it lets you`
340	`collapse multiple check-ins down to a single check-in to make the`
341	`development history “clean.” The intent is that development appear as`
342	`though every feature were created in a single step: no multi-step`
343	`evolution, no back-tracking, no false starts, no mistakes. This ignores`
344	`actual developer psychology: ideas rarely spring forth from fingers to`
345	`files in faultless finished form. A wish for collapsed, finalized`
346	`check-ins is a wish for a counterfactual situation.`
347
348	`The common counterargument is that collapsed check-ins represent a`
349	`better world, the ideal we're striving for. What that argument overlooks`
350	`is that we must throw away valuable information to get there.`
351
352	`### <a id="empathy"></a>6.1 Individual check-ins support mutual understanding`
353
354	`Ideally, future developers of our software can understand every feature`
355	`in it using only context available in the version of the code they start`
356	`work with. Prior to widespread version control, developers had no choice`
357	`but to work that way. Pre-existing codebases could only be understood`
358	`as-is or not at all. Developers in that world had an incentive to`
359	`develop software that was easy to understand retrospectively, even if`
360	`they were selfish people, because they knew they might end up being`
361	`those future developers!`
362
363	`Yet, sometimes we come upon a piece of code that we simply cannot`
364	`understand. If you have never asked yourself, "What was this code's`
365	`developer thinking?" you haven't been developing software for very long.`
366
367	`When a developer can go back to the individual check-ins leading up to`
368	`the current code, they can work out the answers to such questions using`
369	`only the level of personal brilliance necessary to be a good developer. To`
370	`understand such code using only the finished form, you are asking future`
371	`developers to make intuitive leaps that the original developer was`
372	`unable to make. In other words, you are asking your future maintenance`
373	`developers to be smarter than the original developers! That's a`
374	`beautiful wish, but there's a sharp limit to how far you can carry it.`
375	`Eventually you hit the limits of human brilliance.`
376
377	`When the operation of some bit of code is not obvious, both Fossil and`
378	Git let you run a [`blame`](/help/blame) on the code file to get
379	`information about each line of code, and from that which check-in last`
380	`touched a given line of code. If you squash the check-ins on a branch`
381	`down to a single check-in, you throw away the information leading up to`
382	`that finished form. Fossil not only preserves the check-ins surrounding`
383	`the one that included the line of code you're trying to understand, its`
384	`[superior data model][sdm] lets you see the surrounding check-ins in`
385	`both directions; not only what led up to it, but what came next. Git`
386	`can't do that short of crawling the block-chain backwards from the tip`
387	`of the branch to the check-in you’re looking at, an expensive operation.`
388
389	`We believe it is easier to understand a line of code from the 10-line`
390	`check-in it was a part of — and then to understand the surrounding`
391	`check-ins as necessary — than it is to understand a 500-line check-in`
392	`that collapses a whole branch's worth of changes down to a single`
393	`finished feature.`
394
395	`[sdm]: ./fossil-v-git.wiki#durable`
396
397	`### <a id="bisecting"></a>6.2 Bisecting works better on small check-ins`
398
399	`Git lets a developer write a feature in ten check-ins but collapse it`
400	`down to an eleventh check-in and then deliberately push only that final`
401	`collapsed check-in to the parent repo. Someone else may then do a bisect`
402	`that blames the collapsed check-in as the source of the problem they’re`
403	`chasing down; they then have to manually work out which of the ten steps`
404	`the original developer took to create it to find the source of the`
405	`actual problem.`
406
407	`An equivalent push in Fossil will send all 11 check-ins to the parent`
408	`repository so that a later investigator doing the same sort of bisect`
409	`sees the complete check-in history. That bisect will point the`
410	`investigator at the single original check-in that caused the problem.`
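
The difference can be seen in a minimal bisect sketch. This is a generic binary search over a linear history, not the implementation of either tool's `bisect` command. The check-in names, the `bad_full` and `bad_collapsed` predicates, and the assumption that the bug arrived in step 7 are all hypothetical.

```python
def bisect(history, is_bad):
    """Return the first bad check-in in `history`.

    Assumes `history` is ordered oldest to newest, its first entry is
    good, its last entry is bad, and everything after the first bad
    check-in is also bad.
    """
    lo, hi = 0, len(history) - 1   # history[lo] is good, history[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid
    return history[hi]


# Full history: ten small check-ins on top of a known-good base.
full = ["base"] + [f"step-{i}" for i in range(1, 11)]
# Collapsed history: the same work squashed into one check-in.
collapsed = ["base", "collapsed-feature"]


def bad_full(checkin):
    # Hypothetical: the bug was introduced in step 7.
    return checkin != "base" and int(checkin.split("-")[1]) >= 7


def bad_collapsed(checkin):
    return checkin == "collapsed-feature"


found_full = bisect(full, bad_full)
found_collapsed = bisect(collapsed, bad_collapsed)
print(found_full)        # prints: step-7
print(found_collapsed)   # prints: collapsed-feature
```

On the full history the search lands on the one small check-in that introduced the problem. On the collapsed history it can only point at the whole feature, leaving the rest of the search to be done by hand.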
411
412	`### <a id="comments"></a>6.3 Multiple check-ins require multiple check-in comments`
413
414	`The more comments you have from a given developer on a given body of`
415	`code, the more concise documentation you have of that developer's`
416	`thought process. To resume the bisecting example, a developer trying to`
417	`work out what the original developer was thinking with a given change`
418	`will have more success given a check-in comment that explains what the`
419	`one check-in out of ten blamed by the "bisect" command was trying to`
420	`accomplish than if they must work that out from the eleventh check-in's`
421	`comment, which only explains the "clean" version of the collapsed`
422	`feature.`
423
424	`### <a id="cherrypicking"></a>6.4 Cherry-picks work better with small check-ins`
425
426	`While working on a new feature in one branch, you may come across a bug`
427	`in the pre-existing code that you need to fix in order for work on that`
428	`feature to proceed. You could choose to switch briefly back to the`
429	`parent branch, develop the fix there, check it in, then merge the parent`
430	`back up to the feature branch in order to continue work, but that's`
431	`distracting. If the fix isn't for a critical bug, fixing it on the`
432	`parent branch can wait, so it's better to maintain your mental working`
433	`state by fixing the problem in place on the feature branch, then check`
434	`the fix in on the feature branch, resume work on the feature, and later`
435	`merge that fix down into the parent branch along with the feature.`
436
437	`But now what happens if another branch also needs that fix? Let us say`
438	`our code repository has a branch for the current stable release, a`
439	`development branch for the next major version, and feature branches off`
440	`of the development branch. If we rebase each feature branch down into`
441	`the development branch as a single check-in, pushing only the rebase`
442	`check-in up to the parent repo, only that fix's developer has the`
443	`information locally to perform the cherry-pick of the fix onto the`
444	`stable branch.`
445
446	`Developers working on new features often do not care about old stable`
447	`versions, yet that stable version may have an end user community that`
448	`depends on that version, who either cannot wait for the next stable`
449	`version or who wish to put off upgrading to it for some time. Such users`
450	`want backported bug fixes, yet the developers creating those fixes have`
451	`poor incentives to provide those backports. Thus the existence of`
452	`maintenance and support organizations, who end up doing such work.`
453	`(There is [a famous company][rh] that built a multi-billion dollar`
454	`enterprise on such work.)`
455
456	`This work is far easier when each cherry-pick transfers completely and`
457	`cleanly from one branch to another, and we increase the likelihood of`
458	`achieving that state by working from the smallest check-ins that remain`
459	`complete. If a support organization must manually disentangle a fix from`
460	`a feature check-in, they are likely to introduce new bugs on the stable`
461	`branch. Even if they manage to do their work without error, it takes`
462	`them more time to do the cherry-pick that way.`
463
464	`[rh]: https://en.wikipedia.org/wiki/Red_Hat`
465
466	`### <a id="backouts"></a>6.5 Back-outs also work better with small check-ins`
467
468	`The inverse of the cherry-pick merge is the back-out merge. If you push`
469	`only a collapsed version of a private working branch up to the parent`
470	`repo, those working from that parent repo cannot automatically back out`
471	`any of the individual check-ins that went into that private branch.`
472	`Others must either manually disentangle the problematic part of your`
473	`merge check-in or back out the entire feature.`
474
475	`## <a id="better-plan"></a>7.0 Cherry-pick merges work better than rebase`
476
477	`Perhaps there are some cases where a rebase-like transformation`
478	`is actually helpful, but those cases are rare, and when they do`
479	`come up, running a series of cherry-pick merges achieves the same`
480	`topology with several advantages:`
481
482	`1. In Fossil, cherry-pick merges preserve an honest and clear record`
483	`of history. Fossil remembers where a cherry-pick came from, and`
484	`it shows this in its timeline, so other developers can understand`
485	`how a cherry-pick based commit came together.`
486
487	`Git lacks the ability to remember the source of a cherry-pick as`
488	`part of the commit. This fact has no direct bearing on this`
489	`document’s thesis, but we can make a few observations. First, Git`
490	`forgets history in more cases than in rebasing. Second, if Git`
491	`remembered the source of cherry-picks in commits, Git users might`
492	`have a better argument for avoiding rebase, because they’d have an`
493	`alternative that didn’t lose history.`
494
495	`2. Fossil’s [test before commit philosophy][tbc] means you can test a`
496	`cherry-pick before committing it. Because Fossil allows multiple`
497	`cherry-picks in a single commit and it remembers them all, you can`
498	`do this for a complicated merge in step-wise fashion.`
499
500	`Git commits cherry-picks straight to the repository, so that if it`
501	`results in a bad state, you have to do something drastic like`
502	`git reset --hard` to repair the damage.
503
504	`3. Cherry-picks keep both the original and the revised check-ins,`
505	`so both timestamps are preserved.`
506
507	`[tbc]: ./fossil-v-git.wiki#testing`
508
509	`## <a id="conclusion"></a>8.0 Summary and conclusion`
510
511	`Rebasing is an anti-pattern. It is dishonest. It deliberately`
512	`omits historical information. It causes problems for collaboration.`
513	`And it has no offsetting benefits.`
514
515	`For these reasons, rebase is intentionally and deliberately omitted`
516	`from the design of Fossil.`
517
518
519	`[golden]: https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing`
520	`[gitrebase]: https://git-scm.com/book/en/v2/Git-Branching-Rebasing`
521	`[nagappan]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf`
522	`[weinberg]: https://books.google.com/books?id=76dIAAAAMAAJ`
523
