# Rebase Considered Harmful

Fossil deliberately omits a "rebase" command because the original
designer of Fossil (and [original author][vhist] of this article) considers rebase to be
an anti-pattern to be avoided. This article attempts to
explain that point of view.

[vhist]: /finfo?name=www/rebaseharm.md&ubg

## 1.0 Rebasing is dangerous

Most people, even strident advocates of rebase, agree that rebase can
cause problems when misused. The Git rebase documentation talks about the
[golden rule of rebasing][golden]: never rebase on a public
branch. Horror stories of misused rebase abound, and the rebase
documentation devotes considerable space to explaining how to
recover from rebase errors and/or misuse.

## <a id="cap-loss"></a>2.0 Rebase provides no new capabilities

Sometimes sharp and dangerous tools are justified,
because they accomplish things that cannot be
done otherwise, or at least cannot be done easily.
Rebase does not fall into that category,
because it provides no new capabilities.

### <a id="orphaning"></a>2.1 A rebase is just a merge with historical references omitted

A rebase is really nothing more than a merge (or a series of merges)
that deliberately forgets one of the parents of each merge step.
To help illustrate this fact,
consider the first rebase example from the
[Git documentation][gitrebase]. The merge looks like this:

~~~ pikchr toggle center
scale = 0.8
circle "C0" fit
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C3"
arrow same
circle same "C5"
circle same "C4" at 1cm above C3
arrow from C2 to C4 chop
arrow from C4 to C5 chop
~~~

And the rebase looks like this:

~~~ pikchr toggle center
scale = 0.8
circle "C0" fit
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C3"
arrow same
circle same "C4'"
circle same "C4" at 1cm above C3
arrow from C2 to C4 chop
~~~

As the [Git documentation][gitrebase] points out, check-ins C4\' and C5
are identical. The only difference between C4\' and C5 is that C5
records the fact that C4 is its merge parent but C4\' does not.

Thus, a rebase is just a merge that forgets where it came from.

The Git documentation acknowledges this fact (in so many words) and
justifies it by saying "rebasing makes for a cleaner history." I read
that sentence as a tacit admission that the Git history display
capabilities are weak and need active assistance from the user to
keep things manageable.
Surely a better approach is to record
the complete ancestry of every check-in but then fix the tool to show
a "clean" history in those instances where a simplified display is
desirable and edifying, but retain the option to show the real,
complete, messy history for cases where detail and accuracy are more
important.

So, another way of thinking about rebase is that it is a kind of
merge that intentionally forgets some details in order to
not overwhelm the weak history display mechanisms available in Git.
Wouldn't it be better, less error-prone, and easier on users
to enhance the history display mechanisms in Git so that rebasing
for a clean, linear history became unnecessary?

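The claim that a rebase is a merge with one parent forgotten can be sketched in a few lines of Python. This is a toy model only, not how Git or Fossil actually store check-ins: trees are plain dictionaries, `three_way` is a hypothetical helper that assumes no conflicts and no deletions, and the file names and contents are invented for illustration. The check-in names follow the diagrams in section 2.1.

```python
def three_way(base, ours, theirs):
    """Toy three-way merge: apply to `ours` whatever `theirs` changed since `base`.

    Assumes the two sides touched different files and nothing was deleted.
    """
    result = dict(ours)
    for path, text in theirs.items():
        if base.get(path) != text:  # this file changed on the "theirs" side
            result[path] = text
    return result


# Hypothetical file contents for the check-ins in the diagrams.
c2 = {"a.txt": "v1", "b.txt": "v1"}   # common ancestor
c3 = {**c2, "a.txt": "v2"}            # mainline change
c4 = {**c2, "b.txt": "v2"}            # side-branch change

# Merge: C5 records both parents.
c5 = {"tree": three_way(c2, c3, c4), "parents": ("C3", "C4")}

# Rebase: the same content is computed, but only C3 is recorded as a parent.
c4_prime = {"tree": three_way(c2, c3, c4), "parents": ("C3",)}

same_content = c5["tree"] == c4_prime["tree"]
forgotten = set(c5["parents"]) - set(c4_prime["parents"])
print(same_content, forgotten)
```

In this model the two results differ only in the `parents` field, which is the point the section makes about C5 and C4\'.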
### <a id="clean-diffs"></a>2.2 Rebase does not actually provide better feature-branch diffs

Another argument, often cited, is that rebasing a feature branch
allows one to see just the changes in the feature branch without
the concurrent changes in the main line of development.
Consider a hypothetical case:

~~~ pikchr toggle center
scale = 0.8
circle "C0" fit fill white
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C4"
arrow same
circle same "C6"
circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2
arrow right 50%
circle same "C5"
arrow from C2 to C3 chop
box ht C3.y-C2.y wid C6.e.x-C0.w.x+1.5*C1.rad at C2 behind C0 fill 0xc6e2ff color 0xaac5df
box ht previous.ht wid previous.wid*0.55 with .se at previous.ne \
behind C0 fill 0x9accfc color 0xaac5df
text "feature" with .s at previous.n
text "main" with .n at first box.s
~~~

In the above, a feature branch consisting of check-ins C3 and C5 is
run concurrently with the main line in check-ins C4 and C6. Advocates
for rebase say that you should rebase the feature branch to the tip
of main in order to remove main-line development differences from
the feature branch's history:

~~~ pikchr toggle center
# Duplicated below in section 4.0
scale = 0.8
circle "C0" fit fill white
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C4"
arrow same
circle same "C6"
circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2
arrow right 50%
circle same "C5"
arrow from C2 to C3 chop
C3P: circle same "C3'" at first arrow.width + C0.rad*2 heading 30 from C6
arrow right 50% from C3P.e
C5P: circle same "C5'"
arrow from C6 to C3P chop

box ht C3.y-C2.y wid C5P.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \
behind C0 fill 0xc6e2ff color 0xaac5df
box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \
behind C0 fill 0x9accfc color 0xaac5df
~~~

You could choose to collapse C3\' and C5\' into a single check-in
as part of this rebase, but that's a side issue we'll deal with
[separately](#collapsing).

Because Fossil purposefully lacks rebase, the closest you can get to this same check-in
history is the following merge:

~~~ pikchr toggle center
scale = 0.8
circle "C0" fit fill white
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C4"
arrow same
circle same "C6"
circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2
arrow right 50%
circle same "C5"
arrow same
circle same "C7"
arrow from C2 to C3 chop
arrow from C6 to C7 chop

box ht C3.y-C2.y wid C7.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \
behind C0 fill 0xc6e2ff color 0xaac5df
box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \
behind C0 fill 0x9accfc color 0xaac5df
~~~

Check-ins C5\' and C7 hold identical code. The only
difference is in their history.

The argument from rebase advocates
is that with merge it is difficult to see only the changes associated
with the feature branch without the commingled mainline changes.
In other words, diff(C2,C7) shows changes from both the feature
branch and from the mainline, whereas in the rebase case
diff(C6,C5\') shows only the feature branch changes.

But that argument is comparing apples to oranges, since the two diffs
do not have the same baseline. The correct way to see only the feature
branch changes in the merge case is not diff(C2,C7) but rather diff(C6,C7).

<div align=center>

| Rebase        | Merge       | What You See                           |
|---------------|-------------|----------------------------------------|
| diff(C2,C5\') | diff(C2,C7) | Commingled branch and mainline changes |
| diff(C6,C5\') | diff(C6,C7) | Branch changes only                    |

</div>

Remember: C7 and C5\' are bit-for-bit identical, so the output of the
diff is not determined by whether you select C7 or C5\' as the target
of the diff, but rather by your choice of the diff source, C2 or C6.

So, to help with the problem of viewing changes associated with a feature
branch, perhaps what is needed is not rebase but rather better tools to
help users identify an appropriate baseline for their diffs.

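The table above can be reproduced with a small Python sketch. It is a toy model under stated assumptions: check-ins are dictionaries mapping file names to contents, `diff` is a hypothetical helper rather than any real tool's diff, and the file names and contents are invented. Only the check-in names come from the diagrams.

```python
def diff(old, new):
    """Return {path: (old_text, new_text)} for every file that differs."""
    paths = sorted(old.keys() | new.keys())
    return {p: (old.get(p), new.get(p)) for p in paths if old.get(p) != new.get(p)}


# Hypothetical contents: mainline edits main.c, the feature adds feature.c.
c2 = {"main.c": "m0"}                        # branch point
c6 = {"main.c": "m2"}                        # tip of main after C4 and C6
c7 = {"main.c": "m2", "feature.c": "f2"}     # merge of C6 into the feature branch
c5_prime = dict(c7)                          # rebased tip: identical content

commingled = diff(c2, c7)        # baseline C2: branch and mainline changes
branch_only = diff(c6, c7)       # baseline C6: branch changes only
rebase_view = diff(c6, c5_prime)

print(commingled)
print(branch_only)
print(branch_only == rebase_view)
```

Because `c7` and `c5_prime` hold the same content, the output depends only on whether `c2` or `c6` is passed as the first argument.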
## <a id="siloing"></a>3.0 Rebase encourages siloed development

The [golden rule of rebasing][golden] is that you should never do it
on public branches, so if you are using rebase as intended, that means
you are keeping private branches. Or, to put it another way, you are
doing siloed development. You are not sharing your intermediate work
with collaborators. This is not good for product quality.

[Nagappan, et al.][nagappan] studied bugs in Windows Vista and found
that the best predictor of bugs is the distance on the org-chart between
the stake-holders. The bug rate is inversely related to the
amount of communication among the engineers.
Similar findings arise in other disciplines. Keeping
private branches does not prove that developers are communicating
insufficiently, but it is a key symptom of that problem.

[Weinberg][weinberg] argues programming should be "egoless." That
is to say, programmers should avoid linking their code with their sense of
self, as that makes it more difficult for them to find and respond
to bugs, and hence makes them less productive. Many developers are
drawn to private branches out of a sense of ego. "I want to get the
code right before I publish it." I sympathize with this sentiment,
and am frequently guilty of it myself. It is humbling to display
your stupid mistake to the whole world on an Internet that
never forgets. And yet, humble programmers generate better code.

What is the fastest path to solid code? Is it to continue staring at
your private branch to seek out every last bug, or is it to publish it
as-is, whereupon the many eyeballs will immediately see that last stupid
error in the code? Testing and development are often done by separate
groups within a larger software development organization, because
developers get too close to their own code to see every problem in it.

Given that, is it better for those many eyeballs to find your problems
while they're still isolated on a feature branch, or should that vetting
wait until you finally push a collapsed version of a private working
branch to the parent repo? Will the many eyeballs even see those errors
when they’re intermingled with code implementing some compelling new feature?

## <a id="timestamps"></a>4.0 Rebase causes timestamp confusion

Consider the earlier example of rebasing a feature branch:

~~~ pikchr toggle center
# Copy of second diagram in section 2.2 above
scale = 0.8
circle "C0" fit fill white
arrow right 50%
circle same "C1"
arrow same
circle same "C2"
arrow same
circle same "C4"
arrow same
circle same "C6"
circle same "C3" at last arrow.width + C0.rad*2 heading 30 from C2
arrow right 50%
circle same "C5"
arrow from C2 to C3 chop
C3P: circle same "C3'" at first arrow.width + C0.rad*2 heading 30 from C6
arrow right 50% from C3P.e
C5P: circle same "C5'"
arrow from C6 to C3P chop

box ht C3.y-C2.y wid C5P.e.x-C0.w.x+1.5*C1.rad with .w at 0.5*(first arrow.wid) west of C0.w \
behind C0 fill 0xc6e2ff color 0xaac5df
box ht previous.ht wid previous.e.x - C2.w.x with .se at previous.ne \
behind C0 fill 0x9accfc color 0xaac5df
~~~

What timestamps go on the C3\' and C5\' check-ins? If you choose
the same timestamps as the original C3 and C5, then you have the
odd situation where C3\' is older than its parent C6. We call that a
"timewarp" in Fossil. Timewarps can also happen due to misconfigured
system clocks, so they are not unique to rebase, but they are very
confusing and so best avoided. The other option is to provide new
unique timestamps for C3\' and C5\', but then you lose the information
about when those check-ins were originally created, which can make
historical analysis of changes more difficult. It might also
complicate the legal defense of prior art claims.

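The timewarp described above is easy to state as a check over a history graph. The following Python sketch is illustrative only: `find_timewarps` is a hypothetical helper, not a Fossil command, and the dates are invented. It models the case where the rebased check-ins keep the timestamps of the originals.

```python
from datetime import datetime


def find_timewarps(checkins):
    """Return (child, parent) pairs where the child is older than its parent."""
    warps = []
    for name, info in checkins.items():
        for parent in info["parents"]:
            if info["time"] < checkins[parent]["time"]:
                warps.append((name, parent))
    return warps


# Invented dates; C3' and C5' reuse the timestamps of C3 and C5.
history = {
    "C2": {"time": datetime(2024, 1, 1), "parents": []},
    "C3": {"time": datetime(2024, 1, 2), "parents": ["C2"]},
    "C4": {"time": datetime(2024, 1, 3), "parents": ["C2"]},
    "C5": {"time": datetime(2024, 1, 4), "parents": ["C3"]},
    "C6": {"time": datetime(2024, 1, 5), "parents": ["C4"]},
    "C3'": {"time": datetime(2024, 1, 2), "parents": ["C6"]},
    "C5'": {"time": datetime(2024, 1, 4), "parents": ["C3'"]},
}

warps = find_timewarps(history)
print(warps)
```

Assigning fresh timestamps to C3\' and C5\' would make the list empty, at the cost of losing the original creation times.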
## <a id="lying"></a>5.0 Rebase misrepresents the project history

By discarding parentage information, rebase attempts to deceive the
reader about how the code actually came together.

Git’s rebase feature is more than just an
alternative to merging: it also provides mechanisms for changing the
project history in order to make editorial changes. Fossil shows that
you can get similar effects without modifying historical records,
allowing users to:

1. Edit check-in comments to fix typos or enhance clarity
2. Attach supplemental notes to check-ins or whole branches
3. Hide ill-conceived or now-unused branches from routine display
4. Fix faulty check-in date/times resulting from misconfigured
   system clocks
5. Cross-reference check-ins with each other, or with
   wiki, tickets, forum posts, and/or embedded documentation

…and so forth.

Fossil allows all of this not by removing or modifying existing
repository entries, but rather by adding new supplemental records.
Fossil keeps the original incorrect or unclear inputs and makes them
readily accessible, preserving the original historical record. Fossil
doesn’t make the user tell counter-factual “stories”; it only allows the
user to add annotations that provide a more readable edited
presentation for routine display purposes.

Git needs rebase because it lacks these annotation facilities. Rather
than consider rebase a desirable feature missing in Fossil, ask instead
why Git lacks support for making editorial changes to check-ins without
modifying history. Wouldn't it be better to fix the version control
tool rather than requiring users to fabricate a fictitious project
history?

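The idea of editing by adding supplemental records, rather than rewriting existing ones, can be sketched as follows. This is a toy model and makes no claim about Fossil's actual artifact format: the record fields, the `effective_comment` helper, and the check-in ID are all hypothetical.

```python
def effective_comment(checkin, annotations):
    """Return the comment to display: the latest comment annotation wins.

    The check-in record itself is never modified.
    """
    comment = checkin["comment"]
    for note in annotations:  # annotations are kept in the order they were added
        if note["target"] == checkin["id"] and note["kind"] == "comment":
            comment = note["value"]
    return comment


# Hypothetical check-in with a typo in its comment.
checkin = {"id": "abc123", "comment": "Fix teh parser"}
annotations = []

# The "edit" is a new record appended to the log.
annotations.append({"target": "abc123", "kind": "comment", "value": "Fix the parser"})

shown = effective_comment(checkin, annotations)
original = checkin["comment"]
print(shown)
print(original)
```

The routine display uses `shown`, while `original` remains available to anyone who wants the unedited record.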
## <a id="collapsing"></a>6.0 Collapsing check-ins throws away valuable information

One of the oft-cited advantages of rebasing in Git is that it lets you
collapse multiple check-ins down to a single check-in to make the
development history “clean.” The intent is that development appear as
though every feature were created in a single step: no multi-step
evolution, no back-tracking, no false starts, no mistakes. This ignores
actual developer psychology: ideas rarely spring forth from fingers to
files in faultless finished form. A wish for collapsed, finalized
check-ins is a wish for a counterfactual situation.

The common counterargument is that collapsed check-ins represent a
better world, the ideal we're striving for. What that argument overlooks
is that we must throw away valuable information to get there.

### <a id="empathy"></a>6.1 Individual check-ins support mutual understanding

Ideally, future developers of our software can understand every feature
in it using only context available in the version of the code they start
work with. Prior to widespread version control, developers had no choice
but to work that way. Pre-existing codebases could only be understood
as-is or not at all. Developers in that world had an incentive to
develop software that was easy to understand retrospectively, even if
they were selfish people, because they knew they might end up being
those future developers!

Yet, sometimes we come upon a piece of code that we simply cannot
understand. If you have never asked yourself, "What was this code's
developer thinking?" you haven't been developing software for very long.

When a developer can go back to the individual check-ins leading up to
the current code, they can work out the answers to such questions using
only the level of personal brilliance necessary to be a good developer. To
understand such code using only the finished form, you are asking future
developers to make intuitive leaps that the original developer was
unable to make. In other words, you are asking your future maintenance
developers to be smarter than the original developers! That's a
beautiful wish, but there's a sharp limit to how far you can carry it.
Eventually you hit the limits of human brilliance.

When the operation of some bit of code is not obvious, both Fossil and
Git let you run a [`blame`](/help/blame) on the code file to get
information about each line of code, and from that which check-in last
touched a given line of code. If you squash the check-ins on a branch
down to a single check-in, you throw away the information leading up to
that finished form. Fossil not only preserves the check-ins surrounding
the one that included the line of code you're trying to understand, its
[superior data model][sdm] lets you see the surrounding check-ins in
both directions; not only what led up to it, but what came next. Git
can't do that short of crawling the block-chain backwards from the tip
of the branch to the check-in you’re looking at, an expensive operation.

We believe it is easier to understand a line of code from the 10-line
check-in it was a part of (and then to understand the surrounding
check-ins as necessary) than it is to understand a 500-line check-in
that collapses a whole branch's worth of changes down to a single
finished feature.

[sdm]: ./fossil-v-git.wiki#durable

### <a id="bisecting"></a>6.2 Bisecting works better on small check-ins

Git lets a developer write a feature in ten check-ins but collapse it
down to an eleventh check-in and then deliberately push only that final
collapsed check-in to the parent repo. Someone else may then do a bisect
that blames the merged check-in as the source of the problem they’re
chasing down; they then have to manually work out which of the 10 steps
the original developer took to create it to find the source of the
actual problem.

An equivalent push in Fossil will send all 11 check-ins to the parent
repository so that a later investigator doing the same sort of bisect
sees the complete check-in history. That bisect will point the
investigator at the single original check-in that caused the problem.

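The difference in bisect resolution can be illustrated with a short Python sketch. It is a simplified model, not the bisect implementation of Fossil or Git: `bisect` here is a plain binary search over a linear history, the check-in names are invented, and the choice of step 7 as the one that introduced the bug is an arbitrary assumption.

```python
def bisect(history, is_bad):
    """Return the index of the first bad entry.

    Assumes history[0] is good, history[-1] is bad, and that once a
    check-in is bad, every later check-in is bad too.
    """
    lo, hi = 0, len(history) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid
    return hi


BAD_FROM = 7  # assumption: the bug first appears in step 7

# History as pushed with every check-in preserved.
full = ["base"] + [f"step{i}" for i in range(1, 11)]
# History as pushed after collapsing the ten steps into one check-in.
squashed = ["base", "squashed(step1..step10)"]


def is_bad_full(name):
    return name != "base" and int(name[4:]) >= BAD_FROM


def is_bad_squashed(name):
    return name != "base"


culprit_full = full[bisect(full, is_bad_full)]
culprit_squashed = squashed[bisect(squashed, is_bad_squashed)]
print(culprit_full)
print(culprit_squashed)
```

With the full history the search ends on one small check-in. With the squashed history it can only name the collapsed check-in, leaving the investigator to find the faulty step by hand.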
### <a id="comments"></a>6.3 Multiple check-ins require multiple check-in comments

The more comments you have from a given developer on a given body of
code, the more concise documentation you have of that developer's
thought process. To resume the bisecting example, a developer trying to
work out what the original developer was thinking with a given change
will have more success given a check-in comment that explains what the
one check-in out of ten blamed by the "bisect" command was trying to
accomplish than if they must work that out from the eleventh check-in's
comment, which only explains the "clean" version of the collapsed
feature.

### <a id="cherrypicking"></a>6.4 Cherry-picks work better with small check-ins

While working on a new feature in one branch, you may come across a bug
in the pre-existing code that you need to fix in order for work on that
feature to proceed. You could choose to switch briefly back to the
parent branch, develop the fix there, check it in, then merge the parent
back up to the feature branch in order to continue work, but that's
distracting. If the fix isn't for a critical bug, fixing it on the
parent branch can wait, so it's better to maintain your mental working
state by fixing the problem in place on the feature branch, then check
the fix in on the feature branch, resume work on the feature, and later
merge that fix down into the parent branch along with the feature.

But now what happens if another branch *also* needs that fix? Let us say
our code repository has a branch for the current stable release, a
development branch for the next major version, and feature branches off
of the development branch. If we rebase each feature branch down into
the development branch as a single check-in, pushing only the rebase
check-in up to the parent repo, only that fix's developer has the
information locally to perform the cherry-pick of the fix onto the
stable branch.

Developers working on new features often do not care about old stable
versions, yet that stable version may have an end user community that
depends on that version, who either cannot wait for the next stable
version or who wish to put off upgrading to it for some time. Such users
want backported bug fixes, yet the developers creating those fixes have
poor incentives to provide those backports. Thus the existence of
maintenance and support organizations, who end up doing such work.
(There is [a famous company][rh] that built a multi-billion dollar
enterprise on such work.)

This work is far easier when each cherry-pick transfers completely and
cleanly from one branch to another, and we increase the likelihood of
achieving that state by working from the smallest check-ins that remain
complete. If a support organization must manually disentangle a fix from
a feature check-in, they are likely to introduce new bugs on the stable
branch. Even if they manage to do their work without error, it takes
them more time to do the cherry-pick that way.

[rh]: https://en.wikipedia.org/wiki/Red_Hat

### <a id="backouts"></a>6.5 Back-outs also work better with small check-ins

The inverse of the cherry-pick merge is the back-out merge. If you push
only a collapsed version of a private working branch up to the parent
repo, those working from that parent repo cannot automatically back out
any of the individual check-ins that went into that private branch.
Others must either manually disentangle the problematic part of your
merge check-in or back out the entire feature.

## <a id="better-plan"></a>7.0 Cherry-pick merges work better than rebase

Perhaps there are some cases where a rebase-like transformation
is actually helpful, but those cases are rare, and when they do
come up, running a series of cherry-pick merges achieves the same
topology with several advantages:

1. In Fossil, cherry-pick merges preserve an honest and clear record
    of history. Fossil remembers where a cherry-pick came from, and
    it shows this in its timeline, so other developers can understand
    how a cherry-pick based commit came together.

    Git lacks the ability to remember the source of a cherry-pick as
    part of the commit. This fact has no direct bearing on this
    document’s thesis, but we can make a few observations. First, Git
    forgets history in more cases than in rebasing. Second, if Git
    remembered the source of cherry-picks in commits, Git users might
    have a better argument for avoiding rebase, because they’d have an
    alternative that *didn’t* lose history.

2. Fossil’s [test before commit philosophy][tbc] means you can test a
    cherry-pick before committing it. Because Fossil allows multiple
    cherry-picks in a single commit and it remembers them all, you can
    do this for a complicated merge in step-wise fashion.

    Git commits cherry-picks straight to the repository, so that if it
    results in a bad state, you have to do something drastic like
    `git reset --hard` to repair the damage.

3. Cherry-picks keep both the original and the revised check-ins,
    so both timestamps are preserved.

[tbc]: ./fossil-v-git.wiki#testing

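The first and third advantages in the list above come down to what a check-in records about its origins. The Python sketch below is a hypothetical data model, not Fossil's or Git's storage format: the `CheckIn` class, its `cherrypicked_from` field, and the `replay` helper are invented for illustration, using check-in names from the diagrams in section 2.2.

```python
from dataclasses import dataclass, field


@dataclass
class CheckIn:
    name: str
    parents: list
    # Hypothetical provenance field: the check-ins whose changes were copied in.
    cherrypicked_from: list = field(default_factory=list)


def replay(name, new_parent, sources, keep_provenance):
    """Create a check-in that replays `sources` on top of `new_parent`."""
    origin = list(sources) if keep_provenance else []
    return CheckIn(name, [new_parent], origin)


# Cherry-pick style: the new check-in remembers C3 and C5, which also remain
# in the repository with their original timestamps.
picked = replay("C7", "C6", ["C3", "C5"], keep_provenance=True)

# Rebase style: same topology, but the link back to C3 and C5 is dropped.
rebased = replay("C5'", "C6", ["C3", "C5"], keep_provenance=False)

print(picked.cherrypicked_from)
print(rebased.cherrypicked_from)
```

Both results have a single parent, C6, so the topology is the same. Only the cherry-pick style keeps a record that a reader can follow back to the original work.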
## <a id="conclusion"></a>8.0 Summary and conclusion

Rebasing is an anti-pattern. It is dishonest. It deliberately
omits historical information. It causes problems for collaboration.
And it has no offsetting benefits.

For these reasons, rebase is deliberately omitted
from the design of Fossil.


[golden]: https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing
[gitrebase]: https://git-scm.com/book/en/v2/Git-Branching-Rebasing
[nagappan]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf
[weinberg]: https://books.google.com/books?id=76dIAAAAMAAJ