Fossil SCM

Added section "7.0 Collapsing check-ins throws away valuable information" to rebaseharm.md, linked to from the previous throwaway comment about squashing a whole branch down to a single commit during rebase. This section explains an entire class of harms that come from rebase which wasn't previously covered.

wyoung 2019-09-13 11:12 trunk
Commit c71fe99f9b21b0b33335b031ca8bae4965b8a9e1aa3519efe77202186e02e7b0
1 file changed +149 -6
--- www/rebaseharm.md
+++ www/rebaseharm.md
@@ -70,14 +70,17 @@
![unmerged feature branch](./rebase03.svg)

In the above, a feature branch consisting of check-ins C3 and C5 is
run concurrently with the main line in check-ins C4 and C6. Advocates
for rebase say that you should rebase the feature branch to the tip
-of main like the following (perhaps collapsing C3\' into C5\' to form
-a single check-in, or not, depending on preferences):
+of main like the following:

![rebased feature branch](./rebase04.svg)
+
+You could choose to collapse C3\' and C5\' into a single check-in
+as part of this rebase, but that’s a side issue we’ll deal with
+[separately](#collapsing).

If only merge is available, one would do a merge from the concurrent
mainline changes into the feature branch as follows:

![merged feature branch](./rebase05.svg)
@@ -220,11 +223,149 @@
or clarifications to historical check-ins in its blockchain. Hence,
once again, rebase can be seen as an attempt to work around limitations
of Git. Wouldn't it be better to fix the tool rather than to lie about
the project history?

-## <a name="better-plan"></a>7.0 Cherry-pick merges work better than rebase
+## <a name="collapsing"></a>7.0 Collapsing check-ins throws away valuable information
+
+One of the oft-cited advantages of rebasing in Git is that it lets you
+collapse multiple check-ins down to a single check-in to make the
+development history “clean.” The intent is that development appear as
+though every feature were created in a single step: no multi-step
+evolution, no back-tracking, no false starts, no mistakes. This ignores
+actual developer psychology: ideas rarely spring forth from fingers to
+files in faultless finished form. A wish for collapsed, finalized
+check-ins is a wish for a counterfactual situation.
+
+The common counterargument is that collapsed check-ins represent a
+better world, the ideal we’re striving for. What that argument overlooks
+is that we must throw away valuable information to get there.
+
+## <a name="empathy"></a>7.1 Individual check-ins support developer empathy
+
+Ideally, future developers of our software can understand every feature
+in it using only context available in the version of the code they start
+work with. Prior to widespread version control, developers had no choice
+but to work that way. Pre-existing codebases could only be understood
+as-is or not at all. Developers in that world had an incentive to
+develop software that was easy to understand retrospectively, even if
+they were selfish people, because they knew they might end up being
+those future developers!
+
+Yet, sometimes we come upon a piece of code that we simply cannot
+understand. If you have never asked yourself, “What was this code’s
+developer thinking?” you haven’t been developing software for very long.
+
+When a developer can go back to the individual check-ins leading up to
+the current code, they can work out the answers to such questions using
+only the level of empathy necessary to be a good developer. To
+understand such code using only the finished form, you are asking future
+developers to make intuitive leaps that the original developer was
+unable to make. In other words, you are asking your future maintenance
+developers to be smarter than the original developers! That’s a
+beautiful wish, but there’s a sharp limit to how far you can carry it.
+Eventually you hit the limits of human brilliance.
+
+When the operation of some bit of code is not obvious, both Fossil and
+Git let you run a [`blame`](/help?cmd=blame) on the code file to get
+information about each line of code, and from that which check-in last
+touched a given line of code. If you squash the check-ins on a branch
+down to a single check-in, you throw away the information leading up to
+that finished form. Fossil not only preserves the check-ins surrounding
+the one that included the line of code you’re trying to understand, its
+[superior data model][sdm] lets you see the surrounding check-ins in
+both directions; not only what lead up to it, but what came next. Git
+can’t do that short of crawling the block-chain backwards from the tip
+of the branch to the check-in you’re looking at, an expensive operation.
+
+We believe it is easier to understand a line of code from the 10-line
+check-in it was a part of — and then to understand the surrounding
+check-ins as necessary — than it is to understand a 500-line check-in
+that collapses a whole branch’s worth of changes down to a single
+finished feature.
+
+[sdm]: ./fossil-v-git.wiki#durable
+
+## <a name="bisecting"></a>7.2 Bisecting works better on small check-ins
+
+Git lets a developer write a feature in ten check-ins but collapse it
+down to an eleventh check-in and then deliberately push only that final
+collapsed check-in to the parent repo. Someone else may then do a bisect
+that blames the merged check-in as the source of the problem they’re
+chasing down; they then have to manually work out which of the 10 steps
+the original developer took to create it to find the source of the
+actual problem.
+
+Fossil pushes all 11 check-ins to the parent repository by default, so
+that someone doing that bisect sees the complete check-in history, so
+the bisect will point them at the single original check-in that caused
+the problem.
+
+## <a name="comments"></a>7.3 Multiple check-ins require multiple check-in comments
+
+The more comments you have from a given developer on a given body of
+code, the more concise documentation you have of that developer’s
+thought process. To resume the bisecting example, a developer trying to
+work out what the original developer was thinking with a given change
+will have more success given a check-in comment that explains what the
+one check-in out of ten blamed by the “bisect” command was trying to
+accomplish than if they must work that out from the eleventh check-in’s
+comment, which only explains the “clean” version of the collapsed
+feature.
+
+## <a name="cherrypicking"></a>7.4 Cherry-picks work better with small check-ins
+
+While working on a new feature in one branch, you may come across a bug
+in the pre-existing code that you need to fix in order for work on that
+feature to proceed. You could choose to switch briefly back to the
+parent branch, develop the fix there, check it in, then merge the parent
+back up to the feature branch in order to continue work, but that’s
+distracting. If the fix isn’t for a critical bug, fixing it on the
+parent branch can wait, so it’s better to maintain your mental working
+state by fixing the problem in place on the feature branch, then check
+the fix in on the feature branch, resume work on the feature, and later
+merge that fix down into the parent branch along with the feature.
+
+But now what happens if another branch *also* needs that fix? Let us say
+our code repository has a branch for the current stable release, a
+development branch for the next major version, and feature branches off
+of the development branch. If we rebase each feature branch down into
+the development branch as a single check-in, pushing only the rebase
+check-in up to the parent repo, only that fix’s developer has the
+information locally to perform the cherry-pick of the fix onto the
+stable branch.
+
+Developers working on new features often do not care about old stable
+versions, yet that stable version may have an end user community that
+depends on that version, who either cannot wait for the next stable
+version or who wish to put off upgrading to it for some time. Such users
+want backported bug fixes, yet the developers creating those fixes have
+poor incentives to provide those backports. Thus the existence of
+maintenance and support organizations, who end up doing such work.
+(There is [a famous company][rh] that built a multi-billion dollar
+enterprise on such work.)
+
+This work is far easier when each cherry-pick transfers completely and
+cleanly from one branch to another, and we increase the likelihood of
+achieving that state by working from the smallest check-ins that remain
+complete. If a support organization must manually disentangle a fix from
+a feature check-in, they are likely to introduce new bugs on the stable
+branch. Even if they manage to do their work without error, it takes
+them more time to do the cherry-pick that way.
+
+[rh]: https://en.wikipedia.org/wiki/Red_Hat
+
+## <a name="backouts"></a>7.5 Back-outs also work better with small check-ins
+
+The inverse of the cherry-pick merge is the back-out merge. If you push
+only a collapsed version of a private working branch up to the parent
+repo, those working from that parent repo cannot automatically back out
+any of the individual check-ins that went into that private branch.
+Others must either manually disentangle the problematic part of your
+merge check-in or back out the entire feature.
+
+## <a name="better-plan"></a>8.0 Cherry-pick merges work better than rebase

Perhaps there are some cases where a rebase-like transformation
is actually helpful. But those cases are rare. And when they do
come up, running a series of cherry-pick merges achieve the same
topology, but with advantages:
@@ -231,20 +372,22 @@

 1. Cherry-pick merges preserve an honest record of history.
 (They do in Fossil at least. Git's file format does not have
 a slot to record cherry-pick merge history, unfortunately.)

- 2. Cherry-picks provide an opportunity to test each new check-in
- before it is committed to the blockchain
+ 2. Cherry-picks provide an opportunity to [test each new check-in
+ before it is committed][tbc] to the blockchain

 3. Cherry-pick merges are "safe" in the sense that they do not
 cause problems for collaborators if you do them on public branches.

 4. Cherry-picks keep both the original and the revised check-ins,
 so both timestamps are preserved.

-## <a name="conclusion"></a>8.0 Summary and conclusion
+[tbc]: ./fossil-v-git.wiki#testing
+
+## <a name="conclusion"></a>9.0 Summary and conclusion

Rebasing is an anti-pattern. It is dishonest. It deliberately
omits historical information. It causes problems for collaboration.
And it has no offsetting benefits.


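The bisecting argument in the new section 7.2 can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: it models a bisect as a plain binary search over an ordered list of check-in labels, and it does not call Fossil or Git. The labels `step-1` through `step-10`, the `collapsed-feature` label, the choice of step 7 as the buggy one, and the function name `bisect_first_bad` are all hypothetical and chosen for illustration.

```python
from typing import Callable, Sequence


def bisect_first_bad(history: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first check-in in `history` for which `is_bad` is true.

    Assumes the last entry is bad and that every check-in after the first
    bad one is also bad, which is the usual precondition for a bisect.
    """
    lo, hi = 0, len(history) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid        # the first bad check-in is at mid or earlier
        else:
            lo = mid + 1    # the first bad check-in is after mid
    return history[lo]


# Hypothetical feature branch of ten small check-ins; the bug arrives in step 7.
BUGGY_STEP = 7
steps = [f"step-{n}" for n in range(1, 11)]


def bad_in_full_history(checkin: str) -> bool:
    # A check-in is "bad" once the buggy step is part of its history.
    return int(checkin.split("-")[1]) >= BUGGY_STEP


# With the full history available, the search narrows down to one small step.
full_result = bisect_first_bad(steps, bad_in_full_history)

# With only the collapsed check-in available, that is all a bisect can blame.
collapsed_result = bisect_first_bad(["collapsed-feature"], lambda checkin: True)

print(full_result)
print(collapsed_result)
```

In this model the search over the ten-step history stops at the one small check-in that introduced the problem, while the collapsed history can only ever blame the single large check-in, leaving the manual untangling described in section 7.2 to the reader.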