Fossil SCM

Added section "7.0 Collapsing check-ins throws away valuable information" to rebaseharm.md, linked to from the previous throwaway comment about squashing a whole branch down to a single commit during rebase. This section explains an entire class of harms caused by rebase that weren't previously covered.

wyoung 2019-09-13 11:12 trunk

Commit c71fe99f9b21b0b33335b031ca8bae4965b8a9e1aa3519efe77202186e02e7b0

Parent e5ba45788b99384…

1 file changed +149 -6

M www/rebaseharm.md

		--- www/rebaseharm.md
		+++ www/rebaseharm.md
		@@ -70,14 +70,17 @@
70	70	![unmerged feature branch](./rebase03.svg)
71	71
72	72	In the above, a feature branch consisting of check-ins C3 and C5 is
73	73	run concurrently with the main line in check-ins C4 and C6. Advocates
74	74	for rebase say that you should rebase the feature branch to the tip
75		-of main like the following (perhaps collapsing C3\' into C5\' to form
76		-a single check-in, or not, depending on preferences):
	75	+of main like the following:
77	76
78	77	![rebased feature branch](./rebase04.svg)
	78	+
	79	+You could choose to collapse C3\' and C5\' into a single check-in
	80	+as part of this rebase, but that’s a side issue we’ll deal with
	81	+[separately](#collapsing).
79	82
80	83	If only merge is available, one would do a merge from the concurrent
81	84	mainline changes into the feature branch as follows:
82	85
83	86	![merged feature branch](./rebase05.svg)
		@@ -220,11 +223,149 @@
220	223	or clarifications to historical check-ins in its blockchain. Hence,
221	224	once again, rebase can be seen as an attempt to work around limitations
222	225	of Git. Wouldn't it be better to fix the tool rather than to lie about
223	226	the project history?
224	227
225		-## <a name="better-plan"></a>7.0 Cherry-pick merges work better than rebase
	228	+## <a name="collapsing"></a>7.0 Collapsing check-ins throws away valuable information
	229	+
	230	+One of the oft-cited advantages of rebasing in Git is that it lets you
	231	+collapse multiple check-ins down to a single check-in to make the
	232	+development history “clean.” The intent is that development appear as
	233	+though every feature were created in a single step: no multi-step
	234	+evolution, no back-tracking, no false starts, no mistakes. This ignores
	235	+actual developer psychology: ideas rarely spring forth from fingers to
	236	+files in faultless finished form. A wish for collapsed, finalized
	237	+check-ins is a wish for a counterfactual situation.
	238	+
	239	+The common counterargument is that collapsed check-ins represent a
	240	+better world, the ideal we’re striving for. What that argument overlooks
	241	+is that we must throw away valuable information to get there.
	242	+
	243	+## <a name="empathy"></a>7.1 Individual check-ins support developer empathy
	244	+
	245	+Ideally, future developers of our software can understand every feature
	246	+in it using only context available in the version of the code they start
	247	+work with. Prior to widespread version control, developers had no choice
	248	+but to work that way. Pre-existing codebases could only be understood
	249	+as-is or not at all. Developers in that world had an incentive to
	250	+develop software that was easy to understand retrospectively, even if
	251	+they were selfish people, because they knew they might end up being
	252	+those future developers!
	253	+
	254	+Yet, sometimes we come upon a piece of code that we simply cannot
	255	+understand. If you have never asked yourself, “What was this code’s
	256	+developer thinking?” you haven’t been developing software for very long.
	257	+
	258	+When a developer can go back to the individual check-ins leading up to
	259	+the current code, they can work out the answers to such questions using
	260	+only the level of empathy necessary to be a good developer. If they
	261	+have only the finished form to go on, you are asking future
	262	+developers to make intuitive leaps that the original developer was
	263	+unable to make. In other words, you are asking your future maintenance
	264	+developers to be smarter than the original developers! That’s a
	265	+beautiful wish, but there’s a sharp limit to how far you can carry it.
	266	+Eventually you hit the limits of human brilliance.
	267	+
	268	+When the operation of some bit of code is not obvious, both Fossil and
	269	+Git let you run a [`blame`](/help?cmd=blame) on the code file to learn
	270	+which check-in last touched each line of code. If you squash the
	271	+check-ins on a branch
	272	+down to a single check-in, you throw away the information leading up to
	273	+that finished form. Fossil not only preserves the check-ins surrounding
	274	+the one that included the line of code you’re trying to understand, but
	275	+its [superior data model][sdm] also lets you see those check-ins in
	276	+both directions: not only what led up to it, but what came next. Git
	277	+can’t do that short of crawling the blockchain backwards from the tip
	278	+of the branch to the check-in you’re looking at, an expensive operation.
	279	+
	280	+We believe it is easier to understand a line of code from the 10-line
	281	+check-in it was a part of — and then to understand the surrounding
	282	+check-ins as necessary — than it is to understand a 500-line check-in
	283	+that collapses a whole branch’s worth of changes down to a single
	284	+finished feature.
	285	+
	286	+[sdm]: ./fossil-v-git.wiki#durable
	287	+
	288	+## <a name="bisecting"></a>7.2 Bisecting works better on small check-ins
	289	+
	290	+Git lets a developer write a feature in ten check-ins but collapse it
	291	+down to an eleventh check-in and then deliberately push only that final
	292	+collapsed check-in to the parent repo. Someone else may then do a bisect
	293	+that blames the collapsed check-in as the source of the problem they’re
	294	+chasing down; they then have to manually work out which of the ten steps
	295	+the original developer took to create it to find the source of the
	296	+actual problem.
	297	+
	298	+Fossil pushes all 11 check-ins to the parent repository by default, so
	299	+someone doing that bisect sees the complete check-in history, and the
	300	+bisect will point them at the single original check-in that caused
	301	+the problem.
	302	+
	303	+## <a name="comments"></a>7.3 Multiple check-ins require multiple check-in comments
	304	+
	305	+The more comments you have from a given developer on a given body of
	306	+code, the more concise documentation you have of that developer’s
	307	+thought process. To resume the bisecting example, a developer trying to
	308	+work out what the original developer was thinking with a given change
	309	+will have more success given a check-in comment that explains what the
	310	+one check-in out of ten blamed by the “bisect” command was trying to
	311	+accomplish than if they must work that out from the eleventh check-in’s
	312	+comment, which only explains the “clean” version of the collapsed
	313	+feature.
	314	+
	315	+## <a name="cherrypicking"></a>7.4 Cherry-picks work better with small check-ins
	316	+
	317	+While working on a new feature in one branch, you may come across a bug
	318	+in the pre-existing code that you need to fix in order for work on that
	319	+feature to proceed. You could choose to switch briefly back to the
	320	+parent branch, develop the fix there, check it in, then merge the parent
	321	+back up to the feature branch in order to continue work, but that’s
	322	+distracting. If the fix isn’t for a critical bug, fixing it on the
	323	+parent branch can wait, so it’s better to maintain your mental working
	324	+state by fixing the problem in place on the feature branch, checking
	325	+the fix in there, resuming work on the feature, and later merging
	326	+that fix down into the parent branch along with the feature.
	327	+
	328	+But now what happens if another branch also needs that fix? Let us say
	329	+our code repository has a branch for the current stable release, a
	330	+development branch for the next major version, and feature branches off
	331	+of the development branch. If we rebase each feature branch down into
	332	+the development branch as a single check-in, pushing only the rebase
	333	+check-in up to the parent repo, only that fix’s developer has the
	334	+information locally to perform the cherry-pick of the fix onto the
	335	+stable branch.
	336	+
	337	+Developers working on new features often do not care about old stable
	338	+versions, yet a stable version may have an end-user community that
	339	+depends on it: people who either cannot wait for the next stable
	340	+version or who wish to put off upgrading to it for some time. Such users
	341	+want backported bug fixes, yet the developers creating those fixes have
	342	+poor incentives to provide those backports. That gap is why maintenance
	343	+and support organizations exist, and they end up doing such work.
	344	+(There is [a famous company][rh] that built a multi-billion-dollar
	345	+enterprise on such work.)
	346	+
	347	+This work is far easier when each cherry-pick transfers completely and
	348	+cleanly from one branch to another, and we increase the likelihood of
	349	+achieving that state by working from the smallest check-ins that remain
	350	+complete. If a support organization must manually disentangle a fix from
	351	+a feature check-in, they are likely to introduce new bugs on the stable
	352	+branch. Even if they manage to do their work without error, it takes
	353	+them more time to do the cherry-pick that way.
	354	+
	355	+[rh]: https://en.wikipedia.org/wiki/Red_Hat
	356	+
	357	+## <a name="backouts"></a>7.5 Back-outs also work better with small check-ins
	358	+
	359	+The inverse of the cherry-pick merge is the back-out merge. If you push
	360	+only a collapsed version of a private working branch up to the parent
	361	+repo, those working from that parent repo cannot automatically back out
	362	+any of the individual check-ins that went into that private branch.
	363	+Others must either manually disentangle the problematic part of your
	364	+collapsed check-in or back out the entire feature.
	365	+
	366	+## <a name="better-plan"></a>8.0 Cherry-pick merges work better than rebase
226	367
227	368	Perhaps there are some cases where a rebase-like transformation
228	369	is actually helpful. But those cases are rare. And when they do
229	370	come up, running a series of cherry-pick merges achieves the same
230	371	topology, but with advantages:
		@@ -231,20 +372,22 @@
231	372
232	373	1. Cherry-pick merges preserve an honest record of history.
233	374	(They do in Fossil at least. Git's file format does not have
234	375	a slot to record cherry-pick merge history, unfortunately.)
235	376
236		- 2. Cherry-picks provide an opportunity to test each new check-in
237		- before it is committed to the blockchain
	377	+ 2. Cherry-picks provide an opportunity to [test each new check-in
	378	+ before it is committed][tbc] to the blockchain.
238	379
239	380	3. Cherry-pick merges are "safe" in the sense that they do not
240	381	cause problems for collaborators if you do them on public branches.
241	382
242	383	4. Cherry-picks keep both the original and the revised check-ins,
243	384	so both timestamps are preserved.
244	385
245		-## <a name="conclusion"></a>8.0 Summary and conclusion
	386	+[tbc]: ./fossil-v-git.wiki#testing
	387	+
	388	+## <a name="conclusion"></a>9.0 Summary and conclusion
246	389
247	390	Rebasing is an anti-pattern. It is dishonest. It deliberately
248	391	omits historical information. It causes problems for collaboration.
249	392	And it has no offsetting benefits.
250	393
251	394
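
Section 7.1 above claims that seeing the check-ins that came *after* a given one is cheap in Fossil but requires a backwards crawl in Git. The toy Python model below sketches why, under stated assumptions. The `PARENTS` table, the `TIPS` list, and both functions are hypothetical and do not reproduce either tool's actual storage.

- When check-ins record only their parents, finding the children of a check-in means walking back from every branch tip.
- Inverting those links once into a child index turns the same question into a single lookup. That is roughly the role the text attributes to Fossil's data model.

```python
from collections import defaultdict

# Hypothetical history: each check-in lists only its parents, the direction
# a Git commit object records. C3 has two children, C4 and C5, on separate branches.
PARENTS = {
    "C1": [],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C3"],
    "C5": ["C3"],
}
TIPS = ["C4", "C5"]


def children_by_crawling(target, parents, tips):
    """Find children of `target` using parent links only, walking back from every tip."""
    found, seen, stack = set(), set(), list(tips)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for p in parents[node]:
            if p == target:
                found.add(node)
            stack.append(p)
    return found


def build_child_index(parents):
    """Invert the parent links once so each later child lookup is a dict access."""
    index = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            index[p].add(child)
    return index


print(sorted(children_by_crawling("C3", PARENTS, TIPS)))  # ['C4', 'C5']
print(sorted(build_child_index(PARENTS)["C3"]))           # ['C4', 'C5']
```

Both calls give the same answer. The difference is cost: the crawl visits every check-in reachable from the tips, which grows with the repository, while the index answers directly once built.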

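To make the bisect argument in sections 7.2 and 7.3 concrete, here is a toy binary search over a linear history, written in Python. It is not Fossil's or Git's bisect implementation. `CheckIn`, `first_bad`, `has_bug`, and the step names are all hypothetical, and each check-in's diff is reduced to a set of named changes. The same bug, introduced in step 6, is bisected twice:

- once over ten small check-ins;
- once over the single collapsed check-in that replaces them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    """Hypothetical, simplified check-in: a diff is modeled as a set of change names."""
    id: str
    comment: str
    changes: frozenset


def first_bad(history, is_bad):
    """Return the earliest bad check-in by binary search.

    Assumes history[0] is good, history[-1] is bad, and every check-in after
    the first bad one is also bad (the usual bisect premise).
    """
    good, bad = 0, len(history) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(history[: mid + 1]):
            bad = mid
        else:
            good = mid
    return history[bad]


def has_bug(prefix):
    """A tree is 'bad' once the hypothetical buggy change has been applied."""
    applied = frozenset().union(*(c.changes for c in prefix))
    return "step-6" in applied


base = CheckIn("C0", "baseline", frozenset())
steps = [
    CheckIn(f"S{i}", f"feature step {i}", frozenset({f"step-{i}"}))
    for i in range(1, 11)
]
fine = [base] + steps
squashed = CheckIn("SQ", "add feature", frozenset().union(*(s.changes for s in steps)))
collapsed = [base, squashed]

print(first_bad(fine, has_bug).comment)       # feature step 6
print(first_bad(collapsed, has_bug).comment)  # add feature
```

With the fine-grained history, the search lands on `S6` and its own comment. With the collapsed history, it can only name `SQ`, leaving the ten changes inside it to be untangled by hand.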

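Sections 7.4 and 7.5 argue that cherry-picks and back-outs transfer cleanly only when the change you want is isolated in its own check-in. The Python sketch below models that with sets, under heavy simplifying assumptions:

- `cherry_pick` and `back_out` are hypothetical helpers, not Fossil's cherry-pick or back-out merge logic.
- They ignore conflicts and ignore later edits that depend on the changes being moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckIn:
    """Hypothetical, simplified check-in: a diff is modeled as a set of change names."""
    id: str
    changes: frozenset


def cherry_pick(branch_state, check_in):
    """Copy one check-in's changes onto another branch (toy model, no conflicts)."""
    return branch_state | check_in.changes


def back_out(branch_state, check_in):
    """Remove one check-in's changes from a branch (toy model, no dependent edits)."""
    return branch_state - check_in.changes


# Feature branch: a pre-existing bug was fixed in place between two feature steps.
f1 = CheckIn("F1", frozenset({"feature-part-1"}))
fix = CheckIn("FIX", frozenset({"parser-fix"}))
f2 = CheckIn("F2", frozenset({"feature-part-2"}))
squashed = CheckIn("SQ", f1.changes | fix.changes | f2.changes)

stable = frozenset({"stable-base"})
dev = frozenset({"dev-base"}) | squashed.changes

# Backporting the fix to the stable branch.
print(sorted(cherry_pick(stable, fix)))       # ['parser-fix', 'stable-base']
print(sorted(cherry_pick(stable, squashed)))  # drags the whole feature along

# Backing out only the second feature step on the development branch.
print(sorted(back_out(dev, f2)))        # ['dev-base', 'feature-part-1', 'parser-fix']
print(sorted(back_out(dev, squashed)))  # ['dev-base']: the fix is lost too
```

The point is granularity, not the set arithmetic. When the fix is its own check-in, it can be moved or kept independently of the feature work around it.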