Fossil SCM

Performance bugfix. nextmap/premap can still be performance killers and memory hogs. Moved the computation of sucessor changesets down to the type-dependent code (new methods) and the SQL database, i.e. the C level. In the current setup it was possible that the DB would deliver us millions of file-level dependency pairs which the Tcl level would then reduce to tens of actual changeset dependencies. Tcl did not cope well with that amount of data. Now the reduction happens in the query itself. A concrete example was a branch in the Tcl CVS generating nearly 9 million pairs, which reduced to roughly 200 changeset dependencies. This blew the memory out of the water and the converter ground to a halt, busily swapping. Ok, causes behind us, also added another index on 'csitem(iid)' to speed the search for changesets from the revisions, tags, and branches.

aku 2007-12-02 05:48 trunk
Commit 9c5705502507e993fa2486d31093571139db2128
--- tools/cvs2fossil/lib/c2f_pinitcsets.tcl
+++ tools/cvs2fossil/lib/c2f_pinitcsets.tcl
@@ -93,11 +93,12 @@
9393
cid INTEGER NOT NULL REFERENCES changeset,
9494
pos INTEGER NOT NULL,
9595
iid INTEGER NOT NULL, -- REFERENCES revision|tag|branch
9696
UNIQUE (cid, pos),
9797
UNIQUE (cid, iid)
98
- }
98
+ } { iid }
99
+ # Index on: iid (successor/predecessor retrieval)
99100
100101
project::rev getcstypes
101102
return
102103
}
103104
104105
--- tools/cvs2fossil/lib/c2f_pinitcsets.tcl
+++ tools/cvs2fossil/lib/c2f_pinitcsets.tcl
@@ -93,11 +93,12 @@
93 cid INTEGER NOT NULL REFERENCES changeset,
94 pos INTEGER NOT NULL,
95 iid INTEGER NOT NULL, -- REFERENCES revision|tag|branch
96 UNIQUE (cid, pos),
97 UNIQUE (cid, iid)
98 }
 
99
100 project::rev getcstypes
101 return
102 }
103
104
--- tools/cvs2fossil/lib/c2f_pinitcsets.tcl
+++ tools/cvs2fossil/lib/c2f_pinitcsets.tcl
@@ -93,11 +93,12 @@
93 cid INTEGER NOT NULL REFERENCES changeset,
94 pos INTEGER NOT NULL,
95 iid INTEGER NOT NULL, -- REFERENCES revision|tag|branch
96 UNIQUE (cid, pos),
97 UNIQUE (cid, iid)
98 } { iid }
99 # Index on: iid (successor/predecessor retrieval)
100
101 project::rev getcstypes
102 return
103 }
104
105
--- tools/cvs2fossil/lib/c2f_prev.tcl
+++ tools/cvs2fossil/lib/c2f_prev.tcl
@@ -86,14 +86,26 @@
8686
delegate method isbranch to mytypeobj
8787
delegate method istag to mytypeobj
8888
8989
method setpos {p} { set mypos $p ; return }
9090
method pos {} { return $mypos }
91
+
92
+ # result = list (changeset)
93
+ method successors {} {
94
+ return [struct::list map \
95
+ [$mytypeobj cs_successors $myitems] \
96
+ [mytypemethod of]]
97
+ }
9198
9299
# result = dict (item -> list (changeset))
93100
method successormap {} {
94
- # NOTE / FUTURE: Possible bottleneck.
101
+ # NOTE / FUTURE: Definitive bottleneck (can be millions of pairs).
102
+ #
103
+ # Only user is pass 9, computing the limits of backward
104
+ # branches per branch in the changeset. TODO: Fold that into
105
+ # the SQL query, i.e. move the crunching from Tcl to C.
106
+
95107
array set tmp {}
96108
foreach {rev children} [$self nextmap] {
97109
foreach child $children {
98110
lappend tmp($rev) $myitemmap($child)
99111
}
@@ -100,25 +112,18 @@
100112
set tmp($rev) [lsort -unique $tmp($rev)]
101113
}
102114
return [array get tmp]
103115
}
104116
105
- # result = list (changeset)
106
- method successors {} {
107
- # NOTE / FUTURE: Possible bottleneck.
108
- set csets {}
109
- foreach {_ children} [$self nextmap] {
110
- foreach child $children {
111
- lappend csets $myitemmap($child)
112
- }
113
- }
114
- return [lsort -unique $csets]
115
- }
116
-
117117
# result = dict (item -> list (changeset))
118118
method predecessormap {} {
119
- # NOTE / FUTURE: Possible bottleneck.
119
+ # NOTE / FUTURE: Definitive bottleneck (can be millions of pairs).
120
+ #
121
+ # Only user is pass 9, computing the limits of backward
122
+ # branches per branch in the changeset. TODO: Fold that into
123
+ # the SQL query, i.e. move the crunching from Tcl to C.
124
+
120125
array set tmp {}
121126
foreach {rev children} [$self premap] {
122127
foreach child $children {
123128
lappend tmp($rev) $myitemmap($child)
124129
}
@@ -1057,10 +1062,68 @@
10571062
"] {
10581063
lappend dependencies([list rev $rid]) [list sym::branch $parent]
10591064
}
10601065
return
10611066
}
1067
+
1068
+ # result = list (changeset-id)
1069
+ typemethod cs_successors {revisions} {
1070
+ # This is a variant of 'successors' which maps the low-level
1071
+ # data directly to the associated changesets. I.e. instead
1072
+ # millions of dependency pairs (in extreme cases (Example: Tcl
1073
+ # CVS)) we return a very short and much more manageable list
1074
+ # of changesets.
1075
+
1076
+ set theset ('[join $revisions {','}]')
1077
+ return [state run "
1078
+ SELECT C.cid
1079
+ FROM revision R, csitem CI, changeset C
1080
+ WHERE R.rid IN $theset -- Restrict to revisions of interest
1081
+ AND R.child IS NOT NULL -- Has primary child
1082
+ AND CI.iid = R.rid
1083
+ AND C.cid = CI.cid
1084
+ AND C.type = 0
1085
+ UNION
1086
+ SELECT C.cid
1087
+ FROM revision R, revisionbranchchildren B, csitem CI, changeset C
1088
+ WHERE R.rid IN $theset -- Restrict to revisions of interest
1089
+ AND R.rid = B.rid -- Select subset of branch children
1090
+ AND CI.iid = R.rid
1091
+ AND C.cid = CI.cid
1092
+ AND C.type = 0
1093
+ UNION
1094
+ SELECT C.cid
1095
+ FROM revision R, revision RA, csitem CI, changeset C
1096
+ WHERE R.rid IN $theset -- Restrict to revisions of interest
1097
+ AND R.isdefault -- Restrict to NTDB
1098
+ AND R.dbchild IS NOT NULL -- and last NTDB belonging to trunk
1099
+ AND RA.rid = R.dbchild -- Go directly to trunk root
1100
+ AND RA.child IS NOT NULL -- Has primary child.
1101
+ AND CI.iid = R.rid
1102
+ AND C.cid = CI.cid
1103
+ AND C.type = 0
1104
+ AND CI.iid = R.rid
1105
+ AND C.cid = CI.cid
1106
+ AND C.type = 0
1107
+ UNION
1108
+ SELECT C.cid
1109
+ FROM revision R, tag T, csitem CI, changeset C
1110
+ WHERE R.rid in $theset
1111
+ AND T.rev = R.rid
1112
+ AND CI.iid = T.tid
1113
+ AND C.cid = CI.cid
1114
+ AND C.type = 1
1115
+ UNION
1116
+ SELECT C.cid
1117
+ FROM revision R, branch B, csitem CI, changeset C
1118
+ WHERE R.rid in $theset
1119
+ AND B.root = R.rid
1120
+ AND CI.iid = B.bid
1121
+ AND C.cid = CI.cid
1122
+ AND C.type = 2
1123
+ "]
1124
+ }
10621125
}
10631126
10641127
# # ## ### ##### ######## ############# #####################
10651128
## Helper singleton. Commands for tag symbol changesets.
10661129
@@ -1138,10 +1201,16 @@
11381201
"] {
11391202
lappend dependencies([list sym::tag $tid]) [list sym::tag $parent]
11401203
}
11411204
return
11421205
}
1206
+
1207
+ # result = list (changeset-id)
1208
+ typemethod cs_successors {tags} {
1209
+ # Tags have no successors.
1210
+ return
1211
+ }
11431212
}
11441213
11451214
# # ## ### ##### ######## ############# #####################
11461215
## Helper singleton. Commands for branch symbol changesets.
11471216
@@ -1250,10 +1319,49 @@
12501319
AND B.sid = P.sid
12511320
AND P.pid = T.sid
12521321
"] {
12531322
lappend dependencies([list sym::branch $bid]) [list sym::tag $parent]
12541323
}
1324
+ return
1325
+ }
1326
+
1327
+ # result = list (changeset-id)
1328
+ typemethod cs_successors {branches} {
1329
+ # This is a variant of 'successors' which maps the low-level
1330
+ # data directly to the associated changesets. I.e. instead
1331
+ # millions of dependency pairs (in extreme cases (Example: Tcl
1332
+ # CVS)) we return a very short and much more manageable list
1333
+ # of changesets.
1334
+
1335
+ set theset ('[join $branches {','}]')
1336
+ return [state run "
1337
+ SELECT C.cid
1338
+ FROM branch B, revision R, csitem CI, changeset C
1339
+ WHERE B.bid IN $theset
1340
+ AND B.first = R.rid
1341
+ AND CI.iid = R.rid
1342
+ AND C.cid = CI.cid
1343
+ AND C.type = 0
1344
+ UNION
1345
+ SELECT C.cid
1346
+ FROM branch B, preferedparent P, branch BX, csitem CI, changeset C
1347
+ WHERE B.bid IN $theset
1348
+ AND B.sid = P.pid
1349
+ AND BX.sid = P.sid
1350
+ AND CI.iid = BX.bid
1351
+ AND C.cid = CI.cid
1352
+ AND C.type = 2
1353
+ UNION
1354
+ SELECT C.cid
1355
+ FROM branch B, preferedparent P, tag T, csitem CI, changeset C
1356
+ WHERE B.bid IN $theset
1357
+ AND B.sid = P.pid
1358
+ AND T.sid = P.sid
1359
+ AND CI.iid = T.tid
1360
+ AND C.cid = CI.cid
1361
+ AND C.type = 1
1362
+ "]
12551363
return
12561364
}
12571365
12581366
# # ## ### ##### ######## #############
12591367
## Configuration
12601368
--- tools/cvs2fossil/lib/c2f_prev.tcl
+++ tools/cvs2fossil/lib/c2f_prev.tcl
@@ -86,14 +86,26 @@
86 delegate method isbranch to mytypeobj
87 delegate method istag to mytypeobj
88
89 method setpos {p} { set mypos $p ; return }
90 method pos {} { return $mypos }
 
 
 
 
 
 
 
91
92 # result = dict (item -> list (changeset))
93 method successormap {} {
94 # NOTE / FUTURE: Possible bottleneck.
 
 
 
 
 
95 array set tmp {}
96 foreach {rev children} [$self nextmap] {
97 foreach child $children {
98 lappend tmp($rev) $myitemmap($child)
99 }
@@ -100,25 +112,18 @@
100 set tmp($rev) [lsort -unique $tmp($rev)]
101 }
102 return [array get tmp]
103 }
104
105 # result = list (changeset)
106 method successors {} {
107 # NOTE / FUTURE: Possible bottleneck.
108 set csets {}
109 foreach {_ children} [$self nextmap] {
110 foreach child $children {
111 lappend csets $myitemmap($child)
112 }
113 }
114 return [lsort -unique $csets]
115 }
116
117 # result = dict (item -> list (changeset))
118 method predecessormap {} {
119 # NOTE / FUTURE: Possible bottleneck.
 
 
 
 
 
120 array set tmp {}
121 foreach {rev children} [$self premap] {
122 foreach child $children {
123 lappend tmp($rev) $myitemmap($child)
124 }
@@ -1057,10 +1062,68 @@
1057 "] {
1058 lappend dependencies([list rev $rid]) [list sym::branch $parent]
1059 }
1060 return
1061 }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1062 }
1063
1064 # # ## ### ##### ######## ############# #####################
1065 ## Helper singleton. Commands for tag symbol changesets.
1066
@@ -1138,10 +1201,16 @@
1138 "] {
1139 lappend dependencies([list sym::tag $tid]) [list sym::tag $parent]
1140 }
1141 return
1142 }
 
 
 
 
 
 
1143 }
1144
1145 # # ## ### ##### ######## ############# #####################
1146 ## Helper singleton. Commands for branch symbol changesets.
1147
@@ -1250,10 +1319,49 @@
1250 AND B.sid = P.sid
1251 AND P.pid = T.sid
1252 "] {
1253 lappend dependencies([list sym::branch $bid]) [list sym::tag $parent]
1254 }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1255 return
1256 }
1257
1258 # # ## ### ##### ######## #############
1259 ## Configuration
1260
--- tools/cvs2fossil/lib/c2f_prev.tcl
+++ tools/cvs2fossil/lib/c2f_prev.tcl
@@ -86,14 +86,26 @@
86 delegate method isbranch to mytypeobj
87 delegate method istag to mytypeobj
88
89 method setpos {p} { set mypos $p ; return }
90 method pos {} { return $mypos }
91
92 # result = list (changeset)
93 method successors {} {
94 return [struct::list map \
95 [$mytypeobj cs_successors $myitems] \
96 [mytypemethod of]]
97 }
98
99 # result = dict (item -> list (changeset))
100 method successormap {} {
101 # NOTE / FUTURE: Definitive bottleneck (can be millions of pairs).
102 #
103 # Only user is pass 9, computing the limits of backward
104 # branches per branch in the changeset. TODO: Fold that into
105 # the SQL query, i.e. move the crunching from Tcl to C.
106
107 array set tmp {}
108 foreach {rev children} [$self nextmap] {
109 foreach child $children {
110 lappend tmp($rev) $myitemmap($child)
111 }
@@ -100,25 +112,18 @@
112 set tmp($rev) [lsort -unique $tmp($rev)]
113 }
114 return [array get tmp]
115 }
116
 
 
 
 
 
 
 
 
 
 
 
 
117 # result = dict (item -> list (changeset))
118 method predecessormap {} {
119 # NOTE / FUTURE: Definitive bottleneck (can be millions of pairs).
120 #
121 # Only user is pass 9, computing the limits of backward
122 # branches per branch in the changeset. TODO: Fold that into
123 # the SQL query, i.e. move the crunching from Tcl to C.
124
125 array set tmp {}
126 foreach {rev children} [$self premap] {
127 foreach child $children {
128 lappend tmp($rev) $myitemmap($child)
129 }
@@ -1057,10 +1062,68 @@
1062 "] {
1063 lappend dependencies([list rev $rid]) [list sym::branch $parent]
1064 }
1065 return
1066 }
1067
1068 # result = list (changeset-id)
1069 typemethod cs_successors {revisions} {
1070 # This is a variant of 'successors' which maps the low-level
1071 # data directly to the associated changesets. I.e. instead
1072 # millions of dependency pairs (in extreme cases (Example: Tcl
1073 # CVS)) we return a very short and much more manageable list
1074 # of changesets.
1075
1076 set theset ('[join $revisions {','}]')
1077 return [state run "
1078 SELECT C.cid
1079 FROM revision R, csitem CI, changeset C
1080 WHERE R.rid IN $theset -- Restrict to revisions of interest
1081 AND R.child IS NOT NULL -- Has primary child
1082 AND CI.iid = R.rid
1083 AND C.cid = CI.cid
1084 AND C.type = 0
1085 UNION
1086 SELECT C.cid
1087 FROM revision R, revisionbranchchildren B, csitem CI, changeset C
1088 WHERE R.rid IN $theset -- Restrict to revisions of interest
1089 AND R.rid = B.rid -- Select subset of branch children
1090 AND CI.iid = R.rid
1091 AND C.cid = CI.cid
1092 AND C.type = 0
1093 UNION
1094 SELECT C.cid
1095 FROM revision R, revision RA, csitem CI, changeset C
1096 WHERE R.rid IN $theset -- Restrict to revisions of interest
1097 AND R.isdefault -- Restrict to NTDB
1098 AND R.dbchild IS NOT NULL -- and last NTDB belonging to trunk
1099 AND RA.rid = R.dbchild -- Go directly to trunk root
1100 AND RA.child IS NOT NULL -- Has primary child.
1101 AND CI.iid = R.rid
1102 AND C.cid = CI.cid
1103 AND C.type = 0
1104 AND CI.iid = R.rid
1105 AND C.cid = CI.cid
1106 AND C.type = 0
1107 UNION
1108 SELECT C.cid
1109 FROM revision R, tag T, csitem CI, changeset C
1110 WHERE R.rid in $theset
1111 AND T.rev = R.rid
1112 AND CI.iid = T.tid
1113 AND C.cid = CI.cid
1114 AND C.type = 1
1115 UNION
1116 SELECT C.cid
1117 FROM revision R, branch B, csitem CI, changeset C
1118 WHERE R.rid in $theset
1119 AND B.root = R.rid
1120 AND CI.iid = B.bid
1121 AND C.cid = CI.cid
1122 AND C.type = 2
1123 "]
1124 }
1125 }
1126
1127 # # ## ### ##### ######## ############# #####################
1128 ## Helper singleton. Commands for tag symbol changesets.
1129
@@ -1138,10 +1201,16 @@
1201 "] {
1202 lappend dependencies([list sym::tag $tid]) [list sym::tag $parent]
1203 }
1204 return
1205 }
1206
1207 # result = list (changeset-id)
1208 typemethod cs_successors {tags} {
1209 # Tags have no successors.
1210 return
1211 }
1212 }
1213
1214 # # ## ### ##### ######## ############# #####################
1215 ## Helper singleton. Commands for branch symbol changesets.
1216
@@ -1250,10 +1319,49 @@
1319 AND B.sid = P.sid
1320 AND P.pid = T.sid
1321 "] {
1322 lappend dependencies([list sym::branch $bid]) [list sym::tag $parent]
1323 }
1324 return
1325 }
1326
1327 # result = list (changeset-id)
1328 typemethod cs_successors {branches} {
1329 # This is a variant of 'successors' which maps the low-level
1330 # data directly to the associated changesets. I.e. instead
1331 # millions of dependency pairs (in extreme cases (Example: Tcl
1332 # CVS)) we return a very short and much more manageable list
1333 # of changesets.
1334
1335 set theset ('[join $branches {','}]')
1336 return [state run "
1337 SELECT C.cid
1338 FROM branch B, revision R, csitem CI, changeset C
1339 WHERE B.bid IN $theset
1340 AND B.first = R.rid
1341 AND CI.iid = R.rid
1342 AND C.cid = CI.cid
1343 AND C.type = 0
1344 UNION
1345 SELECT C.cid
1346 FROM branch B, preferedparent P, branch BX, csitem CI, changeset C
1347 WHERE B.bid IN $theset
1348 AND B.sid = P.pid
1349 AND BX.sid = P.sid
1350 AND CI.iid = BX.bid
1351 AND C.cid = CI.cid
1352 AND C.type = 2
1353 UNION
1354 SELECT C.cid
1355 FROM branch B, preferedparent P, tag T, csitem CI, changeset C
1356 WHERE B.bid IN $theset
1357 AND B.sid = P.pid
1358 AND T.sid = P.sid
1359 AND CI.iid = T.tid
1360 AND C.cid = CI.cid
1361 AND C.type = 1
1362 "]
1363 return
1364 }
1365
1366 # # ## ### ##### ######## #############
1367 ## Configuration
1368

Keyboard Shortcuts

Open search /
Next entry (timeline) j
Previous entry (timeline) k
Open focused entry Enter
Show this help ?
Toggle theme Top nav button