Fossil SCM

Updates to the antibot.wiki page, to discuss the latest enhancements to robot defenses.

drh 2025-10-09 13:35 trunk
Commit 14e23927cea4e56aea3ec50e7de9251bd80530bea902cad175405fc66f246efd
1 file changed +34 -3
--- www/antibot.wiki
+++ www/antibot.wiki
@@ -3,17 +3,26 @@
 A typical Fossil website can have billions and billions of pages,
 and many of those pages (for example diffs and annotations and tarballs)
 can be expensive to compute.
 If a robot walks a Fossil-generated website,
 it can present a crippling bandwidth and CPU load.
+A "robots.txt" file can help, but in practice, most robots these
+days ignore the robots.txt file, so it won't help much.
 
 A Fossil website is intended to be used
 interactively by humans, not walked by robots. This article
 describes the techniques used by Fossil to try to welcome human
 users while keeping out robots.
 
-<h2>Setting Up Anti-Robot Defenses</h2>
+<h2>Defenses Are Enabled By Default</h2>
+
+In the latest implementations of Fossil, most robot defenses are
+enabled by default. You can probably get by with standing up a
+public-facing Fossil instance in the default configuration. But
+you can also customize the defenses to serve your particular needs.
+
+<h2>Customizing Anti-Robot Defenses</h2>
 
 Admin users can configure robot defenses on the
 "Robot Defense Settings" page (/setup_robot).
 That page is accessible (to Admin users) from the default menu bar
 by click on the "Admin" menu choice, then selecting the
@@ -130,11 +139,11 @@
 The [/help?cmd=robot-restrict|robot-restrict setting] is a comma-separated
 list of GLOB patterns for pages for which robot access is prohibited.
 The default value is:
 
 <blockquote><pre>
-timelineX,diff,annotate,zip,fileage,file,finfo,reports
+timelineX,diff,annotate,fileage,file,finfo,reports
 </pre></blockquote>
 
 Each entry corresponds to the first path element on the URI for a
 Fossil-generated page. If Fossil does not know for certain that the
 HTTP request is coming from a human, then any attempt to access one of
@@ -156,10 +165,26 @@
 prohibited.
 
 * <b>zip &rarr;</b>
 The special "zip" keyword also matches "/tarball/" and "/sqlar/".
 
+ * <b>zipX &rarr;</b>
+ This is like "zip" in that it restricts access to "/zip/", "/tarball/"
+ and "/sqlar/" but with exceptions:<ol type="a">
+ <li><p> If the [/help?cmd=robot-zip-leaf|robot-zip-leaf] setting is
+ true, then tarballs of leaf check-ins are allowed. This permits
+ URLs that attempt to download the latest check-in on trunk or
+ from a named branch, for example.
+ <li><p> If a check-in has a tag that matches the GLOB list in
+ [/help?cmd=robot-zip-tag|robot-zip-tag], then tarballs of that
+ check-in are allowed. This allows check-ins tagged with
+ "release" or "allow-robots" (for example) to be downloaded
+ without restriction.
+ </ol>
+ The "zipX" restriction is not in the default robot-restrict setting.
+ This is something you might want to add, depending on your needs.
+
 * <b>diff &rarr;</b>
 This matches /vdiff/ and /fdiff/ and /vpatch/ and any other page that
 is primarily about showing the difference between two check-ins or two
 file versioons.
 
@@ -167,15 +192,21 @@
 This also matches /blame/ and /praise/.
 
 Other special keywords may be added in the future.
 
 The default [/help?cmd=robot-restrict|robot-restrict]
-setting has been shown in practice to do a great job of keeping
+setting has been shown in practice to do a good job of keeping
 robots from consuming all available CPU and bandwidth while will
 still allowing humans access to the full power of the site without
 having to be logged in.
 
+One possible enhancement is to add "zipX" to the
+[/help?cmd=robot-restrict|robot-restrict] setting,
+and enable [/help?cmd=robot-zip-leaf|robot-zip-leaf]
+and configure [/help?cmd=robot-zip-tag|robot-zip-tag].
+Do this if you find that robots downloading lots of
+obscure tarballs is causing load issues on your site.
 
 <h2>Anti-robot Exception RegExps</h2>
 
 The [/help?cmd=robot-exception|robot-exception setting] under the name
 of <b>Exceptions to anti-robot restrictions</b> is a list of
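The robot-restrict value shown in the diff is an ordinary Fossil setting, so it can also be inspected and changed from the command line instead of through the /setup_robot page. A minimal sketch, assuming a Fossil build recent enough to include the robot-restrict setting from this commit; the repository path is hypothetical:

```shell
# Inspect the current GLOB list of robot-restricted pages
fossil settings robot-restrict -R /path/to/repo.fossil

# Set it to the new default from this commit; each comma-separated
# entry matches the first path element of a Fossil-generated URI
fossil settings robot-restrict \
    "timelineX,diff,annotate,fileage,file,finfo,reports" \
    -R /path/to/repo.fossil
```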
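The closing paragraph of the diff recommends adding "zipX" together with the two zip-related settings when robot tarball downloads become a load problem. A hedged sketch of that configuration, again assuming the setting names introduced by this commit and a hypothetical repository path:

```shell
# Add zipX so /zip/, /tarball/ and /sqlar/ are robot-restricted...
fossil settings robot-restrict \
    "timelineX,diff,annotate,fileage,file,finfo,reports,zipX" \
    -R /path/to/repo.fossil

# ...but still allow tarballs of leaf check-ins, such as the tip
# of trunk or of a named branch
fossil settings robot-zip-leaf on -R /path/to/repo.fossil

# ...and tarballs of check-ins whose tags match this GLOB list
fossil settings robot-zip-tag "release,allow-robots" -R /path/to/repo.fossil
```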
