Fossil SCM
Updates to the antibot.wiki page to discuss the latest enhancements to robot defenses.
Commit
14e23927cea4e56aea3ec50e7de9251bd80530bea902cad175405fc66f246efd
Parent
4d198d0e1270ac3…
1 file changed
+34
-3
| --- www/antibot.wiki | ||
| +++ www/antibot.wiki | ||
| @@ -3,17 +3,26 @@ | ||
| 3 | 3 | A typical Fossil website can have billions and billions of pages, |
| 4 | 4 | and many of those pages (for example diffs and annotations and tarballs) |
| 5 | 5 | can be expensive to compute. |
| 6 | 6 | If a robot walks a Fossil-generated website, |
| 7 | 7 | it can present a crippling bandwidth and CPU load. |
| 8 | +A "robots.txt" file can help, but in practice, most robots these | |
| 9 | +days ignore the robots.txt file, so it won't help much. | |
| 8 | 10 | |
| 9 | 11 | A Fossil website is intended to be used |
| 10 | 12 | interactively by humans, not walked by robots. This article |
| 11 | 13 | describes the techniques used by Fossil to try to welcome human |
| 12 | 14 | users while keeping out robots. |
| 13 | 15 | |
| 14 | -<h2>Setting Up Anti-Robot Defenses</h2> | |
| 16 | +<h2>Defenses Are Enabled By Default</h2> | |
| 17 | + | |
| 18 | +In the latest implementations of Fossil, most robot defenses are | |
| 19 | +enabled by default. You can probably get by with standing up a | |
| 20 | +public-facing Fossil instance in the default configuration. But | |
| 21 | +you can also customize the defenses to serve your particular needs. | |
| 22 | + | |
| 23 | +<h2>Customizing Anti-Robot Defenses</h2> | |
| 15 | 24 | |
| 16 | 25 | Admin users can configure robot defenses on the |
| 17 | 26 | "Robot Defense Settings" page (/setup_robot). |
| 18 | 27 | That page is accessible (to Admin users) from the default menu bar |
| 19 | 28 | by clicking on the "Admin" menu choice, then selecting the |
| @@ -130,11 +139,11 @@ | ||
| 130 | 139 | The [/help?cmd=robot-restrict|robot-restrict setting] is a comma-separated |
| 131 | 140 | list of GLOB patterns for pages for which robot access is prohibited. |
| 132 | 141 | The default value is: |
| 133 | 142 | |
| 134 | 143 | <blockquote><pre> |
| 135 | -timelineX,diff,annotate,zip,fileage,file,finfo,reports | |
| 144 | +timelineX,diff,annotate,fileage,file,finfo,reports | |
| 136 | 145 | </pre></blockquote> |
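As a rough sketch of how such a comma-separated GLOB list might be applied, here is the matching logic in Python. This is illustrative only: `fnmatch` stands in for Fossil's internal GLOB matcher, and the special keywords ("timelineX", "diff", "zip", ...) carry extra aliases and query-parameter checks inside Fossil that this sketch does not model.

```python
from fnmatch import fnmatch

# The default robot-restrict value shown above.
ROBOT_RESTRICT = "timelineX,diff,annotate,fileage,file,finfo,reports"

def is_restricted(uri_path, setting=ROBOT_RESTRICT):
    """True if the first path element of uri_path matches any GLOB
    pattern in the comma-separated robot-restrict setting."""
    first = uri_path.lstrip("/").split("/", 1)[0]
    return any(fnmatch(first, pattern) for pattern in setting.split(","))
```

Under this sketch, a request for /annotate/src/main.c from an unverified client would be restricted, while /doc/trunk/README.md would not.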
| 137 | 146 | |
| 138 | 147 | Each entry corresponds to the first path element on the URI for a |
| 139 | 148 | Fossil-generated page. If Fossil does not know for certain that the |
| 140 | 149 | HTTP request is coming from a human, then any attempt to access one of |
| @@ -156,10 +165,26 @@ | ||
| 156 | 165 | prohibited. |
| 157 | 166 | |
| 158 | 167 | * <b>zip →</b> |
| 159 | 168 | The special "zip" keyword also matches "/tarball/" and "/sqlar/". |
| 160 | 169 | |
| 170 | + * <b>zipX →</b> | |
| 171 | + This is like "zip" in that it restricts access to "/zip/", "/tarball/" | |
| 172 | + and "/sqlar/" but with exceptions:<ol type="a"> | |
| 173 | + <li><p> If the [/help?cmd=robot-zip-leaf|robot-zip-leaf] setting is | |
| 174 | + true, then tarballs of leaf check-ins are allowed. This permits | |
| 175 | + URLs that attempt to download the latest check-in on trunk or | |
| 176 | + from a named branch, for example. | |
| 177 | + <li><p> If a check-in has a tag that matches the GLOB list in | |
| 178 | + [/help?cmd=robot-zip-tag|robot-zip-tag], then tarballs of that | |
| 179 | + check-in are allowed. This allows check-ins tagged with | |
| 180 | + "release" or "allow-robots" (for example) to be downloaded | |
| 181 | + without restriction. | |
| 182 | + </ol> | |
| 183 | + The "zipX" restriction is not in the default robot-restrict setting. | |
| 184 | + This is something you might want to add, depending on your needs. | |
| 185 | + | |
| 161 | 186 | * <b>diff →</b> |
| 162 | 187 | This matches /vdiff/ and /fdiff/ and /vpatch/ and any other page that |
| 163 | 188 | is primarily about showing the difference between two check-ins or two |
| 164 | 189 | file versions. |
| 165 | 190 | |
| @@ -167,15 +192,21 @@ | ||
| 167 | 192 | This also matches /blame/ and /praise/. |
| 168 | 193 | |
| 169 | 194 | Other special keywords may be added in the future. |
| 170 | 195 | |
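The two "zipX" exceptions described in the list above boil down to a simple check. A sketch in Python, with hypothetical argument names (Fossil's real implementation is in C and consults the repository database):

```python
from fnmatch import fnmatch

def robot_may_fetch_tarball(is_leaf, checkin_tags,
                            robot_zip_leaf, robot_zip_tag_globs):
    """Sketch of the "zipX" exceptions: a robot may fetch a tarball
    if the check-in is a leaf (and robot-zip-leaf is enabled), or if
    any of its tags matches the robot-zip-tag GLOB list."""
    if robot_zip_leaf and is_leaf:
        return True
    return any(fnmatch(tag, pattern)
               for tag in checkin_tags
               for pattern in robot_zip_tag_globs)
```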
| 171 | 196 | The default [/help?cmd=robot-restrict|robot-restrict] |
| 172 | -setting has been shown in practice to do a great job of keeping | |
| 197 | +setting has been shown in practice to do a good job of keeping | |
| 173 | 198 | robots from consuming all available CPU and bandwidth while |
| 174 | 199 | still allowing humans access to the full power of the site without |
| 175 | 200 | having to be logged in. |
| 176 | 201 | |
| 202 | +One possible enhancement is to add "zipX" to the | |
| 203 | +[/help?cmd=robot-restrict|robot-restrict] setting, | |
| 204 | +and enable [/help?cmd=robot-zip-leaf|robot-zip-leaf] | |
| 205 | +and configure [/help?cmd=robot-zip-tag|robot-zip-tag]. | |
| 206 | +Do this if you find that robots downloading lots of | |
| 207 | +obscure tarballs are causing load issues on your site. | |
| 177 | 208 | |
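The settings named above can be changed with the `fossil settings` command as well as from the /setup_robot page. A hypothetical example (the GLOB values here are illustrative, not recommendations):

```shell
# Add "zipX" to the default restriction list:
fossil settings robot-restrict \
    "timelineX,diff,annotate,fileage,file,finfo,reports,zipX"

# Let robots fetch tarballs of leaf check-ins:
fossil settings robot-zip-leaf on

# Let robots fetch tarballs of check-ins tagged "release" or "allow-robots":
fossil settings robot-zip-tag "release,allow-robots"
```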
| 178 | 209 | <h2>Anti-robot Exception RegExps</h2> |
| 179 | 210 | |
| 180 | 211 | The [/help?cmd=robot-exception|robot-exception setting] under the name |
| 181 | 212 | of <b>Exceptions to anti-robot restrictions</b> is a list of |
| 182 | 213 |