Fossil SCM
Substantial and minor changes to the file globs document taking advice from Warren Young's email.
Commit
1239b6c47041082e13f7784a7476975dbcae3310bddd0f6a5cf9081dd860e8c2
Parent
565685b5c8d041a…
1 file changed
+186
-101
| --- www/globs.md | ||
| +++ www/globs.md | ||
| @@ -1,113 +1,189 @@ | ||
| 1 | 1 | File Name GLOB Patterns |
| 2 | 2 | ======================= |
| 3 | 3 | |
| 4 | -A number of settings (and options to certain commands as well as query | |
| 5 | -parameters to certain pages) are documented as one or more GLOB | |
| 6 | -patterns that will match files either on the disk or in the active | |
| 7 | -checkout. | |
| 8 | - | |
| 9 | -A GLOB pattern is described as a pattern that matches file names, and | |
| 10 | -some of the individual commands show examples of simple GLOBs. The | |
| 11 | -examples show use of `*` as a wild card, and hint that more is | |
| 12 | -possible. | |
| 13 | - | |
| 14 | -In many cases more than one GLOB may be specified as a comma or | |
| 15 | -white space separated list of GLOB patterns. Several spots in the | |
| 16 | -command help mention that GLOB patterns may be quoted with single or | |
| 17 | -double quotes so that spaces and commas may be included in the pattern | |
| 18 | -if needed. | |
| 19 | - | |
| 20 | -Outside of this document, only the source code contains the exact | |
| 21 | -specification of the complete syntax of a GLOB pattern. | |
| 4 | +A [glob pattern][glob] is a text expression that matches one or more | |
| 5 | +file names using wild cards familiar to most users of a command line. | |
| 6 | +For example, `*` is a glob that matches any name at all and | |
| 7 | +`Readme.txt` is a glob that matches exactly one file. Note that | |
| 8 | +although they are related, glob patterns are not the same thing as a | |
| 9 | +[regular expression or regexp][regexp]. | |
| 10 | + | |
| 11 | +[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia) | |
| 12 | +[regexp]: https://en.wikipedia.org/wiki/Regular_expression | |
| 13 | + | |
| 14 | + | |
| 15 | +A number of fossil setting values hold one or more file glob patterns | |
| 16 | +that will match files either on the disk or in the active checkout. | |
| 17 | +Glob patterns are also accepted in options to certain commands as well | |
| 18 | +as query parameters to certain pages. | |
| 19 | + | |
| 20 | +In many cases more than one glob may be specified in a setting, | |
| 21 | +option, or query parameter by listing multiple globs separated by a | |
| 22 | +comma or white space. If a glob must contain commas or spaces, | |
| 23 | +surround it with single or double quotation marks. | |
| 24 | + | |
| 25 | +Of course, many fossil commands also accept lists of files to act on, | |
| 26 | +and those also may be specified with globs. Although those glob | |
| 27 | +patterns are similar to what is described here, they are not defined | |
| 28 | +by fossil, but rather by the conventions of the operating system in | |
| 29 | +use. | |
| 30 | + | |
| 22 | 31 | |
| 23 | 32 | ## Syntax |
| 24 | 33 | |
| 25 | - any Any character not mentioned matches exactly that character | |
| 34 | +A list of glob patterns is simply one or more glob patterns separated | |
| 35 | +by white space or commas. If a glob must contain white spaces or | |
| 36 | +commas, it can be quoted with either single or double quotation marks. | |
| 37 | +A list is said to match if any one (or more) globs in the list | |
| 38 | +matches. | |
| 39 | + | |
| 40 | +A glob pattern is a collection of characters compared to a target | |
| 41 | +text, usually a file name. The whole glob is said to match if it | |
| 42 | +successfully consumes and matches the entire target text. Glob | |
| 43 | +patterns are made up of ordinary characters and special characters. | |
| 44 | + | |
| 45 | +Ordinary characters consume a single character of the target and must | |
| 46 | +match it exactly. | |
| 47 | + | |
| 48 | +Special characters (and special character sequences) consume zero or | |
| 49 | +more characters from the target and describe what matches. The special | |
| 50 | +characters (and sequences) are: | |
| 51 | + | |
| 26 | 52 | * Matches any sequence of zero or more characters. |
| 27 | 53 | ? Matches exactly one character. |
| 28 | 54 | [...] Matches one character from the enclosed list of characters. |
| 29 | 55 | [^...] Matches one character not in the enclosed list. |
| 30 | 56 | |
| 31 | -Lists of characters have some additional features. | |
| 32 | - | |
| 33 | - * A range of characters may be specified with `-`, so `[a-d]` matches | |
| 34 | - exactly the same characters as `[abcd]`. | |
| 35 | - * Include `-` in a list by placing it last, just before the `]`. | |
| 36 | - * Include `]` in a list by making the first character after the `[` or | |
| 37 | - `[^`. At any other place, `]` ends the list. | |
| 38 | - * Include `^` in a list by placing anywhere except first after the | |
| 39 | - `[`. | |
| 40 | - | |
| 41 | - | |
| 42 | -Some examples: | |
| 43 | - | |
| 44 | - [a-d] Matches any one of `a`, `b`, `c`, or `d` | |
| 45 | - [a-] Matches either `a` or `-` | |
| 46 | - [][] Matches either `]` or `[` | |
| 47 | - [^]] Matches exactly one character other than `]` | |
| 48 | - []^] Matches either `]` or `^` | |
| 49 | - | |
| 50 | -The glob is compared to the canonical name of the file in the checkout | |
| 51 | -tree, and must match the entire name to be considered a match. | |
| 52 | - | |
| 53 | -Unlike typical Unix shell globs, wildcard sequences are allowed to | |
| 54 | -match `/` directory separators as well as the initial `.` in the name | |
| 55 | -of a hidden file or directory. | |
| 56 | - | |
| 57 | -A list of GLOBs is simply one or more GLOBs separated by whitespace or | |
| 58 | -commas. If a GLOB must contain a space or comma, it can be quoted with | |
| 59 | -either single or double quotation marks. | |
| 60 | - | |
| 61 | -Since a newline is considered to be whitespace, a list of GLOBs in a | |
| 62 | -file (as for a versioned setting) may have one GLOB per line. | |
| 63 | - | |
| 64 | - | |
| 65 | -## File names to match | |
| 66 | - | |
| 67 | -Before comparing to a GLOB pattern, each file name is transformed to a | |
| 68 | -canonical form. Although the real process is more complicated, the | |
| 69 | -canonical name of a file has all directory separators changed to `/`, | |
| 70 | -and all `/./` and `/../` sequences removed. The goal is a name that is | |
| 71 | -the simplest possible while still specific to each particular file. | |
| 72 | - | |
| 73 | -This has some consequences. | |
| 74 | - | |
| 75 | -The simplest GLOB pattern is just a bare name of a file named with the | |
| 76 | -usual assortment of allowed file name characters. Such a pattern | |
| 77 | -matches that one file: the GLOB `README` matches only a file named | |
| 78 | -`README` in the root of the tree. The GLOB `*/README` would match a | |
| 79 | -file named `README` anywhere except the root, since the glob requires | |
| 80 | -that at least one `/` be in the name. (Recall that `/` matches the | |
| 81 | -directory separator regardless of whether it is `/` or `\` on your | |
| 82 | -system.) | |
| 83 | - | |
| 84 | - | |
| 85 | - | |
| 86 | - | |
| 87 | -## Where are they used | |
| 88 | - | |
| 89 | -### Settings that use GLOBs | |
| 90 | - | |
| 91 | -These settings are all lists of GLOBs. All may be global, local, or | |
| 92 | -versioned. Use `fossil settings` to manage global and local settings, | |
| 93 | -or file in the repository's `.fossil-settings/` folder named for each | |
| 94 | -for versioned setting. | |
| 57 | +Special character sequences have some additional features: | |
| 58 | + | |
| 59 | + * A range of characters may be specified with `-`, so `[a-d]` matches | |
| 60 | + exactly the same characters as `[abcd]`. Ranges reflect Unicode | |
| 61 | + code points without any locale-specific collation sequence. | |
| 62 | + * Include `-` in a list by placing it last, just before the `]`. | |
| 63 | + * Include `]` in a list by making the first character after the `[` or | |
| 64 | + `[^`. At any other place, `]` ends the list. | |
| 65 | + * Include `^` in a list by placing anywhere except first after the | |
| 66 | + `[`. | |
| 67 | + * Some examples of character lists: | |
| 68 | + `[a-d]` Matches any one of `a`, `b`, `c`, or `d` but not `ä`; | |
| 69 | + `[^a-d]` Matches exactly one character other than `a`, `b`, `c`, | |
| 70 | + or `d`; | |
| 71 | + `[0-9a-fA-F]` Matches exactly one hexadecimal digit; | |
| 72 | + `[a-]` Matches either `a` or `-`; | |
| 73 | + `[][]` Matches either `]` or `[`; | |
| 74 | + `[^]]` Matches exactly one character other than `]`; | |
| 75 | + `[]^]` Matches either `]` or `^`; and | |
| 76 | + `[^-]` Matches exactly one character other than `-`. | |
| 77 | + * Beware that ranges in lists may include more than you expect: | |
| 78 | + `[A-z]` Matches `A` and `Z`, but also matches `a` and some less | |
| 79 | + obvious characters such as `[`, `\`, and `]` with code point | |
| 80 | + values between `Z` and `a`. | |
| 81 | + * Beware that a range must be specified from low value to high | |
| 82 | + value: `[z-a]` does not match any character at all, preventing the | |
| 83 | + entire glob from matching. | |
| 84 | + * Note that unlike typical Unix shell globs, wildcards (`*`, `?`, | |
| 85 | + and character lists) are allowed to match `/` directory | |
| 86 | + separators as well as the initial `.` in the name of a hidden | |
| 87 | + file or directory. | |
| 88 | + | |
| 89 | + | |
| 90 | +White space means the ASCII characters TAB, LF, VT, FF, CR, and SPACE. | |
| 91 | +Note that this does not include any of the many additional spacing | |
| 92 | +characters available in Unicode, and specifically does not include | |
| 93 | +U+00A0 NO-BREAK SPACE. | |
| 94 | + | |
| 95 | +Because both LF and CR are white space and leading and trailing spaces | |
| 96 | +are stripped from each glob in a list, a list of globs may be broken | |
| 97 | +into lines between globs when the list is stored in a file (as for a | |
| 98 | +versioned setting). | |
| 99 | + | |
| 100 | +Similarly 'single quotes' and "double quotes" are the ASCII straight | |
| 101 | +quote characters, not any of the other quotation marks provided in | |
| 102 | +Unicode and specifically not the "curly" quotes preferred by | |
| 103 | +typesetters and word processors. | |
| 104 | + | |
| 105 | + | |
| 106 | +## File Names to Match | |
| 107 | + | |
| 108 | +Before it is compared to a glob pattern, each file name is transformed | |
| 109 | +to a canonical form. The glob must match the entire canonical file | |
| 110 | +name to be considered a match. | |
| 111 | + | |
| 112 | +The canonical name of a file has all directory separators changed to | |
| 113 | +`/`, redundant slashes are removed, all `.` path components are | |
| 114 | +removed, and all `..` path components are resolved. (There are | |
| 115 | +additional details we won’t go into here.) | |
| 116 | + | |
| 117 | +The goal is a name that is the simplest possible for each particular | |
| 118 | +file, and will be the same on Windows, Unix, and any other platform | |
| 119 | +where fossil is run. | |
| 120 | + | |
| 121 | +Beware, however, that all glob matching is case sensitive. This will | |
| 122 | +not be a surprise on Unix where all file names are also case | |
| 123 | +sensitive. However, most Windows file systems are case preserving and | |
| 124 | +case insensitive. On Windows, the names `ReadMe` and `README` are | |
| 125 | +names of the same file; on Unix they are different files. | |
| 126 | + | |
| 127 | +Some example cases: | |
| 128 | + | |
| 129 | + * The glob `README` matches only a file named `README` in the root of | |
| 130 | + the tree. It does not match a file named `src/README` because it | |
| 131 | + does not include any characters that consumed the `src/` part. | |
| 132 | + * The glob `*/README` does match `src/README`. Unlike Unix file | |
| 133 | + globs, it also matches `src/library/README`. However it does not | |
| 134 | + match the file `README` in the root of the tree. | |
| 135 | + * The glob `src/README` does match the file named `src\README` on | |
| 136 | + Windows because all directory separators are rewritten as `/` in | |
| 137 | + the canonical name before the glob is matched. This makes it much | |
| 138 | + easier to write globs that work on both Unix and Windows. | |
| 139 | + * The glob `*.[ch]` matches every C source or header file in the | |
| 140 | + tree at the root or at any depth. Again, this is (deliberately) | |
| 141 | + different from Unix file globs and Windows wild cards. | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | +## Where Globs are Used | |
| 146 | + | |
| 147 | +### Settings that are Globs | |
| 148 | + | |
| 149 | +These settings are all lists of glob patterns: | |
| 95 | 150 | |
| 96 | 151 | * `binary-glob` |
| 97 | 152 | * `clean-glob` |
| 98 | 153 | * `crlf-glob` |
| 99 | 154 | * `crnl-glob` |
| 100 | 155 | * `encoding-glob` |
| 101 | 156 | * `ignore-glob` |
| 102 | 157 | * `keep-glob` |
| 103 | 158 | |
| 159 | +All may be [versioned, local, or global][settings]. Use `fossil | |
| 160 | +settings` to manage local and global settings, or a file in the | |
| 161 | +repository's `.fossil-settings/` folder at the root of the tree named | |
| 162 | +for each for versioned setting. | |
| 163 | + | |
| 164 | + [settings]: /doc/trunk/www/settings.wiki | |
| 165 | + | |
| 166 | +Using versioned settings for these not only has the advantage that | |
| 167 | +they are tracked in the repository just like the rest of your project, | |
| 168 | +but you can more easily keep longer lists of more complicated glob | |
| 169 | +patterns than would be practical in either local or global settings. | |
| 170 | + | |
| 171 | +The `ignore-glob` is an example of one setting that frequently grows | |
| 172 | +to be an elaborate list of files that should be ignored by most | |
| 173 | +commands. This is especially true when one (or more) IDEs are used in | |
| 174 | +a project because each IDE has its own ideas of how and where to cache | |
| 175 | +information that speeds up its browsing and building tasks but which | |
| 176 | +need not be preserved in your project's history. | |
| 177 | + | |
| 104 | 178 | |
| 105 | -### Commands that refer to GLOBs | |
| 179 | +### Commands that Refer to Globs | |
| 106 | 180 | |
| 107 | -Many of the commands that respect the settings containing GLOBs have | |
| 108 | -options to override some or all of the settings. | |
| 181 | +Many of the commands that respect the settings containing globs have | |
| 182 | +options to override some or all of the settings. These options are | |
| 183 | +usually named to correspond to the setting they override, such as | |
| 184 | +`--ignore` to override the `ignore-glob` setting. These commands are: | |
| 109 | 185 | |
| 110 | 186 | * `add` |
| 111 | 187 | * `addremove` |
| 112 | 188 | * `changes` |
| 113 | 189 | * `clean` |
| @@ -115,23 +191,24 @@ | ||
| 115 | 191 | * `merge` |
| 116 | 192 | * `settings` |
| 117 | 193 | * `status` |
| 118 | 194 | * `unset` |
| 119 | 195 | |
| 120 | -The commands `tarball` and `zip` produce compressed archives of a specific | |
| 121 | -checkin. They may be further restricted by options that specify GLOBs | |
| 122 | -that name files to include or exclude rather than taking the entire | |
| 123 | -checkin. | |
| 124 | - | |
| 125 | -The commands `http`, `cgi`, `server`, and `ui` that implement or support with web servers | |
| 126 | -provide a mechanism to name some files to serve with static content | |
| 127 | -where a list of GLOBs specifies what content may be served. | |
| 196 | +The commands `tarball` and `zip` produce compressed archives of a | |
| 197 | +specific checkin. They may be further restricted by options that | |
| 198 | +specify glob patterns that name files to include or exclude rather | |
| 199 | +than archiving the entire checkin. | |
| 200 | + | |
| 201 | +The commands `http`, `cgi`, `server`, and `ui` that implement or | |
| 202 | +support with web servers provide a mechanism to name some files to | |
| 203 | +serve with static content where a list of GLOBs specifies what content | |
| 204 | +may be served. | |
| 128 | 205 | |
| 129 | 206 | |
| 130 | 207 | ### Web pages that refer to GLOBs |
| 131 | 208 | |
| 132 | -The /timeline page supports a query parameter that names a GLOB of | |
| 209 | +The `/timeline` page supports a query parameter that names a GLOB of | |
| 133 | 210 | files to focus the timeline on. It also can use `GLOB`, `LIKE`, or |
| 134 | 211 | `REGEXP` matching on tag names, where each is implemented by the |
| 135 | 212 | corresponding operator in [SQLite][]. |
| 136 | 213 | |
| 137 | 214 | The pages `/tarball` and `/zip` generate compressed archives of a |
| @@ -203,15 +280,23 @@ | ||
| 203 | 280 | all the files. |
| 204 | 281 | |
| 205 | 282 | |
| 206 | 283 | ## Implementation |
| 207 | 284 | |
| 208 | -Most of the implementation of GLOB handling is found in | |
| 209 | -[`src/glob.c`][glob.c]. | |
| 285 | +Most of the implementation of glob pattern handling in fossil is found | |
| 286 | +in [`src/glob.c`][glob.c]. The canonical name of a file is implemented | |
| 287 | +in [`src/file.c`][file.c]. Each command that references a glob | |
| 288 | +constructs the target text from information specific to that command. | |
| 210 | 289 | |
| 211 | -The actual matching is implemented in SQL, so the documentation for | |
| 212 | -`GLOB` and the other string matching operators in [SQLite][] is | |
| 213 | -useful. | |
| 214 | 290 | |
| 215 | 291 | [glob.c]: https://www.fossil-scm.org/index.html/file/src/glob.c |
| 216 | -[SQLite]: https://sqlite.org/lang_expr.html#like | |
| 292 | +[file.c]: https://www.fossil-scm.org/index.html/file/src/file.c | |
| 293 | + | |
| 294 | +The actual matching is implemented in SQL, so the documentation for | |
| 295 | +`GLOB` and the other string matching operators in [SQLite] | |
| 296 | +(https://sqlite.org/lang_expr.html#like) is useful. Of course, the | |
| 297 | +SQLite source code and test harnesses also make entertaining reading: | |
| 217 | 298 | |
| 299 | + * `src/func.c` [lines 570-768] | |
| 300 | + (https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768) | |
| 301 | + * `test/expr.test` [lines 586-673] | |
| 302 | + (https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673) | |
| 218 | 303 |
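
As a reading aid for the rules this commit adds to `www/globs.md`, here is a minimal Python sketch of the matching behavior the new text describes. It is not Fossil's implementation (that lives in `src/glob.c` and relies on SQLite's `GLOB`). The names `split_glob_list`, `glob_match`, and `list_match` are hypothetical, and the handling of an unterminated `[` or an unterminated quote is an assumption of this sketch, since the document does not specify either case.

```python
# Sketch only: mirrors the documented rules, not Fossil's actual code.
WHITESPACE = "\t\n\v\f\r "  # ASCII TAB, LF, VT, FF, CR, SPACE


def split_glob_list(text):
    """Split a glob list on commas/white space, honoring ASCII quotes."""
    globs, i, n = [], 0, len(text)
    while i < n:
        c = text[i]
        if c in WHITESPACE or c == ",":
            i += 1
        elif c in "'\"":
            end = text.find(c, i + 1)
            if end < 0:          # assumption: unterminated quote runs to end
                end = n
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            j = i
            while j < n and text[j] not in WHITESPACE and text[j] != ",":
                j += 1
            globs.append(text[i:j])
            i = j
    return globs


def _parse(glob):
    """Turn a glob into tokens: star, any, lit, or set (negate, ranges)."""
    tokens, i, n = [], 0, len(glob)
    while i < n:
        c = glob[i]
        if c == "*":
            tokens.append(("star",))
        elif c == "?":
            tokens.append(("any",))
        elif c == "[":
            j = i + 1
            negate = j < n and glob[j] == "^"
            if negate:
                j += 1
            start = j
            if j < n and glob[j] == "]":   # a leading `]` is a list member
                j += 1
            while j < n and glob[j] != "]":
                j += 1
            if j >= n:                     # assumption: unterminated `[` is literal
                tokens.append(("lit", "["))
            else:
                body, ranges, k = glob[start:j], [], 0
                while k < len(body):
                    if k + 2 < len(body) and body[k + 1] == "-":
                        ranges.append((body[k], body[k + 2]))
                        k += 3
                    else:                  # single character, incl. trailing `-`
                        ranges.append((body[k], body[k]))
                        k += 1
                tokens.append(("set", negate, ranges))
                i = j                      # skip to the closing `]`
        else:
            tokens.append(("lit", c))
        i += 1
    return tokens


def glob_match(glob, name):
    """True if the glob consumes the entire name (case sensitive)."""
    tokens = _parse(glob)

    def match_from(t, s):
        while t < len(tokens):
            tok = tokens[t]
            if tok[0] == "star":           # `*` may also consume `/` and `.`
                return any(match_from(t + 1, k) for k in range(s, len(name) + 1))
            if s >= len(name):
                return False
            ch = name[s]
            if tok[0] == "lit" and tok[1] != ch:
                return False
            if tok[0] == "set":
                # Ranges compare code points, so `[z-a]` never hits.
                hit = any(lo <= ch <= hi for lo, hi in tok[2])
                if hit == tok[1]:
                    return False
            t, s = t + 1, s + 1
        return s == len(name)

    return match_from(0, 0)


def list_match(glob_list, name):
    """A list matches if any one of its globs matches."""
    return any(glob_match(g, name) for g in split_glob_list(glob_list))
```

Under these assumptions, `glob_match("*/README", "src/library/README")` is true while `glob_match("*/README", "README")` is false, which matches the examples in the "File Names to Match" section of the new text. The names passed in are assumed to be canonical already (forward slashes, no `.` or `..` components), because canonicalization is a separate step that this sketch does not attempt.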
| --- www/globs.md | |
| +++ www/globs.md | |
| @@ -1,113 +1,189 @@ | |
| 1 | File Name GLOB Patterns |
| 2 | ======================= |
| 3 | |
| 4 | A [glob pattern][glob] is a text expression that matches one or more |
| 5 | file names using wild cards familiar to most users of a command line. |
| 6 | For example, `*` is a glob that matches any name at all and |
| 7 | `Readme.txt` is a glob that matches exactly one file. Note that |
| 8 | although they are related, glob patterns are not the same thing as a |
| 9 | [regular expression or regexp][regexp]. |
| 10 | |
| 11 | [glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia) |
| 12 | [regexp]: https://en.wikipedia.org/wiki/Regular_expression |
| 13 | |
| 14 | |
| 15 | A number of fossil setting values hold one or more file glob patterns |
| 16 | that will match files either on the disk or in the active checkout. |
| 17 | Glob patterns are also accepted in options to certain commands as well |
| 18 | as query parameters to certain pages. |
| 19 | |
| 20 | In many cases more than one glob may be specified in a setting, |
| 21 | option, or query parameter by listing multiple globs separated by a |
| 22 | comma or white space. If a glob must contain commas or spaces, |
| 23 | surround it with single or double quotation marks. |
| 24 | |
| 25 | Of course, many fossil commands also accept lists of files to act on, |
| 26 | and those also may be specified with globs. Although those glob |
| 27 | patterns are similar to what is described here, they are not defined |
| 28 | by fossil, but rather by the conventions of the operating system in |
| 29 | use. |
| 30 | |
| 31 | |
| 32 | ## Syntax |
| 33 | |
| 34 | A list of glob patterns is simply one or more glob patterns separated |
| 35 | by white space or commas. If a glob must contain white spaces or |
| 36 | commas, it can be quoted with either single or double quotation marks. |
| 37 | A list is said to match if any one (or more) globs in the list |
| 38 | matches. |
| 39 | |
| 40 | A glob pattern is a collection of characters compared to a target |
| 41 | text, usually a file name. The whole glob is said to match if it |
| 42 | successfully consumes and matches the entire target text. Glob |
| 43 | patterns are made up of ordinary characters and special characters. |
| 44 | |
| 45 | Ordinary characters consume a single character of the target and must |
| 46 | match it exactly. |
| 47 | |
| 48 | Special characters (and special character sequences) consume zero or |
| 49 | more characters from the target and describe what matches. The special |
| 50 | characters (and sequences) are: |
| 51 | |
| 52 | * Matches any sequence of zero or more characters. |
| 53 | ? Matches exactly one character. |
| 54 | [...] Matches one character from the enclosed list of characters. |
| 55 | [^...] Matches one character not in the enclosed list. |
| 56 | |
| 57 | Special character sequences have some additional features: |
| 58 | |
| 59 | * A range of characters may be specified with `-`, so `[a-d]` matches |
| 60 | exactly the same characters as `[abcd]`. Ranges reflect Unicode |
| 61 | code points without any locale-specific collation sequence. |
| 62 | * Include `-` in a list by placing it last, just before the `]`. |
| 63 | * Include `]` in a list by making the first character after the `[` or |
| 64 | `[^`. At any other place, `]` ends the list. |
| 65 | * Include `^` in a list by placing anywhere except first after the |
| 66 | `[`. |
| 67 | * Some examples of character lists: |
| 68 | `[a-d]` Matches any one of `a`, `b`, `c`, or `d` but not `ä`; |
| 69 | `[^a-d]` Matches exactly one character other than `a`, `b`, `c`, |
| 70 | or `d`; |
| 71 | `[0-9a-fA-F]` Matches exactly one hexadecimal digit; |
| 72 | `[a-]` Matches either `a` or `-`; |
| 73 | `[][]` Matches either `]` or `[`; |
| 74 | `[^]]` Matches exactly one character other than `]`; |
| 75 | `[]^]` Matches either `]` or `^`; and |
| 76 | `[^-]` Matches exactly one character other than `-`. |
| 77 | * Beware that ranges in lists may include more than you expect: |
| 78 | `[A-z]` Matches `A` and `Z`, but also matches `a` and some less |
| 79 | obvious characters such as `[`, `\`, and `]` with code point |
| 80 | values between `Z` and `a`. |
| 81 | * Beware that a range must be specified from low value to high |
| 82 | value: `[z-a]` does not match any character at all, preventing the |
| 83 | entire glob from matching. |
| 84 | * Note that unlike typical Unix shell globs, wildcards (`*`, `?`, |
| 85 | and character lists) are allowed to match `/` directory |
| 86 | separators as well as the initial `.` in the name of a hidden |
| 87 | file or directory. |
| 88 | |
| 89 | |
| 90 | White space means the ASCII characters TAB, LF, VT, FF, CR, and SPACE. |
| 91 | Note that this does not include any of the many additional spacing |
| 92 | characters available in Unicode, and specifically does not include |
| 93 | U+00A0 NO-BREAK SPACE. |
| 94 | |
| 95 | Because both LF and CR are white space and leading and trailing spaces |
| 96 | are stripped from each glob in a list, a list of globs may be broken |
| 97 | into lines between globs when the list is stored in a file (as for a |
| 98 | versioned setting). |
| 99 | |
| 100 | Similarly 'single quotes' and "double quotes" are the ASCII straight |
| 101 | quote characters, not any of the other quotation marks provided in |
| 102 | Unicode and specifically not the "curly" quotes preferred by |
| 103 | typesetters and word processors. |
| 104 | |
| 105 | |
| 106 | ## File Names to Match |
| 107 | |
| 108 | Before it is compared to a glob pattern, each file name is transformed |
| 109 | to a canonical form. The glob must match the entire canonical file |
| 110 | name to be considered a match. |
| 111 | |
| 112 | The canonical name of a file has all directory separators changed to |
| 113 | `/`, redundant slashes are removed, all `.` path components are |
| 114 | removed, and all `..` path components are resolved. (There are |
| 115 | additional details we won’t go into here.) |
| 116 | |
| 117 | The goal is a name that is the simplest possible for each particular |
| 118 | file, and will be the same on Windows, Unix, and any other platform |
| 119 | where fossil is run. |
| 120 | |
| 121 | Beware, however, that all glob matching is case sensitive. This will |
| 122 | not be a surprise on Unix where all file names are also case |
| 123 | sensitive. However, most Windows file systems are case preserving and |
| 124 | case insensitive. On Windows, the names `ReadMe` and `README` are |
| 125 | names of the same file; on Unix they are different files. |
| 126 | |
| 127 | Some example cases: |
| 128 | |
| 129 | * The glob `README` matches only a file named `README` in the root of |
| 130 | the tree. It does not match a file named `src/README` because it |
| 131 | does not include any characters that consume the `src/` part. |
| 132 | * The glob `*/README` does match `src/README`. Unlike Unix file |
| 133 | globs, it also matches `src/library/README`. However it does not |
| 134 | match the file `README` in the root of the tree. |
| 135 | * The glob `src/README` does match the file named `src\README` on |
| 136 | Windows because all directory separators are rewritten as `/` in |
| 137 | the canonical name before the glob is matched. This makes it much |
| 138 | easier to write globs that work on both Unix and Windows. |
| 139 | * The glob `*.[ch]` matches every C source or header file in the |
| 140 | tree at the root or at any depth. Again, this is (deliberately) |
| 141 | different from Unix file globs and Windows wild cards. |
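
The example cases above can be tried out with Python's `fnmatch.fnmatchcase`. It is a different matcher from Fossil's, so treat this as a sketch only, but it happens to share the three properties these examples rely on: the whole name must match, `*` also matches `/`, and matching is case sensitive. Other details, such as how a character class is negated, may differ. The wrapper name `matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def matches(glob, canonical_name):
    """Return True if the glob matches the entire canonical file name."""
    # fnmatchcase never folds case and lets '*' span '/' characters.
    return fnmatchcase(canonical_name, glob)


print(matches("README", "src/README"))             # no part of the glob consumes "src/"
print(matches("*/README", "src/library/README"))   # '*' spans directory separators
print(matches("*/README", "README"))               # the glob requires a '/'
print(matches("*.[ch]", "src/glob.c"))             # C sources at any depth
print(matches("README", "ReadMe"))                 # case sensitive
```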
| 142 | |
| 143 | |
| 144 | |
| 145 | ## Where Globs are Used |
| 146 | |
| 147 | ### Settings that are Globs |
| 148 | |
| 149 | These settings are all lists of glob patterns: |
| 150 | |
| 151 | * `binary-glob` |
| 152 | * `clean-glob` |
| 153 | * `crlf-glob` |
| 154 | * `crnl-glob` |
| 155 | * `encoding-glob` |
| 156 | * `ignore-glob` |
| 157 | * `keep-glob` |
| 158 | |
| 159 | All may be [versioned, local, or global][settings]. Use `fossil |
| 160 | settings` to manage local and global settings. For a versioned |
| 161 | setting, use a file named for that setting in the `.fossil-settings/` |
| 162 | folder at the root of the tree. |
| 163 | |
| 164 | [settings]: /doc/trunk/www/settings.wiki |
| 165 | |
| 166 | Using versioned settings for these not only has the advantage that |
| 167 | they are tracked in the repository just like the rest of your project, |
| 168 | but you can more easily keep longer lists of more complicated glob |
| 169 | patterns than would be practical in either local or global settings. |
| 170 | |
| 171 | The `ignore-glob` is an example of one setting that frequently grows |
| 172 | to be an elaborate list of files that should be ignored by most |
| 173 | commands. This is especially true when one (or more) IDEs are used in |
| 174 | a project because each IDE has its own ideas of how and where to cache |
| 175 | information that speeds up its browsing and building tasks but which |
| 176 | need not be preserved in your project's history. |
| 177 | |
| 178 | |
| 179 | ### Commands that Refer to Globs |
| 180 | |
| 181 | Many of the commands that respect the settings containing globs have |
| 182 | options to override some or all of the settings. These options are |
| 183 | usually named to correspond to the setting they override, such as |
| 184 | `--ignore` to override the `ignore-glob` setting. These commands are: |
| 185 | |
| 186 | * `add` |
| 187 | * `addremove` |
| 188 | * `changes` |
| 189 | * `clean` |
| @@ -115,23 +191,24 @@ | |
| 191 | * `merge` |
| 192 | * `settings` |
| 193 | * `status` |
| 194 | * `unset` |
| 195 | |
| 196 | The commands `tarball` and `zip` produce compressed archives of a |
| 197 | specific checkin. They may be further restricted by options that |
| 198 | specify glob patterns that name files to include or exclude rather |
| 199 | than archiving the entire checkin. |
| 200 | |
| 201 | The commands `http`, `cgi`, `server`, and `ui`, which implement or |
| 202 | support web servers, provide a mechanism to serve some files as |
| 203 | static content, where a list of GLOBs specifies what content |
| 204 | may be served. |
| 205 | |
| 206 | |
| 207 | ### Web pages that refer to GLOBs |
| 208 | |
| 209 | The `/timeline` page supports a query parameter that takes a GLOB |
| 210 | naming the files to focus the timeline on. It can also use `GLOB`, |
| 211 | `LIKE`, or `REGEXP` matching on tag names, where each is implemented |
| 212 | by the corresponding operator in [SQLite][]. |
| 213 | |
| 214 | The pages `/tarball` and `/zip` generate compressed archives of a |
| @@ -203,15 +280,23 @@ | |
| 280 | all the files. |
| 281 | |
| 282 | |
| 283 | ## Implementation |
| 284 | |
| 285 | Most of the implementation of glob pattern handling in fossil is found |
| 286 | in [`src/glob.c`][glob.c]. The canonical name of a file is implemented |
| 287 | in [`src/file.c`][file.c]. Each command that references a glob |
| 288 | constructs the target text from information specific to that command. |
| 289 | |
| 290 | |
| 291 | [glob.c]: https://www.fossil-scm.org/index.html/file/src/glob.c |
| 292 | [file.c]: https://www.fossil-scm.org/index.html/file/src/file.c |
| 293 | |
| 294 | The actual matching is implemented in SQL, so the documentation for |
| 295 | `GLOB` and the other string matching operators in [SQLite] |
| 296 | (https://sqlite.org/lang_expr.html#like) is useful. Of course, the |
| 297 | SQLite source code and test harnesses also make entertaining reading: |
| 298 | |
| 299 | * `src/func.c` [lines 570-768] |
| 300 | (https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768) |
| 301 | * `test/expr.test` [lines 586-673] |
| 302 | (https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673) |
| 303 |