Fossil SCM
Assorted improvements to www/globs.md, mainly to clarity and grammar.
Commit
7898593d9de25804fc1c737148ab0f96b646cf861d24fcdd8afdf5ed57a48670
Parent
77be1777e193381…
1 file changed
+119
-102
| --- www/globs.md | ||
| +++ www/globs.md | ||
| @@ -4,82 +4,92 @@ | ||
| 4 | 4 | A [glob pattern][glob] is a text expression that matches one or more |
| 5 | 5 | file names using wild cards familiar to most users of a command line. |
| 6 | 6 | For example, `*` is a glob that matches any name at all and |
| 7 | 7 | `Readme.txt` is a glob that matches exactly one file. |
| 8 | 8 | |
| 9 | -Note that although both are notations for describing patterns in text, | |
| 10 | -glob patterns are not the same thing as a [regular expression or | |
| 11 | -regexp][regexp]. | |
| 9 | +A glob should not be confused with a [regular expression][regexp] (RE), | |
| 10 | +even though they use some of the same special characters for similar | |
| 11 | +purposes, because [they are not fully compatible][greinc] pattern | |
| 12 | +matching languages. Fossil uses globs when matching file names with the | |
| 13 | +settings described in this document, not REs. | |
| 12 | 14 | |
| 13 | -[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia) | |
| 15 | +[glob]: https://en.wikipedia.org/wiki/Glob_(programming) | |
| 16 | +[greinc]: https://unix.stackexchange.com/a/57958/138 | |
| 14 | 17 | [regexp]: https://en.wikipedia.org/wiki/Regular_expression |
| 15 | 18 | |
| 16 | - | |
| 17 | -A number of fossil setting values hold one or more file glob patterns | |
| 18 | -that will identify files needing special treatment. Glob patterns are | |
| 19 | -also accepted in options to certain commands as well as query | |
| 20 | -parameters to certain pages. | |
| 21 | - | |
| 22 | -In many cases more than one glob may be specified in a setting, | |
| 23 | -option, or query parameter by listing multiple globs separated by a | |
| 24 | -comma or white space. | |
| 25 | - | |
| 26 | -Of course, many fossil commands also accept lists of files to act on, | |
| 27 | -and those also may be specified with globs. Although those glob | |
| 28 | -patterns are similar to what is described here, they are not defined | |
| 29 | -by fossil, but rather by the conventions of the operating system in | |
| 30 | -use. | |
| 19 | +These settings hold one or more file glob patterns to cause Fossil to | |
| 20 | +give matching named files special treatment. Glob patterns are also | |
| 21 | +accepted in options to certain commands and as query parameters to | |
| 22 | +certain Fossil UI web pages. | |
| 23 | + | |
| 24 | +Where Fossil also accepts globs in commands, this handling may interact | |
| 25 | +with your OS’s command shell or its C runtime system, because they may | |
| 26 | +have their own glob pattern handling. We will detail such interactions | |
| 27 | +below. | |
| 31 | 28 | |
| 32 | 29 | |
| 33 | 30 | ## Syntax |
| 34 | 31 | |
| 35 | -A list of glob patterns is simply one or more glob patterns separated | |
| 32 | +Where Fossil accepts glob patterns, it will usually accept a *list* of | |
| 33 | +such patterns, each individual pattern separated from the others | |
| 36 | 34 | by white space or commas. If a glob must contain white spaces or |
| 37 | 35 | commas, it can be quoted with either single or double quotation marks. |
| 38 | -A list is said to match if any one (or more) globs in the list | |
| 36 | +A list is said to match if any one glob in the list | |
| 39 | 37 | matches. |
| 40 | 38 | |
| 41 | -A glob pattern is a collection of characters compared to a target | |
| 42 | -text, usually a file name. The whole glob is said to match if it | |
| 43 | -successfully consumes and matches the entire target text. Glob | |
| 44 | -patterns are made up of ordinary characters and special characters. | |
| 45 | - | |
| 46 | -Ordinary characters consume a single character of the target and must | |
| 47 | -match it exactly. | |
| 48 | - | |
| 49 | -Special characters (and special character sequences) consume zero or | |
| 50 | -more characters from the target and describe what matches. The special | |
| 51 | -characters (and sequences) are: | |
| 39 | +A glob pattern matches a given file name if it successfully consumes and | |
| 40 | +matches the *entire* name. Partial matches are failed matches. | |
| 41 | + | |
| 42 | +Most characters in a glob pattern consume a single character of the file | |
| 43 | +name and must match it exactly. For instance, “a” in a glob simply | |
| 44 | +matches the letter “a” in the file name unless it is inside a special | |
| 45 | +character sequence. | |
| 46 | + | |
| 47 | +Other characters have special meaning, and they may include otherwise | |
| 48 | +normal characters to give them special meaning: | |
| 52 | 49 | |
| 53 | 50 | :Pattern |:Effect |
| 54 | 51 | --------------------------------------------------------------------- |
| 55 | 52 | `*` | Matches any sequence of zero or more characters |
| 56 | 53 | `?` | Matches exactly one character |
| 57 | 54 | `[...]` | Matches one character from the enclosed list of characters |
| 58 | -`[^...]` | Matches one character not in the enclosed list | |
| 55 | +`[^...]` | Matches one character *not* in the enclosed list | |
| 59 | 56 | |
| 60 | -Special character sequences have some additional features: | |
| 57 | +Note that unlike [POSIX globs][pg], these special characters and | |
| 58 | +sequences are allowed to match `/` directory separators as well as the | |
| 59 | +initial `.` in the name of a hidden file or directory. This is because | |
| 60 | +Fossil file names are stored as complete path names. The distinction | |
| 61 | +between file name and directory name is “below” Fossil in this sense. | |
| 61 | 62 | |
| 62 | - * A range of characters may be specified with `-`, so `[a-d]` matches | |
| 63 | - exactly the same characters as `[abcd]`. Ranges reflect Unicode | |
| 63 | +[pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13 | |
| 64 | + | |
| | 65 | +The bracket expressions above require some additional explanation: | |
| 66 | + | |
| 67 | + * A range of characters may be specified with `-`, so `[a-f]` matches | |
| 68 | + exactly the same characters as `[abcdef]`. Ranges reflect Unicode | |
| 64 | 69 | code points without any locale-specific collation sequence. |
| 65 | - * Include `-` in a list by placing it last, just before the `]`. | |
| 66 | - * Include `]` in a list by making the first character after the `[` or | |
| 67 | - `[^`. At any other place, `]` ends the list. | |
| 68 | - * Include `^` in a list by placing anywhere except first after the | |
| 69 | - `[`. | |
| 70 | - * Beware that ranges in lists may include more than you expect: | |
| 71 | - `[A-z]` Matches `A` and `Z`, but also matches `a` and some less | |
| 72 | - obvious characters such as `[`, `\`, and `]` with code point | |
| 73 | - values between `Z` and `a`. | |
| 70 | + Therefore, this particular sequence never matches the Unicode | |
| 71 | + pre-composed character `é`, for example. (U+00E9) | |
| 72 | + | |
| 73 | + * This dependence on character/code point ordering may have other | |
| 74 | + effects to surprise you. For example, the glob `[A-z]` not only | |
| 75 | + matches upper and lowercase ASCII letters, it also matches several | |
| 76 | + punctuation characters placed between `Z` and `a` in both ASCII and | |
| 77 | + Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>. | |
| 78 | + | |
| 79 | + * You may include a literal `-` in a list by placing it last, just | |
| 80 | + before the `]`. | |
| 81 | + | |
| 82 | + * You may include a literal `]` in a list by making the first | |
| 83 | + character after the `[` or `[^`. At any other place, `]` ends the list. | |
| 84 | + | |
| 85 | + * You may include a literal `^` in a list by placing it anywhere | |
| 86 | + except after the opening `[`. | |
| 87 | + | |
| 74 | 88 | * Beware that a range must be specified from low value to high |
| 75 | 89 | value: `[z-a]` does not match any character at all, preventing the |
| 76 | 90 | entire glob from matching. |
| 77 | - * Note that unlike typical Unix shell globs, wildcards (`*`, `?`, | |
| 78 | - and character lists) are allowed to match `/` directory | |
| 79 | - separators as well as the initial `.` in the name of a hidden | |
| 80 | - file or directory. | |
| 81 | 91 | |
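The matching rules in this hunk can be read as a small algorithm. The sketch below is an illustration only, not Fossil's or SQLite's code: the names `parse_bracket` and `glob_match` are hypothetical, and treating an unterminated `[` as a failed match is an assumption of this sketch. It shows whole-name matching, wildcards that cross `/` and a leading `.`, code-point ranges, and the placement rules for a literal `]`, `^`, and `-`.

```python
def parse_bracket(glob, i):
    """Parse the bracket expression that opens at glob[i] == '['.

    Returns (negate, ranges, next_index), or None if no closing ']' exists.
    """
    j = i + 1
    negate = j < len(glob) and glob[j] == "^"
    if negate:
        j += 1
    ranges = []
    first = True
    while j < len(glob):
        c = glob[j]
        if c == "]" and not first:
            return negate, ranges, j + 1
        first = False  # a ']' in first position was taken as a literal
        if j + 2 < len(glob) and glob[j + 1] == "-" and glob[j + 2] != "]":
            ranges.append((c, glob[j + 2]))  # range such as a-f
            j += 3
        else:
            ranges.append((c, c))  # single character, including a trailing '-'
            j += 1
    return None


def glob_match(glob, name):
    """Return True only if glob consumes the entire name (hypothetical helper)."""

    def match(gi, ni):
        while gi < len(glob):
            c = glob[gi]
            if c == "*":
                # Try every possible length, including zero; '/' is not special.
                return any(match(gi + 1, k) for k in range(ni, len(name) + 1))
            if ni >= len(name):
                return False
            if c == "[":
                parsed = parse_bracket(glob, gi)
                if parsed is None:
                    return False  # assumption: unterminated '[' never matches
                negate, ranges, gi = parsed
                # Plain code-point comparison, no locale collation.
                hit = any(lo <= name[ni] <= hi for lo, hi in ranges)
                if hit == negate:
                    return False
            elif c == "?" or c == name[ni]:
                gi += 1
            else:
                return False
            ni += 1
        return ni == len(name)  # partial matches are failed matches

    return match(0, 0)


print(glob_match("*.c", "src/glob.c"))    # '*' crosses the directory separator
print(glob_match("[A-z]", "_"))           # punctuation between 'Z' and 'a'
print(glob_match("[z-a]x", "ax"))         # reversed range matches nothing
```

The recursive handling of `*` is chosen for readability; it can be slow on patterns with many stars, which is acceptable for a demonstration.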
| 82 | 92 | Some examples of character lists: |
| 83 | 93 | |
| 84 | 94 | :Pattern |:Effect |
| 85 | 95 | --------------------------------------------------------------------- |
| @@ -92,45 +102,56 @@ | ||
| 92 | 102 | `[]^]` | Matches either `]` or `^` |
| 93 | 103 | `[^-]` | Matches exactly one character other than `-` |
| 94 | 104 | |
| 95 | 105 | White space means the specific ASCII characters TAB, LF, VT, FF, CR, |
| 96 | 106 | and SPACE. Note that this does not include any of the many additional |
| 97 | -spacing characters available in Unicode, and specifically does not | |
| 98 | -include U+00A0 NO-BREAK SPACE. | |
| 107 | +spacing characters available in Unicode such as | |
| 108 | +U+00A0, NO-BREAK SPACE. | |
| 99 | 109 | |
| 100 | 110 | Because both LF and CR are white space and leading and trailing spaces |
| 101 | 111 | are stripped from each glob in a list, a list of globs may be broken |
| 102 | -into lines between globs when the list is stored in a file (as for a | |
| 103 | -versioned setting). | |
| 112 | +into lines between globs when the list is stored in a file, as for a | |
| 113 | +versioned setting. | |
| 104 | 114 | |
| 105 | -Similarly 'single quotes' and "double quotes" are the ASCII straight | |
| 115 | +Note that 'single quotes' and "double quotes" are the ASCII straight | |
| 106 | 116 | quote characters, not any of the other quotation marks provided in |
| 107 | 117 | Unicode and specifically not the "curly" quotes preferred by |
| 108 | 118 | typesetters and word processors. |
| 109 | 119 | |
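The list rules described above (separators, quoting, and the ASCII-only definition of white space) can be sketched as follows. This is a minimal illustration, not Fossil's parser: `split_glob_list` is a hypothetical name, and treating an unclosed quote as running to the end of the text is an assumption of this sketch.

```python
# The six ASCII white space characters named in the text: TAB, LF, VT, FF, CR, SPACE.
ASCII_WS = "\t\n\v\f\r "


def split_glob_list(text):
    """Split a glob list on ASCII white space and commas, honoring straight quotes."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        # Skip any run of separators between globs.
        while i < n and (text[i] in ASCII_WS or text[i] == ","):
            i += 1
        if i >= n:
            break
        if text[i] in "'\"":
            # Quoted glob: everything up to the matching straight quote is literal.
            quote = text[i]
            end = text.find(quote, i + 1)
            if end == -1:
                end = n  # assumption: an unclosed quote runs to the end
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and text[i] not in ASCII_WS and text[i] != ",":
                i += 1
            globs.append(text[start:i])
    return globs


# A list broken across lines, as it might appear in a versioned setting file.
print(split_glob_list("*.o, *.obj\n'my file.txt'\t\"a,b\""))
# U+00A0 NO-BREAK SPACE is not a separator, so this stays a single glob.
print(split_glob_list("a\u00a0b"))
```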
| 110 | 120 | |
| 111 | 121 | ## File Names to Match |
| 112 | 122 | |
| 113 | 123 | Before it is compared to a glob pattern, each file name is transformed |
| 114 | -to a canonical form. The glob must match the entire canonical file | |
| 115 | -name to be considered a match. | |
| 116 | - | |
| 117 | -The canonical name of a file has all directory separators changed to | |
| 118 | -`/`, redundant slashes are removed, all `.` path components are | |
| 119 | -removed, and all `..` path components are resolved. (There are | |
| 120 | -additional details we are ignoring here, but they cover rare edge | |
| 121 | -cases and also follow the principle of least surprise.) | |
| 124 | +to a canonical form: | |
| 125 | + | |
| 126 | + * all directory separators are changed to `/` | |
| 127 | + * redundant slashes are removed | |
| 128 | + * all `.` path components are removed | |
| 129 | + * all `..` path components are resolved | |
| 130 | + | |
| 131 | +(There are additional details we are ignoring here, but they cover rare | |
| 132 | +edge cases and follow the principle of least surprise.) | |
| 133 | + | |
| 134 | +The glob must match the *entire* canonical file name to be considered a | |
| 135 | +match. | |
| 122 | 136 | |
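The four canonicalization steps listed above can be sketched for relative names as shown here. `canonical_name` is a hypothetical helper written for illustration; Fossil's own code handles further edge cases that this sketch ignores, such as absolute paths.

```python
def canonical_name(path):
    """Canonicalize a relative file name following the four listed steps."""
    parts = []
    # Step 1: change all directory separators to '/'.
    for part in path.replace("\\", "/").split("/"):
        if part in ("", "."):
            # Steps 2 and 3: drop redundant slashes and '.' components.
            continue
        if part == "..":
            # Step 4: resolve '..' against the preceding component, if any.
            if parts:
                parts.pop()
            continue
        parts.append(part)
    return "/".join(parts)


print(canonical_name("src\\.\\sub//../glob.c"))  # src/glob.c
```

Because the result uses `/` on every platform, the same glob in a versioned setting can be compared against it on any host.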
| 123 | 137 | The goal is to have a name that is the simplest possible for each |
| 124 | -particular file, and that will be the same on Windows, Unix, and any | |
| 125 | -other platform where fossil is run. | |
| 138 | +particular file, and that will be the same regardless of the platform | |
| 139 | +you run Fossil on. This is important when you have a repository cloned | |
| 140 | +from multiple platforms and have globs in versioned settings: you want | |
| 141 | +those settings to be interpreted the same way everywhere. | |
| 126 | 142 | |
| 127 | -Beware, however, that all glob matching is case sensitive. This will | |
| 128 | -not be a surprise on Unix where all file names are also case | |
| 129 | -sensitive. However, most Windows file systems are case preserving and | |
| 143 | +Beware, however, that all glob matching in Fossil is case sensitive | |
| 144 | +regardless of host platform and file system. This will not be a surprise | |
| 145 | +on POSIX platforms where file names are usually treated case | |
| 146 | +sensitively. However, most Windows file systems are case preserving but | |
| 130 | 147 | case insensitive. That is, on Windows, the names `ReadMe` and `README` |
| 131 | -are names of the same file; on Unix they are different files. | |
| 148 | +are usually names of the same file. The same is true in other cases, | |
| 149 | +such as by default on macOS file systems and in the file system drivers | |
| 150 | +for Windows file systems running on non-Windows systems. (e.g. exfat on | |
| 151 | +Linux.) Therefore, write your Fossil glob patterns to match the name of | |
| 152 | +the file as checked into the repository. | |
| 132 | 153 | |
| 133 | 154 | Some example cases: |
| 134 | 155 | |
| 135 | 156 | :Pattern |:Effect |
| 136 | 157 | -------------------------------------------------------------------------------- |
| @@ -478,14 +499,14 @@ | ||
| 478 | 499 | |
| 479 | 500 | |
| 480 | 501 | ## Converting `.gitignore` to `ignore-glob` |
| 481 | 502 | |
| 482 | 503 | Many other version control systems handle the specific case of |
| 483 | -ignoring certain files differently from fossil: they have you create | |
| 504 | +ignoring certain files differently from Fossil: they have you create | |
| 484 | 505 | individual "ignore" files in each folder, which specify things ignored |
| 485 | 506 | in that folder and below. Usually some form of glob patterns are used |
| 486 | -in those files, but the details differ from fossil. | |
| 507 | +in those files, but the details differ from Fossil. | |
| 487 | 508 | |
| 488 | 509 | In many simple cases, you can just store a top level "ignore" file in |
| 489 | 510 | `.fossil-settings/ignore-glob`. But as usual, there will be lots of |
| 490 | 511 | edge cases. |
| 491 | 512 | |
| @@ -495,33 +516,33 @@ | ||
| 495 | 516 | version controlled files. Some of the files used have no set name, but |
| 496 | 517 | are called out in configuration files. |
| 497 | 518 | |
| 498 | 519 | [gitignore]: https://git-scm.com/docs/gitignore |
| 499 | 520 | |
| 500 | -In contrast, fossil has a global setting and a local setting, but the local setting | |
| 501 | -overrides the global rather than extending it. Similarly, a fossil | |
| 521 | +In contrast, Fossil has a global setting and a local setting, but the local setting | |
| 522 | +overrides the global rather than extending it. Similarly, a Fossil | |
| 502 | 523 | command's `--ignore` option replaces the `ignore-glob` setting rather |
| 503 | 524 | than extending it. |
| 504 | 525 | |
| 505 | 526 | With that in mind, translating a `.gitignore` file into |
| 506 | 527 | `.fossil-settings/ignore-glob` may be possible in many cases. Here are |
| 507 | 528 | some of the features of `.gitignore` and comments on how they relate to |
| 508 | -fossil: | |
| 529 | +Fossil: | |
| 509 | 530 | |
| 510 | - * "A blank line matches no files..." is the same in fossil. | |
| 511 | - * "A line starting with # serves as a comment...." not in fossil. | |
| 531 | + * "A blank line matches no files...": same in Fossil. | |
| 532 | + * "A line starting with # serves as a comment....": not in Fossil. | |
| 512 | 533 | * "Trailing spaces are ignored unless they are quoted..." is similar |
| 513 | - in fossil. All whitespace before and after a glob is trimmed in | |
| 514 | - fossil unless quoted with single or double quotes. Git uses | |
| 515 | - backslash quoting instead, which fossil does not. | |
| 516 | - * "An optional prefix "!" which negates the pattern..." not in | |
| 517 | - fossil. | |
| 518 | - * Git's globs are relative to the location of the `.gitignore` file; | |
| 519 | - fossil's globs are relative to the root of the workspace. | |
| 520 | - * Git's globs and fossil's globs treat directory separators | |
| 534 | + in Fossil. All whitespace before and after a glob is trimmed in | |
| 535 | + Fossil unless quoted with single or double quotes. Git uses | |
| 536 | + backslash quoting instead, which Fossil does not. | |
| 537 | + * "An optional prefix "!" which negates the pattern...": not in | |
| 538 | + Fossil. | |
| 539 | + * Git's globs are relative to the location of the `.gitignore` file: | |
| 540 | + Fossil's globs are relative to the root of the workspace. | |
| 541 | + * Git's globs and Fossil's globs treat directory separators | |
| 521 | 542 | differently. Git includes a notation for zero or more directories |
| 522 | - that is not needed in fossil. | |
| 543 | + that is not needed in Fossil. | |
| 523 | 544 | |
| 524 | 545 | ### Example |
| 525 | 546 | |
| 526 | 547 | In a project with source and documentation: |
| 527 | 548 | |
| @@ -550,30 +571,26 @@ | ||
| 550 | 571 | |
| 551 | 572 | |
| 552 | 573 | |
| 553 | 574 | ## Implementation and References |
| 554 | 575 | |
| 555 | -Most of the implementation of glob pattern handling in fossil is found | |
| 556 | -`glob.c`, `file.c`, and each individual command and web page that uses | |
| 557 | -a glob pattern. Find commands and pages in the fossil sources by | |
| 558 | -looking for comments like `COMMAND: add` or `WEBPAGE: timeline` in | |
| 559 | -front of the function that implements the command or page in files | |
| 560 | -`src/*.c`. (Fossil's build system creates the tables used to dispatch | |
| 561 | -commands at build time by searching the sources for those comments.) A | |
| 562 | -few starting points: | |
| 576 | +The implementation of the Fossil-specific glob pattern handling is here: | |
| 563 | 577 | |
| 564 | 578 | :File |:Description |
| 565 | 579 | -------------------------------------------------------------------------------- |
| 566 | -[`src/glob.c`][] | Implementation of glob pattern list loading, parsing, and matching. | |
| 567 | -[`src/file.c`][] | Implementation of various kinds of canonical names of a file. | |
| 580 | +[`src/glob.c`][] | pattern list loading, parsing, and generic matching code | |
| 581 | +[`src/file.c`][] | application of glob patterns to file names | |
| 568 | 582 | |
| 569 | 583 | [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c |
| 570 | 584 | [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c |
| 571 | 585 | |
| 572 | -The actual pattern matching is implemented in SQL, so the | |
| 573 | -documentation for `GLOB` and the other string matching operators in | |
| 574 | -[SQLite] (https://sqlite.org/lang_expr.html#like) is useful. Of | |
| 575 | -course, the SQLite [source code] | |
| 576 | -(https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768) | |
| 577 | -and [test harnesses] | |
| 578 | -(https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673) | |
| 579 | -also make entertaining reading. | |
| 586 | +See the [Adding Features to Fossil][aff] document for broader details | |
| 587 | +about finding and working with such code. | |
| 588 | + | |
| 589 | +The actual pattern matching leverages the `GLOB` operator in SQLite, so | |
| 590 | +you may find [its documentation][gdoc], [source code][gsrc] and [test | |
| 591 | +harness][gtst] helpful. | |
| 592 | + | |
| 593 | +[aff]: ./adding_code.wiki | |
| 594 | +[gdoc]: https://sqlite.org/lang_expr.html#like | |
| 595 | +[gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768 | |
| 596 | +[gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673 | |
| 580 | 597 |
| --- www/globs.md | |
| +++ www/globs.md | |
| @@ -4,82 +4,92 @@ | |
| 4 | A [glob pattern][glob] is a text expression that matches one or more |
| 5 | file names using wild cards familiar to most users of a command line. |
| 6 | For example, `*` is a glob that matches any name at all and |
| 7 | `Readme.txt` is a glob that matches exactly one file. |
| 8 | |
| 9 | Note that although both are notations for describing patterns in text, |
| 10 | glob patterns are not the same thing as a [regular expression or |
| 11 | regexp][regexp]. |
| 12 | |
| 13 | [glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia) |
| 14 | [regexp]: https://en.wikipedia.org/wiki/Regular_expression |
| 15 | |
| 16 | |
| 17 | A number of fossil setting values hold one or more file glob patterns |
| 18 | that will identify files needing special treatment. Glob patterns are |
| 19 | also accepted in options to certain commands as well as query |
| 20 | parameters to certain pages. |
| 21 | |
| 22 | In many cases more than one glob may be specified in a setting, |
| 23 | option, or query parameter by listing multiple globs separated by a |
| 24 | comma or white space. |
| 25 | |
| 26 | Of course, many fossil commands also accept lists of files to act on, |
| 27 | and those also may be specified with globs. Although those glob |
| 28 | patterns are similar to what is described here, they are not defined |
| 29 | by fossil, but rather by the conventions of the operating system in |
| 30 | use. |
| 31 | |
| 32 | |
| 33 | ## Syntax |
| 34 | |
| 35 | A list of glob patterns is simply one or more glob patterns separated |
| 36 | by white space or commas. If a glob must contain white spaces or |
| 37 | commas, it can be quoted with either single or double quotation marks. |
| 38 | A list is said to match if any one (or more) globs in the list |
| 39 | matches. |
| 40 | |
| 41 | A glob pattern is a collection of characters compared to a target |
| 42 | text, usually a file name. The whole glob is said to match if it |
| 43 | successfully consumes and matches the entire target text. Glob |
| 44 | patterns are made up of ordinary characters and special characters. |
| 45 | |
| 46 | Ordinary characters consume a single character of the target and must |
| 47 | match it exactly. |
| 48 | |
| 49 | Special characters (and special character sequences) consume zero or |
| 50 | more characters from the target and describe what matches. The special |
| 51 | characters (and sequences) are: |
| 52 | |
| 53 | :Pattern |:Effect |
| 54 | --------------------------------------------------------------------- |
| 55 | `*` | Matches any sequence of zero or more characters |
| 56 | `?` | Matches exactly one character |
| 57 | `[...]` | Matches one character from the enclosed list of characters |
| 58 | `[^...]` | Matches one character not in the enclosed list |
| 59 | |
| 60 | Special character sequences have some additional features: |
| 61 | |
| 62 | * A range of characters may be specified with `-`, so `[a-d]` matches |
| 63 | exactly the same characters as `[abcd]`. Ranges reflect Unicode |
| 64 | code points without any locale-specific collation sequence. |
| 65 | * Include `-` in a list by placing it last, just before the `]`. |
| 66 | * Include `]` in a list by making the first character after the `[` or |
| 67 | `[^`. At any other place, `]` ends the list. |
| 68 | * Include `^` in a list by placing anywhere except first after the |
| 69 | `[`. |
| 70 | * Beware that ranges in lists may include more than you expect: |
| 71 | `[A-z]` Matches `A` and `Z`, but also matches `a` and some less |
| 72 | obvious characters such as `[`, `\`, and `]` with code point |
| 73 | values between `Z` and `a`. |
| 74 | * Beware that a range must be specified from low value to high |
| 75 | value: `[z-a]` does not match any character at all, preventing the |
| 76 | entire glob from matching. |
| 77 | * Note that unlike typical Unix shell globs, wildcards (`*`, `?`, |
| 78 | and character lists) are allowed to match `/` directory |
| 79 | separators as well as the initial `.` in the name of a hidden |
| 80 | file or directory. |
| 81 | |
| 82 | Some examples of character lists: |
| 83 | |
| 84 | :Pattern |:Effect |
| 85 | --------------------------------------------------------------------- |
| @@ -92,45 +102,56 @@ | |
| 92 | `[]^]` | Matches either `]` or `^` |
| 93 | `[^-]` | Matches exactly one character other than `-` |
| 94 | |
| 95 | White space means the specific ASCII characters TAB, LF, VT, FF, CR, |
| 96 | and SPACE. Note that this does not include any of the many additional |
| 97 | spacing characters available in Unicode, and specifically does not |
| 98 | include U+00A0 NO-BREAK SPACE. |
| 99 | |
| 100 | Because both LF and CR are white space and leading and trailing spaces |
| 101 | are stripped from each glob in a list, a list of globs may be broken |
| 102 | into lines between globs when the list is stored in a file (as for a |
| 103 | versioned setting). |
| 104 | |
| 105 | Similarly 'single quotes' and "double quotes" are the ASCII straight |
| 106 | quote characters, not any of the other quotation marks provided in |
| 107 | Unicode and specifically not the "curly" quotes preferred by |
| 108 | typesetters and word processors. |
| 109 | |
| 110 | |
| 111 | ## File Names to Match |
| 112 | |
| 113 | Before it is compared to a glob pattern, each file name is transformed |
| 114 | to a canonical form. The glob must match the entire canonical file |
| 115 | name to be considered a match. |
| 116 | |
| 117 | The canonical name of a file has all directory separators changed to |
| 118 | `/`, redundant slashes are removed, all `.` path components are |
| 119 | removed, and all `..` path components are resolved. (There are |
| 120 | additional details we are ignoring here, but they cover rare edge |
| 121 | cases and also follow the principle of least surprise.) |
| 122 | |
| 123 | The goal is to have a name that is the simplest possible for each |
| 124 | particular file, and that will be the same on Windows, Unix, and any |
| 125 | other platform where fossil is run. |
| 126 | |
| 127 | Beware, however, that all glob matching is case sensitive. This will |
| 128 | not be a surprise on Unix where all file names are also case |
| 129 | sensitive. However, most Windows file systems are case preserving and |
| 130 | case insensitive. That is, on Windows, the names `ReadMe` and `README` |
| 131 | are names of the same file; on Unix they are different files. |
| 132 | |
| 133 | Some example cases: |
| 134 | |
| 135 | :Pattern |:Effect |
| 136 | -------------------------------------------------------------------------------- |
| @@ -478,14 +499,14 @@ | |
| 478 | |
| 479 | |
| 480 | ## Converting `.gitignore` to `ignore-glob` |
| 481 | |
| 482 | Many other version control systems handle the specific case of |
| 483 | ignoring certain files differently from fossil: they have you create |
| 484 | individual "ignore" files in each folder, which specify things ignored |
| 485 | in that folder and below. Usually some form of glob patterns are used |
| 486 | in those files, but the details differ from fossil. |
| 487 | |
| 488 | In many simple cases, you can just store a top level "ignore" file in |
| 489 | `.fossil-settings/ignore-glob`. But as usual, there will be lots of |
| 490 | edge cases. |
| 491 | |
| @@ -495,33 +516,33 @@ | |
| 495 | version controlled files. Some of the files used have no set name, but |
| 496 | are called out in configuration files. |
| 497 | |
| 498 | [gitignore]: https://git-scm.com/docs/gitignore |
| 499 | |
| 500 | In contrast, fossil has a global setting and a local setting, but the local setting |
| 501 | overrides the global rather than extending it. Similarly, a fossil |
| 502 | command's `--ignore` option replaces the `ignore-glob` setting rather |
| 503 | than extending it. |
| 504 | |
| 505 | With that in mind, translating a `.gitignore` file into |
| 506 | `.fossil-settings/ignore-glob` may be possible in many cases. Here are |
| 507 | some of features of `.gitignore` and comments on how they relate to |
| 508 | fossil: |
| 509 | |
| 510 | * "A blank line matches no files..." is the same in fossil. |
| 511 | * "A line starting with # serves as a comment...." not in fossil. |
| 512 | * "Trailing spaces are ignored unless they are quoted..." is similar |
| 513 | in fossil. All whitespace before and after a glob is trimmed in |
| 514 | fossil unless quoted with single or double quotes. Git uses |
| 515 | backslash quoting instead, which fossil does not. |
| 516 | * "An optional prefix "!" which negates the pattern..." not in |
| 517 | fossil. |
| 518 | * Git's globs are relative to the location of the `.gitignore` file; |
| 519 | fossil's globs are relative to the root of the workspace. |
| 520 | * Git's globs and fossil's globs treat directory separators |
| 521 | differently. Git includes a notation for zero or more directories |
| 522 | that is not needed in fossil. |
| 523 | |
| 524 | ### Example |
| 525 | |
| 526 | In a project with source and documentation: |
| 527 | |
| @@ -550,30 +571,26 @@ | |
| 550 | |
| 551 | |
| 552 | |
| 553 | ## Implementation and References |
| 554 | |
| 555 | Most of the implementation of glob pattern handling in fossil is found |
| 556 | `glob.c`, `file.c`, and each individual command and web page that uses |
| 557 | a glob pattern. Find commands and pages in the fossil sources by |
| 558 | looking for comments like `COMMAND: add` or `WEBPAGE: timeline` in |
| 559 | front of the function that implements the command or page in files |
| 560 | `src/*.c`. (Fossil's build system creates the tables used to dispatch |
| 561 | commands at build time by searching the sources for those comments.) A |
| 562 | few starting points: |
| 563 | |
| 564 | :File |:Description |
| 565 | -------------------------------------------------------------------------------- |
| 566 | [`src/glob.c`][] | Implementation of glob pattern list loading, parsing, and matching. |
| 567 | [`src/file.c`][] | Implementation of various kinds of canonical names of a file. |
| 568 | |
| 569 | [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c |
| 570 | [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c |
| 571 | |
| 572 | The actual pattern matching is implemented in SQL, so the |
| 573 | documentation for `GLOB` and the other string matching operators in |
| 574 | [SQLite] (https://sqlite.org/lang_expr.html#like) is useful. Of |
| 575 | course, the SQLite [source code] |
| 576 | (https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768) |
| 577 | and [test harnesses] |
| 578 | (https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673) |
| 579 | also make entertaining reading. |
| 580 |
| --- www/globs.md | |
| +++ www/globs.md | |
| @@ -4,82 +4,92 @@ | |
| 4 | A [glob pattern][glob] is a text expression that matches one or more |
| 5 | file names using wild cards familiar to most users of a command line. |
| 6 | For example, `*` is a glob that matches any name at all and |
| 7 | `Readme.txt` is a glob that matches exactly one file. |
| 8 | |
| 9 | A glob should not be confused with a [regular expression][regexp] (RE), |
| 10 | even though they use some of the same special characters for similar |
| 11 | purposes, because [they are not fully compatible][greinc] pattern |
| 12 | matching languages. Fossil uses globs when matching file names with the |
| 13 | settings described in this document, not REs. |
| 14 | |
| 15 | [glob]: https://en.wikipedia.org/wiki/Glob_(programming) |
| 16 | [greinc]: https://unix.stackexchange.com/a/57958/138 |
| 17 | [regexp]: https://en.wikipedia.org/wiki/Regular_expression |
| 18 | |
| 19 | These settings hold one or more file glob patterns to cause Fossil to |
| 20 | give matching named files special treatment. Glob patterns are also |
| 21 | accepted in options to certain commands and as query parameters to |
| 22 | certain Fossil UI web pages. |
| 23 | |
| 24 | Where Fossil also accepts globs in commands, this handling may interact |
| 25 | with your OS’s command shell or its C runtime system, because they may |
| 26 | have their own glob pattern handling. We will detail such interactions |
| 27 | below. |
| 28 | |
| 29 | |
| 30 | ## Syntax |
| 31 | |
| 32 | Where Fossil accepts glob patterns, it will usually accept a *list* of |
| 33 | such patterns, each individual pattern separated from the others |
| 34 | by white space or commas. If a glob must contain white spaces or |
| 35 | commas, it can be quoted with either single or double quotation marks. |
| 36 | A list is said to match if any one glob in the list |
| 37 | matches. |
| 38 | |
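The list syntax described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Fossil's actual parser: the function name `split_glob_list` is hypothetical, and the handling of an unterminated quote (take the rest of the text) is an assumption.

```python
# ASCII white space characters that separate globs (TAB, LF, VT, FF, CR, SPACE).
WHITESPACE = "\t\n\v\f\r "


def split_glob_list(text):
    """Split a glob list on white space or commas, honoring '...' and "..." quoting."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITESPACE or ch == ",":
            i += 1  # skip separators between globs
            continue
        if ch in "'\"":
            # Quoted glob: everything up to the matching quote is kept verbatim.
            end = text.find(ch, i + 1)
            if end == -1:
                end = n  # assumption: an unterminated quote runs to the end
            globs.append(text[i + 1:end])
            i = end + 1
        else:
            # Unquoted glob: runs until the next separator.
            j = i
            while j < n and text[j] not in WHITESPACE and text[j] != ",":
                j += 1
            globs.append(text[i:j])
            i = j
    return globs


print(split_glob_list("*.o, *.obj 'My Docs/*'"))
```

A list built this way matches a file name if any one of the returned globs matches it.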
| 39 | A glob pattern matches a given file name if it successfully consumes and |
| 40 | matches the *entire* name. Partial matches are failed matches. |
| 41 | |
| 42 | Most characters in a glob pattern consume a single character of the file |
| 43 | name and must match it exactly. For instance, “a” in a glob simply |
| 44 | matches the letter “a” in the file name unless it is inside a special |
| 45 | character sequence. |
| 46 | |
| 47 | Other characters have special meaning, and some special sequences enclose |
| 48 | otherwise normal characters to give them special meaning: |
| 49 | |
| 50 | :Pattern |:Effect |
| 51 | --------------------------------------------------------------------- |
| 52 | `*` | Matches any sequence of zero or more characters |
| 53 | `?` | Matches exactly one character |
| 54 | `[...]` | Matches one character from the enclosed list of characters |
| 55 | `[^...]` | Matches one character *not* in the enclosed list |
| 56 | |
| 57 | Note that unlike [POSIX globs][pg], these special characters and |
| 58 | sequences are allowed to match `/` directory separators as well as the |
| 59 | initial `.` in the name of a hidden file or directory. This is because |
| 60 | Fossil file names are stored as complete path names. The distinction |
| 61 | between file name and directory name is “below” Fossil in this sense. |
| 62 | |
| 63 | [pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13 |
| 64 | |
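Python's standard `fnmatch.fnmatchcase` happens to share two of the properties described above: it must match the whole name, and its `*` and `?` will match `/` and a leading `.`. The sketch below uses it only to illustrate those two points. It is a different implementation from Fossil's, and its bracket syntax differs (it negates with `[!...]` rather than `[^...]`), so brackets are deliberately left out. The wrapper name `whole_name_match` is hypothetical.

```python
from fnmatch import fnmatchcase


def whole_name_match(name, pattern):
    """Return True only if the pattern consumes the entire name."""
    return fnmatchcase(name, pattern)


# "*" is allowed to cross directory separators.
print(whole_name_match("src/util/main.c", "*.c"))
# "*" is allowed to match the leading "." of a hidden directory.
print(whole_name_match(".fossil-settings/ignore-glob", "*glob"))
# "?" may stand in for a "/" as well.
print(whole_name_match("a/b", "a?b"))
# A partial match is a failed match.
print(whole_name_match("main.c.bak", "*.c"))
```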
| 65 | The bracket expressions above require some additional explanation: |
| 66 | |
| 67 | * A range of characters may be specified with `-`, so `[a-f]` matches |
| 68 | exactly the same characters as `[abcdef]`. Ranges reflect Unicode |
| 69 | code points without any locale-specific collation sequence. |
| 70 | Therefore, this particular range never matches the Unicode |
| 71 | pre-composed character `é` (U+00E9), for example. |
| 72 | |
| 73 | * This dependence on character/code point ordering may have other |
| 74 | effects to surprise you. For example, the glob `[A-z]` not only |
| 75 | matches upper and lowercase ASCII letters, it also matches several |
| 76 | punctuation characters placed between `Z` and `a` in both ASCII and |
| 77 | Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>. |
| 78 | |
| 79 | * You may include a literal `-` in a list by placing it last, just |
| 80 | before the `]`. |
| 81 | |
| 82 | * You may include a literal `]` in a list by making it the first |
| 83 | character after the `[` or `[^`. At any other place, `]` ends the list. |
| 84 | |
| 85 | * You may include a literal `^` in a list by placing it anywhere |
| 86 | except after the opening `[`. |
| 87 | |
| 88 | * Beware that a range must be specified from low value to high |
| 89 | value: `[z-a]` does not match any character at all, preventing the |
| 90 | entire glob from matching. |
| 91 | |
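The code point ordering behind these rules is easy to check for yourself. The following Python sketch does not call Fossil at all; it simply enumerates code points, and the helper name `chars_in_range` is hypothetical.

```python
def chars_in_range(lo, hi):
    """List every character whose code point lies between lo and hi inclusive."""
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]


# Punctuation that sits between "Z" and "a", and so falls inside [A-z].
non_letters = [c for c in chars_in_range("A", "z") if not c.isalpha()]
print(non_letters)

# U+00E9 lies far outside the a-f range, so [a-f] cannot match it.
print("\u00e9" in chars_in_range("a", "f"))

# A reversed range such as z-a contains no characters at all.
print(chars_in_range("z", "a"))
```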
| 92 | Some examples of character lists: |
| 93 | |
| 94 | :Pattern |:Effect |
| 95 | --------------------------------------------------------------------- |
| @@ -92,45 +102,56 @@ | |
| 102 | `[]^]` | Matches either `]` or `^` |
| 103 | `[^-]` | Matches exactly one character other than `-` |
| 104 | |
| 105 | White space means the specific ASCII characters TAB, LF, VT, FF, CR, |
| 106 | and SPACE. Note that this does not include any of the many additional |
| 107 | spacing characters available in Unicode such as |
| 108 | U+00A0, NO-BREAK SPACE. |
| 109 | |
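As a small illustration of that definition, the Python sketch below trims only the six ASCII characters listed above. The names `GLOB_WHITESPACE` and `trim_glob` are hypothetical and this is not Fossil's code. Note that Python's own argument-free `str.strip()` uses a broader Unicode definition, which is why the character set is passed explicitly.

```python
# TAB, LF, VT, FF, CR, and SPACE: the only characters treated as white space.
GLOB_WHITESPACE = "\t\n\v\f\r "


def trim_glob(raw):
    """Strip leading and trailing ASCII white space from one glob."""
    return raw.strip(GLOB_WHITESPACE)


print(repr(trim_glob(" \t*.txt\r\n")))   # ASCII white space is removed
print(repr(trim_glob("\u00a0*.txt")))    # NO-BREAK SPACE is left in place
```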
| 110 | Because both LF and CR are white space and leading and trailing spaces |
| 111 | are stripped from each glob in a list, a list of globs may be broken |
| 112 | into lines between globs when the list is stored in a file, as for a |
| 113 | versioned setting. |
| 114 | |
| 115 | Note that 'single quotes' and "double quotes" are the ASCII straight |
| 116 | quote characters, not any of the other quotation marks provided in |
| 117 | Unicode and specifically not the "curly" quotes preferred by |
| 118 | typesetters and word processors. |
| 119 | |
| 120 | |
| 121 | ## File Names to Match |
| 122 | |
| 123 | Before it is compared to a glob pattern, each file name is transformed |
| 124 | to a canonical form: |
| 125 | |
| 126 | * all directory separators are changed to `/` |
| 127 | * redundant slashes are removed |
| 128 | * all `.` path components are removed |
| 129 | * all `..` path components are resolved |
| 130 | |
| 131 | (There are additional details we are ignoring here, but they cover rare |
| 132 | edge cases and follow the principle of least surprise.) |
| 133 | |
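The four steps listed above can be approximated with Python's standard `posixpath.normpath`. This is a rough sketch for relative names only, not Fossil's canonicalization code: the function name `canonical_name` is hypothetical, and treating every backslash as a directory separator is an assumption that suits Windows-style input.

```python
import posixpath


def canonical_name(name):
    """Approximate the canonical form of a relative file name."""
    # Step 1: change directory separators to "/" (assumes "\" is a separator).
    name = name.replace("\\", "/")
    # Steps 2-4: drop redundant slashes and "." components, resolve "..".
    return posixpath.normpath(name)


print(canonical_name("src\\lib\\..\\main.c"))
print(canonical_name("./src//glob.c"))
```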
| 134 | The glob must match the *entire* canonical file name to be considered a |
| 135 | match. |
| 136 | |
| 137 | The goal is to have a name that is the simplest possible for each |
| 138 | particular file, and that will be the same regardless of the platform |
| 139 | you run Fossil on. This is important when you have a repository cloned |
| 140 | from multiple platforms and have globs in versioned settings: you want |
| 141 | those settings to be interpreted the same way everywhere. |
| 142 | |
| 143 | Beware, however, that all glob matching in Fossil is case sensitive |
| 144 | regardless of host platform and file system. This will not be a surprise |
| 145 | on POSIX platforms where file names are usually treated case |
| 146 | sensitively. However, most Windows file systems are case preserving but |
| 147 | case insensitive. That is, on Windows, the names `ReadMe` and `README` |
| 148 | are usually names of the same file. The same is true in other cases, |
| 149 | such as by default on macOS file systems and in the file system drivers |
| 150 | for Windows file systems running on non-Windows systems (e.g. exFAT on |
| 151 | Linux). Therefore, write your Fossil glob patterns to match the name of |
| 152 | the file as checked into the repository. |
| 153 | |
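To make the case rule concrete, here is a short Python sketch using the standard `fnmatch.fnmatchcase`, which is also case sensitive on every platform. It stands in for Fossil's matcher purely for illustration, and the file names are made up.

```python
from fnmatch import fnmatchcase

checked_in_name = "ReadMe.txt"  # hypothetical name as stored in the repository

# Only the spelling used at check-in matches, whatever the host file system does.
exact = fnmatchcase(checked_in_name, "ReadMe.txt")
wrong_case = fnmatchcase(checked_in_name, "README.txt")
print(exact, wrong_case)
```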
| 154 | Some example cases: |
| 155 | |
| 156 | :Pattern |:Effect |
| 157 | -------------------------------------------------------------------------------- |
| @@ -478,14 +499,14 @@ | |
| 499 | |
| 500 | |
| 501 | ## Converting `.gitignore` to `ignore-glob` |
| 502 | |
| 503 | Many other version control systems handle the specific case of |
| 504 | ignoring certain files differently from Fossil: they have you create |
| 505 | individual "ignore" files in each folder, which specify things ignored |
| 506 | in that folder and below. Usually some form of glob pattern is used |
| 507 | in those files, but the details differ from Fossil. |
| 508 | |
| 509 | In many simple cases, you can just store a top level "ignore" file in |
| 510 | `.fossil-settings/ignore-glob`. But as usual, there will be lots of |
| 511 | edge cases. |
| 512 | |
| @@ -495,33 +516,33 @@ | |
| 516 | version controlled files. Some of the files used have no set name, but |
| 517 | are called out in configuration files. |
| 518 | |
| 519 | [gitignore]: https://git-scm.com/docs/gitignore |
| 520 | |
| 521 | In contrast, Fossil has a global setting and a local setting, but the local setting |
| 522 | overrides the global rather than extending it. Similarly, a Fossil |
| 523 | command's `--ignore` option replaces the `ignore-glob` setting rather |
| 524 | than extending it. |
| 525 | |
| 526 | With that in mind, translating a `.gitignore` file into |
| 527 | `.fossil-settings/ignore-glob` may be possible in many cases. Here are |
| 528 | some of the features of `.gitignore` and comments on how they relate to |
| 529 | Fossil: |
| 530 | |
| 531 | * "A blank line matches no files...": same in Fossil. |
| 532 | * "A line starting with # serves as a comment....": not in Fossil. |
| 533 | * "Trailing spaces are ignored unless they are quoted..." is similar |
| 534 | in Fossil. All whitespace before and after a glob is trimmed in |
| 535 | Fossil unless quoted with single or double quotes. Git uses |
| 536 | backslash quoting instead, which Fossil does not. |
| 537 | * "An optional prefix "!" which negates the pattern...": not in |
| 538 | Fossil. |
| 539 | * Git's globs are relative to the location of the `.gitignore` file; |
| 540 | Fossil's globs are relative to the root of the workspace. |
| 541 | * Git's globs and Fossil's globs treat directory separators |
| 542 | differently. Git includes a notation for zero or more directories |
| 543 | that is not needed in Fossil. |
| 544 | |
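As a starting point for such a translation, the following Python sketch sorts the lines of a `.gitignore` file into globs that might carry over and lines that use features Fossil lacks. It is a hypothetical helper, not a Fossil tool: the name `triage_gitignore` is invented, stripping leading white space is a simplification, and every surviving candidate still needs review because of the differences in relative location and directory separator handling noted above.

```python
def triage_gitignore(lines):
    """Sort .gitignore lines into candidate globs and lines needing manual work."""
    candidates, manual = [], []
    for raw in lines:
        line = raw.strip("\t\n\v\f\r ")
        if not line:
            continue  # blank lines match nothing in either system
        if line.startswith("#"):
            continue  # Git comment; Fossil has no comment syntax, so drop it
        if line.startswith("!") or "\\" in line or "**" in line:
            # Negation, backslash quoting, and Git's "**" directory notation
            # have no direct Fossil equivalent.
            manual.append(line)
        else:
            candidates.append(line)
    return candidates, manual


sample = ["# build output", "", "*.o", "!keep.o", "docs/**/*.tmp", "a\\ b"]
print(triage_gitignore(sample))
```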
| 545 | ### Example |
| 546 | |
| 547 | In a project with source and documentation: |
| 548 | |
| @@ -550,30 +571,26 @@ | |
| 571 | |
| 572 | |
| 573 | |
| 574 | ## Implementation and References |
| 575 | |
| 576 | The implementation of the Fossil-specific glob pattern handling is here: |
| 577 | |
| 578 | :File |:Description |
| 579 | -------------------------------------------------------------------------------- |
| 580 | [`src/glob.c`][] | pattern list loading, parsing, and generic matching code |
| 581 | [`src/file.c`][] | application of glob patterns to file names |
| 582 | |
| 583 | [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c |
| 584 | [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c |
| 585 | |
| 586 | See the [Adding Features to Fossil][aff] document for broader details |
| 587 | about finding and working with such code. |
| 588 | |
| 589 | The actual pattern matching leverages the `GLOB` operator in SQLite, so |
| 590 | you may find [its documentation][gdoc], [source code][gsrc] and [test |
| 591 | harness][gtst] helpful. |
| 592 | |
| 593 | [aff]: ./adding_code.wiki |
| 594 | [gdoc]: https://sqlite.org/lang_expr.html#like |
| 595 | [gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768 |
| 596 | [gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673 |
| 597 |