Fossil SCM

Assorted improvements to www/globs.md, mainly to clarity and grammar.

wyoung 2020-03-21 19:57 trunk
Commit 7898593d9de25804fc1c737148ab0f96b646cf861d24fcdd8afdf5ed57a48670
1 file changed +119 -102
--- www/globs.md
+++ www/globs.md
@@ -4,82 +4,92 @@
 A [glob pattern][glob] is a text expression that matches one or more
 file names using wild cards familiar to most users of a command line.
 For example, `*` is a glob that matches any name at all and
 `Readme.txt` is a glob that matches exactly one file.

-Note that although both are notations for describing patterns in text,
-glob patterns are not the same thing as a [regular expression or
-regexp][regexp].
+A glob should not be confused with a [regular expression][regexp] (RE),
+even though they use some of the same special characters for similar
+purposes, because [they are not fully compatible][greinc] pattern
+matching languages. Fossil uses globs when matching file names with the
+settings described in this document, not REs.

-[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia)
+[glob]: https://en.wikipedia.org/wiki/Glob_(programming)
+[greinc]: https://unix.stackexchange.com/a/57958/138
 [regexp]: https://en.wikipedia.org/wiki/Regular_expression

-
-A number of fossil setting values hold one or more file glob patterns
-that will identify files needing special treatment. Glob patterns are
-also accepted in options to certain commands as well as query
-parameters to certain pages.
-
-In many cases more than one glob may be specified in a setting,
-option, or query parameter by listing multiple globs separated by a
-comma or white space.
-
-Of course, many fossil commands also accept lists of files to act on,
-and those also may be specified with globs. Although those glob
-patterns are similar to what is described here, they are not defined
-by fossil, but rather by the conventions of the operating system in
-use.
+These settings hold one or more file glob patterns to cause Fossil to
+give matching named files special treatment. Glob patterns are also
+accepted in options to certain commands and as query parameters to
+certain Fossil UI web pages.
+
+Where Fossil also accepts globs in commands, this handling may interact
+with your OS’s command shell or its C runtime system, because they may
+have their own glob pattern handling. We will detail such interactions
+below.


 ## Syntax

-A list of glob patterns is simply one or more glob patterns separated
+Where Fossil accepts glob patterns, it will usually accept a *list* of
+such patterns, each individual pattern separated from the others
 by white space or commas. If a glob must contain white spaces or
 commas, it can be quoted with either single or double quotation marks.
-A list is said to match if any one (or more) globs in the list
+A list is said to match if any one glob in the list
 matches.

-A glob pattern is a collection of characters compared to a target
-text, usually a file name. The whole glob is said to match if it
-successfully consumes and matches the entire target text. Glob
-patterns are made up of ordinary characters and special characters.
-
-Ordinary characters consume a single character of the target and must
-match it exactly.
-
-Special characters (and special character sequences) consume zero or
-more characters from the target and describe what matches. The special
-characters (and sequences) are:
+A glob pattern matches a given file name if it successfully consumes and
+matches the *entire* name. Partial matches are failed matches.
+
+Most characters in a glob pattern consume a single character of the file
+name and must match it exactly. For instance, “a” in a glob simply
+matches the letter “a” in the file name unless it is inside a special
+character sequence.
+
+Other characters have special meaning, and they may include otherwise
+normal characters to give them special meaning:

 :Pattern |:Effect
 ---------------------------------------------------------------------
 `*` | Matches any sequence of zero or more characters
 `?` | Matches exactly one character
 `[...]` | Matches one character from the enclosed list of characters
-`[^...]` | Matches one character not in the enclosed list
+`[^...]` | Matches one character *not* in the enclosed list

-Special character sequences have some additional features:
+Note that unlike [POSIX globs][pg], these special characters and
+sequences are allowed to match `/` directory separators as well as the
+initial `.` in the name of a hidden file or directory. This is because
+Fossil file names are stored as complete path names. The distinction
+between file name and directory name is “below” Fossil in this sense.

- * A range of characters may be specified with `-`, so `[a-d]` matches
- exactly the same characters as `[abcd]`. Ranges reflect Unicode
+[pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
+
+The bracket expresssions above require some additional explanation:
+
+ * A range of characters may be specified with `-`, so `[a-f]` matches
+ exactly the same characters as `[abcdef]`. Ranges reflect Unicode
  code points without any locale-specific collation sequence.
- * Include `-` in a list by placing it last, just before the `]`.
- * Include `]` in a list by making the first character after the `[` or
- `[^`. At any other place, `]` ends the list.
- * Include `^` in a list by placing anywhere except first after the
- `[`.
- * Beware that ranges in lists may include more than you expect:
- `[A-z]` Matches `A` and `Z`, but also matches `a` and some less
- obvious characters such as `[`, `\`, and `]` with code point
- values between `Z` and `a`.
+ Therefore, this particular sequence never matches the Unicode
+ pre-composed character `é`, for example. (U+00E9)
+
+ * This dependence on character/code point ordering may have other
+ effects to surprise you. For example, the glob `[A-z]` not only
+ matches upper and lowercase ASCII letters, it also matches several
+ punctuation characters placed between `Z` and `a` in both ASCII and
+ Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>.
+
+ * You may include a literal `-` in a list by placing it last, just
+ before the `]`.
+
+ * You may include a literal `]` in a list by making the first
+ character after the `[` or `[^`. At any other place, `]` ends the list.
+
+ * You may include a literal `^` in a list by placing it anywhere
+ except after the opening `[`.
+
  * Beware that a range must be specified from low value to high
  value: `[z-a]` does not match any character at all, preventing the
  entire glob from matching.
- * Note that unlike typical Unix shell globs, wildcards (`*`, `?`,
- and character lists) are allowed to match `/` directory
- separators as well as the initial `.` in the name of a hidden
- file or directory.

 Some examples of character lists:

 :Pattern |:Effect
 ---------------------------------------------------------------------
@@ -92,45 +102,56 @@
 `[]^]` | Matches either `]` or `^`
 `[^-]` | Matches exactly one character other than `-`

 White space means the specific ASCII characters TAB, LF, VT, FF, CR,
 and SPACE. Note that this does not include any of the many additional
-spacing characters available in Unicode, and specifically does not
-include U+00A0 NO-BREAK SPACE.
+spacing characters available in Unicode such as
+U+00A0, NO-BREAK SPACE.

 Because both LF and CR are white space and leading and trailing spaces
 are stripped from each glob in a list, a list of globs may be broken
-into lines between globs when the list is stored in a file (as for a
-versioned setting).
+into lines between globs when the list is stored in a file, as for a
+versioned setting.

-Similarly 'single quotes' and "double quotes" are the ASCII straight
+Note that 'single quotes' and "double quotes" are the ASCII straight
 quote characters, not any of the other quotation marks provided in
 Unicode and specifically not the "curly" quotes preferred by
 typesetters and word processors.


 ## File Names to Match

 Before it is compared to a glob pattern, each file name is transformed
-to a canonical form. The glob must match the entire canonical file
-name to be considered a match.
-
-The canonical name of a file has all directory separators changed to
-`/`, redundant slashes are removed, all `.` path components are
-removed, and all `..` path components are resolved. (There are
-additional details we are ignoring here, but they cover rare edge
-cases and also follow the principle of least surprise.)
+to a canonical form:
+
+ * all directory separators are changed to `/`
+ * redundant slashes are removed
+ * all `.` path components are removed
+ * all `..` path components are resolved
+
+(There are additional details we are ignoring here, but they cover rare
+edge cases and follow the principle of least surprise.)
+
+The glob must match the *entire* canonical file name to be considered a
+match.

 The goal is to have a name that is the simplest possible for each
-particular file, and that will be the same on Windows, Unix, and any
-other platform where fossil is run.
+particular file, and that will be the same regardless of the platform
+you run Fossil on. This is important when you have a repository cloned
+from multiple platforms and have globs in versioned settings: you want
+those settings to be interpreted the same way everywhere.

-Beware, however, that all glob matching is case sensitive. This will
-not be a surprise on Unix where all file names are also case
-sensitive. However, most Windows file systems are case preserving and
+Beware, however, that all glob matching in Fossil is case sensitive
+regardless of host platform and file system. This will not be a surprise
+on POSIX platforms where file names are usually treated case
+sensitively. However, most Windows file systems are case preserving but
 case insensitive. That is, on Windows, the names `ReadMe` and `README`
-are names of the same file; on Unix they are different files.
+are usually names of the same file. The same is true in other cases,
+such as by default on macOS file systems and in the file system drivers
+for Windows file systems running on non-Windows systems. (e.g. exfat on
+Linux.) Therefore, write your Fossil glob patterns to match the name of
+the file as checked into the repository.

 Some example cases:

 :Pattern |:Effect
 --------------------------------------------------------------------------------
@@ -478,14 +499,14 @@


 ## Converting `.gitignore` to `ignore-glob`

 Many other version control systems handle the specific case of
-ignoring certain files differently from fossil: they have you create
+ignoring certain files differently from Fossil: they have you create
 individual "ignore" files in each folder, which specify things ignored
 in that folder and below. Usually some form of glob patterns are used
-in those files, but the details differ from fossil.
+in those files, but the details differ from Fossil.

 In many simple cases, you can just store a top level "ignore" file in
 `.fossil-settings/ignore-glob`. But as usual, there will be lots of
 edge cases.

@@ -495,33 +516,33 @@
 version controlled files. Some of the files used have no set name, but
 are called out in configuration files.

 [gitignore]: https://git-scm.com/docs/gitignore

-In contrast, fossil has a global setting and a local setting, but the local setting
-overrides the global rather than extending it. Similarly, a fossil
+In contrast, Fossil has a global setting and a local setting, but the local setting
+overrides the global rather than extending it. Similarly, a Fossil
 command's `--ignore` option replaces the `ignore-glob` setting rather
 than extending it.

 With that in mind, translating a `.gitignore` file into
 `.fossil-settings/ignore-glob` may be possible in many cases. Here are
 some of features of `.gitignore` and comments on how they relate to
-fossil:
+Fossil:

- * "A blank line matches no files..." is the same in fossil.
- * "A line starting with # serves as a comment...." not in fossil.
+ * "A blank line matches no files...": same in Fossil.
+ * "A line starting with # serves as a comment....": not in Fossil.
  * "Trailing spaces are ignored unless they are quoted..." is similar
- in fossil. All whitespace before and after a glob is trimmed in
- fossil unless quoted with single or double quotes. Git uses
- backslash quoting instead, which fossil does not.
- * "An optional prefix "!" which negates the pattern..." not in
- fossil.
- * Git's globs are relative to the location of the `.gitignore` file;
- fossil's globs are relative to the root of the workspace.
- * Git's globs and fossil's globs treat directory separators
+ in Fossil. All whitespace before and after a glob is trimmed in
+ Fossil unless quoted with single or double quotes. Git uses
+ backslash quoting instead, which Fossil does not.
+ * "An optional prefix "!" which negates the pattern...": not in
+ Fossil.
+ * Git's globs are relative to the location of the `.gitignore` file:
+ Fossil's globs are relative to the root of the workspace.
+ * Git's globs and Fossil's globs treat directory separators
  differently. Git includes a notation for zero or more directories
- that is not needed in fossil.
+ that is not needed in Fossil.

 ### Example

 In a project with source and documentation:

@@ -550,30 +571,26 @@



 ## Implementation and References

-Most of the implementation of glob pattern handling in fossil is found
-`glob.c`, `file.c`, and each individual command and web page that uses
-a glob pattern. Find commands and pages in the fossil sources by
-looking for comments like `COMMAND: add` or `WEBPAGE: timeline` in
-front of the function that implements the command or page in files
-`src/*.c`. (Fossil's build system creates the tables used to dispatch
-commands at build time by searching the sources for those comments.) A
-few starting points:
+The implementation of the Fossil-specific glob pattern handling is here:

 :File |:Description
 --------------------------------------------------------------------------------
-[`src/glob.c`][] | Implementation of glob pattern list loading, parsing, and matching.
-[`src/file.c`][] | Implementation of various kinds of canonical names of a file.
+[`src/glob.c`][] | pattern list loading, parsing, and generic matching code
+[`src/file.c`][] | application of glob patterns to file names

 [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c
 [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c

-The actual pattern matching is implemented in SQL, so the
-documentation for `GLOB` and the other string matching operators in
-[SQLite] (https://sqlite.org/lang_expr.html#like) is useful. Of
-course, the SQLite [source code]
-(https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768)
-and [test harnesses]
-(https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673)
-also make entertaining reading.
+See the [Adding Features to Fossil][aff] document for broader details
+about finding and working with such code.
+
+The actual pattern matching leverages the `GLOB` operator in SQLite, so
+you may find [its documentation][gdoc], [source code][gsrc] and [test
+harness][gtst] helpful.
+
+[aff]: ./adding_code.wiki
+[gdoc]: https://sqlite.org/lang_expr.html#like
+[gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768
+[gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673

--- www/globs.md
+++ www/globs.md
@@ -4,82 +4,92 @@
4 A [glob pattern][glob] is a text expression that matches one or more
5 file names using wild cards familiar to most users of a command line.
6 For example, `*` is a glob that matches any name at all and
7 `Readme.txt` is a glob that matches exactly one file.
8
9 Note that although both are notations for describing patterns in text,
10 glob patterns are not the same thing as a [regular expression or
11 regexp][regexp].
 
 
12
13 [glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia)
 
14 [regexp]: https://en.wikipedia.org/wiki/Regular_expression
15
16
17 A number of fossil setting values hold one or more file glob patterns
18 that will identify files needing special treatment. Glob patterns are
19 also accepted in options to certain commands as well as query
20 parameters to certain pages.
21
22 In many cases more than one glob may be specified in a setting,
23 option, or query parameter by listing multiple globs separated by a
24 comma or white space.
25
26 Of course, many fossil commands also accept lists of files to act on,
27 and those also may be specified with globs. Although those glob
28 patterns are similar to what is described here, they are not defined
29 by fossil, but rather by the conventions of the operating system in
30 use.
31
32
33 ## Syntax
34
35 A list of glob patterns is simply one or more glob patterns separated
 
36 by white space or commas. If a glob must contain white spaces or
37 commas, it can be quoted with either single or double quotation marks.
38 A list is said to match if any one (or more) globs in the list
39 matches.
40
41 A glob pattern is a collection of characters compared to a target
42 text, usually a file name. The whole glob is said to match if it
43 successfully consumes and matches the entire target text. Glob
44 patterns are made up of ordinary characters and special characters.
45
46 Ordinary characters consume a single character of the target and must
47 match it exactly.
48
49 Special characters (and special character sequences) consume zero or
50 more characters from the target and describe what matches. The special
51 characters (and sequences) are:
52
53 :Pattern |:Effect
54 ---------------------------------------------------------------------
55 `*` | Matches any sequence of zero or more characters
56 `?` | Matches exactly one character
57 `[...]` | Matches one character from the enclosed list of characters
58 `[^...]` | Matches one character not in the enclosed list
59
60 Special character sequences have some additional features:
 
 
 
 
61
62 * A range of characters may be specified with `-`, so `[a-d]` matches
63 exactly the same characters as `[abcd]`. Ranges reflect Unicode
 
 
 
 
64 code points without any locale-specific collation sequence.
65 * Include `-` in a list by placing it last, just before the `]`.
66 * Include `]` in a list by making the first character after the `[` or
67 `[^`. At any other place, `]` ends the list.
68 * Include `^` in a list by placing anywhere except first after the
69 `[`.
70 * Beware that ranges in lists may include more than you expect:
71 `[A-z]` Matches `A` and `Z`, but also matches `a` and some less
72 obvious characters such as `[`, `\`, and `]` with code point
73 values between `Z` and `a`.
 
 
 
 
 
 
 
 
 
74 * Beware that a range must be specified from low value to high
75 value: `[z-a]` does not match any character at all, preventing the
76 entire glob from matching.
77 * Note that unlike typical Unix shell globs, wildcards (`*`, `?`,
78 and character lists) are allowed to match `/` directory
79 separators as well as the initial `.` in the name of a hidden
80 file or directory.
81
82 Some examples of character lists:
83
84 :Pattern |:Effect
85 ---------------------------------------------------------------------
@@ -92,45 +102,56 @@
92 `[]^]` | Matches either `]` or `^`
93 `[^-]` | Matches exactly one character other than `-`
94
95 White space means the specific ASCII characters TAB, LF, VT, FF, CR,
96 and SPACE. Note that this does not include any of the many additional
97 spacing characters available in Unicode, and specifically does not
98 include U+00A0 NO-BREAK SPACE.
99
100 Because both LF and CR are white space and leading and trailing spaces
101 are stripped from each glob in a list, a list of globs may be broken
102 into lines between globs when the list is stored in a file (as for a
103 versioned setting).
104
105 Similarly 'single quotes' and "double quotes" are the ASCII straight
106 quote characters, not any of the other quotation marks provided in
107 Unicode and specifically not the "curly" quotes preferred by
108 typesetters and word processors.
109
110
111 ## File Names to Match
112
113 Before it is compared to a glob pattern, each file name is transformed
114 to a canonical form. The glob must match the entire canonical file
115 name to be considered a match.
116
117 The canonical name of a file has all directory separators changed to
118 `/`, redundant slashes are removed, all `.` path components are
119 removed, and all `..` path components are resolved. (There are
120 additional details we are ignoring here, but they cover rare edge
121 cases and also follow the principle of least surprise.)
 
 
 
 
122
123 The goal is to have a name that is the simplest possible for each
124 particular file, and that will be the same on Windows, Unix, and any
125 other platform where fossil is run.
 
 
126
127 Beware, however, that all glob matching is case sensitive. This will
128 not be a surprise on Unix where all file names are also case
129 sensitive. However, most Windows file systems are case preserving and
 
130 case insensitive. That is, on Windows, the names `ReadMe` and `README`
131 are names of the same file; on Unix they are different files.
 
 
 
 
132
133 Some example cases:
134
135 :Pattern |:Effect
136 --------------------------------------------------------------------------------
@@ -478,14 +499,14 @@
478
479
480 ## Converting `.gitignore` to `ignore-glob`
481
482 Many other version control systems handle the specific case of
483 ignoring certain files differently from fossil: they have you create
484 individual "ignore" files in each folder, which specify things ignored
485 in that folder and below. Usually some form of glob patterns are used
486 in those files, but the details differ from fossil.
487
488 In many simple cases, you can just store a top level "ignore" file in
489 `.fossil-settings/ignore-glob`. But as usual, there will be lots of
490 edge cases.
491
@@ -495,33 +516,33 @@
495 version controlled files. Some of the files used have no set name, but
496 are called out in configuration files.
497
498 [gitignore]: https://git-scm.com/docs/gitignore
499
500 In contrast, fossil has a global setting and a local setting, but the local setting
501 overrides the global rather than extending it. Similarly, a fossil
502 command's `--ignore` option replaces the `ignore-glob` setting rather
503 than extending it.
504
505 With that in mind, translating a `.gitignore` file into
506 `.fossil-settings/ignore-glob` may be possible in many cases. Here are
507 some of features of `.gitignore` and comments on how they relate to
508 fossil:
509
510 * "A blank line matches no files..." is the same in fossil.
511 * "A line starting with # serves as a comment...." not in fossil.
512 * "Trailing spaces are ignored unless they are quoted..." is similar
513 in fossil. All whitespace before and after a glob is trimmed in
514 fossil unless quoted with single or double quotes. Git uses
515 backslash quoting instead, which fossil does not.
516 * "An optional prefix "!" which negates the pattern..." not in
517 fossil.
518 * Git's globs are relative to the location of the `.gitignore` file;
519 fossil's globs are relative to the root of the workspace.
520 * Git's globs and fossil's globs treat directory separators
521 differently. Git includes a notation for zero or more directories
522 that is not needed in fossil.
523
524 ### Example
525
526 In a project with source and documentation:
527
@@ -550,30 +571,26 @@
550
551
552
553 ## Implementation and References
554
555 Most of the implementation of glob pattern handling in fossil is found
556 `glob.c`, `file.c`, and each individual command and web page that uses
557 a glob pattern. Find commands and pages in the fossil sources by
558 looking for comments like `COMMAND: add` or `WEBPAGE: timeline` in
559 front of the function that implements the command or page in files
560 `src/*.c`. (Fossil's build system creates the tables used to dispatch
561 commands at build time by searching the sources for those comments.) A
562 few starting points:
563
564 :File |:Description
565 --------------------------------------------------------------------------------
566 [`src/glob.c`][] | Implementation of glob pattern list loading, parsing, and matching.
567 [`src/file.c`][] | Implementation of various kinds of canonical names of a file.
568
569 [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c
570 [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c
571
572 The actual pattern matching is implemented in SQL, so the
573 documentation for `GLOB` and the other string matching operators in
574 [SQLite] (https://sqlite.org/lang_expr.html#like) is useful. Of
575 course, the SQLite [source code]
576 (https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768)
577 and [test harnesses]
578 (https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673)
579 also make entertaining reading.
 
 
 
580
--- www/globs.md
+++ www/globs.md
@@ -4,82 +4,92 @@
4 A [glob pattern][glob] is a text expression that matches one or more
5 file names using wild cards familiar to most users of a command line.
6 For example, `*` is a glob that matches any name at all and
7 `Readme.txt` is a glob that matches exactly one file.
8
9 A glob should not be confused with a [regular expression][regexp] (RE),
10 even though they use some of the same special characters for similar
11 purposes, because [they are not fully compatible][greinc] pattern
12 matching languages. Fossil uses globs when matching file names with the
13 settings described in this document, not REs.
14
15 [glob]: https://en.wikipedia.org/wiki/Glob_(programming)
16 [greinc]: https://unix.stackexchange.com/a/57958/138
17 [regexp]: https://en.wikipedia.org/wiki/Regular_expression
18
19 These settings hold one or more file glob patterns to cause Fossil to
20 give matching named files special treatment. Glob patterns are also
21 accepted in options to certain commands and as query parameters to
22 certain Fossil UI web pages.
23
24 Where Fossil also accepts globs in commands, this handling may interact
25 with your OS’s command shell or its C runtime system, because they may
26 have their own glob pattern handling. We will detail such interactions
27 below.
 
 
 
 
 
 
28
29
30 ## Syntax
31
32 Where Fossil accepts glob patterns, it will usually accept a *list* of
33 such patterns, each individual pattern separated from the others
34 by white space or commas. If a glob must contain white spaces or
35 commas, it can be quoted with either single or double quotation marks.
36 A list is said to match if any one glob in the list
37 matches.
38
39 A glob pattern matches a given file name if it successfully consumes and
40 matches the *entire* name. Partial matches are failed matches.
41
42 Most characters in a glob pattern consume a single character of the file
43 name and must match it exactly. For instance, “a” in a glob simply
44 matches the letter “a” in the file name unless it is inside a special
45 character sequence.
46
47 Other characters have special meaning, and they may include otherwise
48 normal characters to give them special meaning:
 
49
50 :Pattern |:Effect
51 ---------------------------------------------------------------------
52 `*` | Matches any sequence of zero or more characters
53 `?` | Matches exactly one character
54 `[...]` | Matches one character from the enclosed list of characters
55 `[^...]` | Matches one character *not* in the enclosed list
56
57 Note that unlike [POSIX globs][pg], these special characters and
58 sequences are allowed to match `/` directory separators as well as the
59 initial `.` in the name of a hidden file or directory. This is because
60 Fossil file names are stored as complete path names. The distinction
61 between file name and directory name is “below” Fossil in this sense.
62
63 [pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
64
65 The bracket expresssions above require some additional explanation:
66
67 * A range of characters may be specified with `-`, so `[a-f]` matches
68 exactly the same characters as `[abcdef]`. Ranges reflect Unicode
69 code points without any locale-specific collation sequence.
70 Therefore, this particular sequence never matches the Unicode
71 pre-composed character `é`, for example. (U+00E9)
72
73 * This dependence on character/code point ordering may have other
74 effects to surprise you. For example, the glob `[A-z]` not only
75 matches upper and lowercase ASCII letters, it also matches several
76 punctuation characters placed between `Z` and `a` in both ASCII and
77 Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>.
78
79 * You may include a literal `-` in a list by placing it last, just
80 before the `]`.
81
82 * You may include a literal `]` in a list by making the first
83 character after the `[` or `[^`. At any other place, `]` ends the list.
84
85 * You may include a literal `^` in a list by placing it anywhere
86 except after the opening `[`.
87
88 * Beware that a range must be specified from low value to high
89 value: `[z-a]` does not match any character at all, preventing the
90 entire glob from matching.
 
 
 
 
91
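The matching rules described in the new text above can be modeled in a few lines of Python. This is an illustrative sketch written for this page, not Fossil's implementation (the document points to `src/glob.c` for that). The names `parse_class` and `glob_match` are hypothetical, and treating an unterminated `[` as a failed match is an assumption.

```python
def parse_class(pattern, i):
    """Parse a bracket expression that starts at pattern[i] == '['.

    Returns (negated, singles, ranges, next_index), or None when the
    closing ']' is missing.
    """
    j = i + 1
    negated = j < len(pattern) and pattern[j] == "^"
    if negated:
        j += 1
    singles, ranges = set(), []
    first = True
    while j < len(pattern):
        c = pattern[j]
        # A ']' closes the list unless it is the first list character.
        if c == "]" and not first:
            return negated, singles, ranges, j + 1
        first = False
        # "x-y" is a range, unless the '-' sits just before the closing ']'.
        if j + 2 < len(pattern) and pattern[j + 1] == "-" and pattern[j + 2] != "]":
            ranges.append((c, pattern[j + 2]))
            j += 3
        else:
            singles.add(c)
            j += 1
    return None


def glob_match(pattern, name):
    """Return True only if the pattern consumes the ENTIRE name."""

    def match(pi, ni):
        while pi < len(pattern):
            c = pattern[pi]
            if c == "*":
                # '*' may consume zero or more characters, '/' included.
                return any(match(pi + 1, k) for k in range(ni, len(name) + 1))
            if ni >= len(name):
                return False
            if c == "[":
                parsed = parse_class(pattern, pi)
                if parsed is None:
                    return False  # assumption: unterminated '[' never matches
                negated, singles, ranges, nxt = parsed
                ch = name[ni]
                # Single-character str comparison in Python is by code point.
                hit = ch in singles or any(lo <= ch <= hi for lo, hi in ranges)
                if hit == negated:
                    return False
                pi, ni = nxt, ni + 1
                continue
            if c != "?" and c != name[ni]:
                return False  # ordinary characters must match exactly
            pi += 1
            ni += 1
        return ni == len(name)

    return match(0, 0)


# Usage: wildcards cross '/' and leading '.', and partial matches fail.
assert glob_match("*.c", "src/.hidden/glob.c")
assert not glob_match("Readme", "Readme.txt")
assert glob_match("[A-z]", "_") and not glob_match("[a-f]", "\u00e9")
```

Because ranges are compared by code point, `[z-a]` is an empty range in this model and prevents the whole glob from matching, as the text describes.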
92 Some examples of character lists:
93
94 :Pattern |:Effect
95 ---------------------------------------------------------------------
@@ -92,45 +102,56 @@
102 `[]^]` | Matches either `]` or `^`
103 `[^-]` | Matches exactly one character other than `-`
104
105 White space means the specific ASCII characters TAB, LF, VT, FF, CR,
106 and SPACE. Note that this does not include any of the many additional
107 spacing characters available in Unicode such as
108 U+00A0, NO-BREAK SPACE.
109
110 Because both LF and CR are white space and leading and trailing spaces
111 are stripped from each glob in a list, a list of globs may be broken
112 into lines between globs when the list is stored in a file, as for a
113 versioned setting.
114
115 Note that 'single quotes' and "double quotes" are the ASCII straight
116 quote characters, not any of the other quotation marks provided in
117 Unicode and specifically not the "curly" quotes preferred by
118 typesetters and word processors.
119
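The list rules above (ASCII white space or commas as separators, straight quotes around globs that contain them) can be modeled roughly as follows. This is a sketch only: `split_glob_list` is a hypothetical helper, and recognizing a quote character only at the start of a glob is an assumption rather than something the text states.

```python
# The six ASCII white space characters named in the text:
# TAB, LF, VT, FF, CR, and SPACE.
WHITESPACE = "\t\n\v\f\r "


def split_glob_list(text):
    """Split a glob list into individual glob strings."""
    globs = []
    i, n = 0, len(text)
    while i < n:
        # Skip any run of separators between globs.
        while i < n and (text[i] in WHITESPACE or text[i] == ","):
            i += 1
        if i >= n:
            break
        if text[i] in "'\"":
            # Quoted glob: runs to the matching straight quote, so it may
            # contain white space and commas.
            quote = text[i]
            i += 1
            start = i
            while i < n and text[i] != quote:
                i += 1
            globs.append(text[start:i])
            i += 1  # step over the closing quote, if there was one
        else:
            start = i
            while i < n and text[i] not in WHITESPACE and text[i] != ",":
                i += 1
            globs.append(text[start:i])
    return globs


# Usage: a list broken across lines, as in a versioned setting file.
assert split_glob_list("*.o, *.obj\n'my file.txt' \"a,b\"") == [
    "*.o", "*.obj", "my file.txt", "a,b",
]
```

In this model U+00A0 and curly quotes are ordinary characters, so they end up inside a glob instead of separating or quoting it.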
120
121 ## File Names to Match
122
123 Before it is compared to a glob pattern, each file name is transformed
124 to a canonical form:
125
126 * all directory separators are changed to `/`
127 * redundant slashes are removed
128 * all `.` path components are removed
129 * all `..` path components are resolved
130
131 (There are additional details we are ignoring here, but they cover rare
132 edge cases and follow the principle of least surprise.)
133
134 The glob must match the *entire* canonical file name to be considered a
135 match.
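
The four canonicalization steps can be approximated with the Python sketch below. It is only an illustration: `canonical_name` is a hypothetical function, it assumes a relative name with Windows-style or mixed separators, and it ignores the edge cases mentioned above.

```python
def canonical_name(name):
    """Rough sketch of the four canonicalization steps listed above."""
    parts = []
    # Treat backslash as a directory separator, then split on "/".
    for part in name.replace("\\", "/").split("/"):
        if part == "" or part == ".":
            continue  # redundant slash or "." component
        if part == "..":
            if parts:
                parts.pop()  # resolve ".." against the previous component
            continue
        parts.append(part)
    return "/".join(parts)


print(canonical_name("src\\.\\util//..//glob.c"))
```

A glob is then compared against the whole of the resulting name, here `src/glob.c`, rather than against a fragment of it.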
136
137 The goal is to have a name that is the simplest possible for each
138 particular file, and that will be the same regardless of the platform
139 you run Fossil on. This is important when you have a repository cloned
140 from multiple platforms and have globs in versioned settings: you want
141 those settings to be interpreted the same way everywhere.
142
143 Beware, however, that all glob matching in Fossil is case sensitive
144 regardless of host platform and file system. This will not be a surprise
145 on POSIX platforms where file names are usually treated case
146 sensitively. However, most Windows file systems are case preserving but
147 case insensitive. That is, on Windows, the names `ReadMe` and `README`
148 are usually names of the same file. The same is true in other cases,
149 such as by default on macOS file systems and in the file system drivers
150 for Windows file systems running on non-Windows systems (e.g. exFAT on
151 Linux). Therefore, write your Fossil glob patterns to match the name of
152 the file as checked into the repository.
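
To see what case-sensitive matching means in practice, the sketch below uses `fnmatchcase` from Python's standard `fnmatch` module as a stand-in. It is not Fossil's matcher, and its pattern syntax differs from Fossil's in some details, but it is likewise case sensitive on every platform.

```python
from fnmatch import fnmatchcase

# Only the exact spelling matches, regardless of how the host file
# system treats letter case.
pattern = "ReadMe"
results = {name: fnmatchcase(name, pattern)
           for name in ("ReadMe", "README", "readme")}
print(results)
```

Only `ReadMe` matches, so a pattern written as `README` would miss a file checked in as `ReadMe`, even on a file system that treats the two names as the same file.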
153
154 Some example cases:
155
156 :Pattern |:Effect
157 --------------------------------------------------------------------------------
@@ -478,14 +499,14 @@
499
500
501 ## Converting `.gitignore` to `ignore-glob`
502
503 Many other version control systems handle the specific case of
504 ignoring certain files differently from Fossil: they have you create
505 individual "ignore" files in each folder, which specify things ignored
506 in that folder and below. Usually some form of glob pattern is used
507 in those files, but the details differ from Fossil.
508
509 In many simple cases, you can just store a top level "ignore" file in
510 `.fossil-settings/ignore-glob`. But as usual, there will be lots of
511 edge cases.
512
@@ -495,33 +516,33 @@
516 version controlled files. Some of the files used have no set name, but
517 are called out in configuration files.
518
519 [gitignore]: https://git-scm.com/docs/gitignore
520
521 In contrast, Fossil has a global setting and a local setting, but the local setting
522 overrides the global rather than extending it. Similarly, a Fossil
523 command's `--ignore` option replaces the `ignore-glob` setting rather
524 than extending it.
525
526 With that in mind, translating a `.gitignore` file into
527 `.fossil-settings/ignore-glob` may be possible in many cases. Here are
528 some of the features of `.gitignore` and comments on how they relate to
529 Fossil:
530
531 * "A blank line matches no files...": same in Fossil.
532 * "A line starting with # serves as a comment....": not in Fossil.
533 * "Trailing spaces are ignored unless they are quoted..." is similar
534 in Fossil. All whitespace before and after a glob is trimmed in
535 Fossil unless quoted with single or double quotes. Git uses
536 backslash quoting instead, which Fossil does not support.
537 * "An optional prefix "!" which negates the pattern...": not in
538 Fossil.
539 * Git's globs are relative to the location of the `.gitignore` file;
540 Fossil's globs are relative to the root of the workspace.
541 * Git's globs and Fossil's globs treat directory separators
542 differently. Git includes a notation for zero or more directories
543 that is not needed in Fossil.
544
545 ### Example
546
547 In a project with source and documentation:
548
@@ -550,30 +571,26 @@
571
572
573
574 ## Implementation and References
575
576 The implementation of the Fossil-specific glob pattern handling is here:
577
578 :File |:Description
579 --------------------------------------------------------------------------------
580 [`src/glob.c`][] | pattern list loading, parsing, and generic matching code
581 [`src/file.c`][] | application of glob patterns to file names
582
583 [`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c
584 [`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c
585
586 See the [Adding Features to Fossil][aff] document for broader details
587 about finding and working with such code.
588
589 The actual pattern matching leverages the `GLOB` operator in SQLite, so
590 you may find [its documentation][gdoc], [source code][gsrc] and [test
591 harness][gtst] helpful.
592
593 [aff]: ./adding_code.wiki
594 [gdoc]: https://sqlite.org/lang_expr.html#like
595 [gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768
596 [gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673
597
