<title>How CGI Works In Fossil</title>

<h2>Introduction</h2>

CGI or "Common Gateway Interface" is a venerable yet reliable technique for
generating dynamic web content. This article gives a quick background on how
CGI works and describes how Fossil can act as a CGI service.

This is a "how it works" guide. This document provides background
information on the CGI protocol so that you can better understand what
is going on behind the scenes. If you just want to set up Fossil
as a CGI server, see the [./server/ | Fossil Server Setup] page. Or
if you want to develop CGI-based extensions to Fossil, see
the [./serverext.wiki|CGI Server Extensions] page.

<h2>A Quick Review Of CGI</h2>

An HTTP request is a block of text that is sent by a client application
(usually a web browser) and arrives at the web server over a network
connection. The HTTP request contains a URL that describes the information
being requested. The URL in the HTTP request is typically the same URL
that appears in the URL bar at the top of the web browser that is making
the request. The URL might contain a "?" character followed by
query parameters. The HTTP request will usually also contain other information
such as the name of the application that made the request, whether or
not the requesting application can accept a compressed reply, POST
parameters from forms, and so forth.

The job of the web server is to interpret the HTTP request and formulate
an appropriate reply.
The web server is free to interpret the HTTP request in any way it wants,
but most web servers follow a similar pattern, described below.
(Note: details may vary from one web server to another.)

Suppose the filename component of the URL in the HTTP request looks like this:

<pre>/one/two/timeline/four</pre>

Most web servers will search their content area for files that match
some prefix of the URL. The search starts with <b>/one</b>, then goes to
<b>/one/two</b>, then <b>/one/two/timeline</b>, and finally
<b>/one/two/timeline/four</b> is checked. The search stops at the first
match.

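The prefix search described above can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions: the function name
<code>split_script_and_path</code> and the <code>exists</code> callback are
hypothetical, and real web servers differ in the details.

```python
from typing import Callable, Optional, Tuple


def split_script_and_path(
    url_path: str, exists: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Return (matched_prefix, remaining_suffix) for the shortest prefix of
    url_path that exists in the content area, or None if nothing matches."""
    terms = [term for term in url_path.split("/") if term]
    # Try /one, then /one/two, then /one/two/timeline, and so on.
    for count in range(1, len(terms) + 1):
        prefix = "/" + "/".join(terms[:count])
        if exists(prefix):
            rest = terms[count:]
            suffix = "/" + "/".join(rest) if rest else ""
            return prefix, suffix
    return None


# A pretend content area holding a single entry (hypothetical data).
content_area = {"/one/two"}
print(split_script_and_path("/one/two/timeline/four", content_area.__contains__))
# ('/one/two', '/timeline/four')
```

Passing the existence check in as a callback keeps the sketch independent of
any real filesystem layout.
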
Suppose the first match is <b>/one/two</b>. If <b>/one/two</b> is an
ordinary file in the content area, then that file is returned as static
content. The "<b>/timeline/four</b>" suffix is silently ignored.

If <b>/one/two</b> is a CGI script (or program), then the web server
executes the <b>/one/two</b> script. The output generated by
the script is collected and repackaged as the HTTP reply.

Before executing the CGI script, the web server will set up various
environment variables with information useful to the CGI script:
<table>
<tr><th>Variable<th>Meaning
<tr><td>GATEWAY_INTERFACE<td>Always set to "CGI/1.0"
<tr><td>REQUEST_URI
    <td>The input URL from the HTTP request.
<tr><td>SCRIPT_NAME
    <td>The prefix of the input URL that matches the CGI script name.
    In this example: "/one/two".
<tr><td>PATH_INFO
    <td>The suffix of the URL beyond the name of the CGI script.
    In this example: "timeline/four".
<tr><td>QUERY_STRING
    <td>The query string that follows the "?" in the URL, if there is one.
</table>

There are other CGI environment variables beyond those listed above.
Many Fossil servers implement the
[https://fossil-scm.org/home/test-env/two/three?abc=xyz|test-env]
webpage that shows some of the CGI environment
variables that Fossil pays attention to.

In addition to setting various CGI environment variables, if the HTTP
request contains POST content, then the web server relays the POST content
to standard input of the CGI script.

In summary, the task of the
CGI script is to read the various CGI environment variables and
the POST content on standard input (if any), figure out an appropriate
reply, then write that reply on standard output.
The web server will read the output from the CGI script, reformat it
into an appropriate HTTP reply, and relay the result back to the
requesting application.
The CGI script exits as soon as it generates a single reply.
The web server will (usually) persist and handle multiple HTTP requests,
but a CGI script handles just one HTTP request and then exits.

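To make the summary above concrete, here is a minimal sketch of a generic CGI
program written in Python. It is not part of Fossil. The helper names
<code>read_post_body</code> and <code>build_reply</code> are invented for this
example, and it relies on CONTENT_LENGTH, a standard CGI variable that is not
listed in the table above, to know how much POST content to read.

```python
import os
import sys


def read_post_body(environ, stream) -> bytes:
    """Read exactly CONTENT_LENGTH bytes of POST content, if any."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    return stream.read(length) if length > 0 else b""


def build_reply(environ, post_body: bytes) -> str:
    """Build a plain-text reply that echoes a few CGI variables."""
    names = ("REQUEST_URI", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING")
    report = [f"{name}={environ.get(name, '')}" for name in names]
    report.append(f"POST bytes received: {len(post_body)}")
    # A CGI reply is a header block, a blank line, then the content.
    header = "Content-Type: text/plain; charset=utf-8\r\n\r\n"
    return header + "\n".join(report) + "\n"


if __name__ == "__main__":
    # Handle exactly one request, then exit, as CGI programs do.
    body = read_post_body(os.environ, sys.stdin.buffer)
    sys.stdout.write(build_reply(os.environ, body))
```

The web server, not the script, turns the header block and content into a
complete HTTP reply.
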
The above is a rough outline of how CGI works.
There are many details omitted from this brief discussion.
See other on-line CGI tutorials for further information.

<h2>How Fossil Acts As A CGI Program</h2>

An appropriate CGI script for running Fossil will look something
like the following:

<pre>
#!/usr/bin/fossil
repository: /home/www/repos/project.fossil
</pre>

The first line of the script is a
"[https://en.wikipedia.org/wiki/Shebang_%28Unix%29|shebang]"
that tells the operating system what program to use as the interpreter
for this script. On unix, when you execute a script that starts with
a shebang, the operating system runs the program identified by the
shebang with a single argument that is the full pathname of the script
itself.
In our example, the interpreter is Fossil, and the argument might
be something like "/var/www/cgi-bin/one/two" (depending on how your
particular web server is configured).

The Fossil program that is run as the script interpreter
is the same Fossil that runs when
you type ordinary Fossil commands like "fossil sync" or "fossil commit".
But in this case, as soon as it launches, the Fossil program
recognizes that the GATEWAY_INTERFACE environment variable is
set to "CGI/1.0" and it therefore knows that it is being used as
CGI rather than as an ordinary command-line tool, and behaves accordingly.

When Fossil recognizes that it is being run as CGI, it opens and reads
the file identified by its sole argument (the file named by
<code>argv[1]</code>). In our example, the second line of that file
tells Fossil the location of the repository it will be serving.
Fossil then starts looking at the CGI environment variables to figure
out what web page is being requested, generates that one web page,
then exits.

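As a rough illustration of the step where the script file is read, the
following Python sketch pulls "key: value" settings out of the text of a CGI
script like the one shown earlier. This is not Fossil's actual parser (Fossil
is written in C). The function name <code>parse_cgi_control_file</code> is
hypothetical, and treating every line that starts with "#" as ignorable is an
assumption made so that the shebang line is skipped.

```python
def parse_cgi_control_file(text: str) -> dict:
    """Collect 'key: value' settings from the text of a CGI script file."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' lines (including the shebang)
        # carry no settings.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition(":")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


EXAMPLE_SCRIPT = """#!/usr/bin/fossil
repository: /home/www/repos/project.fossil
"""

print(parse_cgi_control_file(EXAMPLE_SCRIPT))
# {'repository': '/home/www/repos/project.fossil'}
```
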
Usually, the webpage being requested is the first term of the
PATH_INFO environment variable. (Exceptions to this rule are noted
in the sequel.) For our example, the first term of PATH_INFO
is "timeline", which means that Fossil will generate
the [/help/www/timeline|/timeline] webpage.

With Fossil, terms of PATH_INFO beyond the webpage name are converted into
the "name" query parameter. Hence, the following two URLs mean
exactly the same thing to Fossil:
<ol type='A'>
<li> [https://fossil-scm.org/home/info/c14ecc43]
<li> [https://fossil-scm.org/home/info?name=c14ecc43]
</ol>

In both cases, the CGI script is called "/fossil". For case (A),
the PATH_INFO variable will be "info/c14ecc43" and so the
"[/help/www/info|/info]" webpage will be generated and the suffix of
PATH_INFO will be converted into the "name" query parameter, which
identifies the artifact about which information is requested.
In case (B), the PATH_INFO is just "info", but the same "name"
query parameter is set explicitly by the URL itself.

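The equivalence of cases (A) and (B) can be sketched as follows. This is an
illustration only, not Fossil's implementation: the function name
<code>resolve_page</code> is hypothetical, and the choice to let an explicit
"name" query parameter win when both forms are present is an arbitrary
decision of this sketch, since the text above does not say which takes
precedence.

```python
from urllib.parse import parse_qs


def resolve_page(path_info: str, query_string: str = ""):
    """Return (webpage, parameters) for a PATH_INFO and QUERY_STRING pair."""
    terms = [term for term in path_info.split("/") if term]
    page = terms[0] if terms else ""
    # Keep only the first value of each query parameter for simplicity.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    # Terms beyond the webpage name become the "name" parameter.
    # Assumption: an explicit name= in the query string is left untouched.
    if len(terms) > 1 and "name" not in params:
        params["name"] = "/".join(terms[1:])
    return page, params


print(resolve_page("info/c14ecc43"))          # ('info', {'name': 'c14ecc43'})
print(resolve_page("info", "name=c14ecc43"))  # ('info', {'name': 'c14ecc43'})
```
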
<h2>Serving Multiple Fossil Repositories From One CGI Script</h2>

The previous example showed how to serve a single Fossil repository
using a single CGI script.
On a website that wants to serve multiple repositories, one could
simply create multiple CGI scripts, one script for each repository.
But it is also possible to serve multiple Fossil repositories from
a single CGI script.

If the CGI script for Fossil contains a "directory:" line instead of
a "repository:" line, then the argument to "directory:" is the name
of a directory that contains multiple repository files, each ending
with ".fossil". For example:

<pre>
#!/usr/bin/fossil
directory: /home/www/repos
</pre>

Suppose the /home/www/repos directory contains files named
<b>one.fossil</b>, <b>two.fossil</b>, and <b>subdir/three.fossil</b>.
Further suppose that the name of the CGI script (relative to the root
of the webserver document area) is "cgis/example2". Then to
see the timeline for the "three.fossil" repository, the URL would be:

<pre>
http://example.com/cgis/example2/subdir/three/timeline
</pre>

Here is what happens:
<ol>
<li> The input URI on the HTTP request is
     <b>/cgis/example2/subdir/three/timeline</b>
<li> The web server searches prefixes of the input URI until it finds
     the "cgis/example2" script. The web server then sets
     PATH_INFO to the "subdir/three/timeline" suffix and invokes the
     "cgis/example2" script.
<li> Fossil runs and sees the "directory:" line pointing to
     "/home/www/repos". Fossil then starts pulling terms off the
     front of the PATH_INFO looking for a repository. It first looks
     at "/home/www/repos/subdir.fossil" but there is no such repository.
     So then it looks at "/home/www/repos/subdir/three.fossil" and finds
     a repository. The PATH_INFO is shortened by removing
     "subdir/three/" leaving it at just "timeline".
<li> Fossil looks at the rest of PATH_INFO to see that the webpage
     requested is "timeline".
</ol>
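
Step 3 above, where terms are pulled off the front of PATH_INFO until a
repository is found, might be sketched like this in Python. It is an
illustration rather than Fossil's actual code: the function name
<code>find_repository</code> and the <code>is_file</code> callback are
hypothetical, and the sketch performs none of the pathname validation that a
real server needs.

```python
import posixpath
from typing import Callable, Optional, Tuple


def find_repository(
    directory: str, path_info: str, is_file: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Return (repository_file, remaining_path_info), or None if no
    leading run of PATH_INFO terms names a repository."""
    terms = [term for term in path_info.split("/") if term]
    for count in range(1, len(terms) + 1):
        # Try subdir.fossil, then subdir/three.fossil, and so on.
        candidate = posixpath.join(directory, *terms[:count]) + ".fossil"
        if is_file(candidate):
            return candidate, "/".join(terms[count:])
    return None


# The repository files from the example, held in a set for illustration.
repos = {
    "/home/www/repos/one.fossil",
    "/home/www/repos/two.fossil",
    "/home/www/repos/subdir/three.fossil",
}
print(find_repository("/home/www/repos", "subdir/three/timeline", repos.__contains__))
# ('/home/www/repos/subdir/three.fossil', 'timeline')
```
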
<a id="cgivar"></a>

The web server sets many environment variables in step 2 in addition
to just PATH_INFO. The following diagram shows a few of these variables
and their relationship to the request URL:

<verbatim type="pikchr">
charwid = 0.075
thickness = 0

SCHEME: box "https://" mono fit
DOMAIN: box "example.com" mono fit
SCRIPT: box "/cgis/example2" mono fit
PATH: box "/subdir/three/timeline" mono fit
QUERY: box "?c=55d7e1" mono fit

thickness = 0.01

DB: box at 0.3 below DOMAIN "HTTP_HOST" mono fit invis
SB: box at 0.3 below SCRIPT "SCRIPT_NAME" mono fit invis
PB: box at 0.3 below PATH "PATH_INFO" mono fit invis
QB: box at 0.3 below QUERY "QUERY_STRING" mono fit invis
RB: box at 0.5 above PATH "REQUEST_URI" mono fit invis

color = lightgray

box at SCHEME width SCHEME.width height SCHEME.height
line fill 0x7799CC behind QUERY \
  from SCRIPT.nw \
  to RB.sw \
  to RB.se \
  to QUERY.ne \
  close
line fill 0x99CCFF behind DOMAIN \
  from DOMAIN.nw \
  to DOMAIN.sw \
  to DB.n \
  to DOMAIN.se \
  to DOMAIN.ne \
  close
line fill 0xCCEEFF behind SCRIPT \
  from SCRIPT.nw \
  to SCRIPT.sw \
  to SB.n \
  to SCRIPT.se \
  to SCRIPT.ne \
  close
line fill 0x99CCFF behind PATH \
  from PATH.nw \
  to PATH.sw \
  to PB.n \
  to PATH.se \
  to PATH.ne \
  close
line fill 0xCCEEFF behind QUERY \
  from QUERY.nw \
  to QUERY.sw \
  to QB.n \
  to QUERY.se \
  to QUERY.ne \
  close
</verbatim>

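The relationships in the diagram can also be expressed as a short Python
sketch that splits the example URL into the same variables. The function name
<code>cgi_variables</code> is hypothetical, and the script name is passed in
by hand here, whereas a real web server finds it by searching its content
area.

```python
from urllib.parse import urlsplit


def cgi_variables(url: str, script_name: str) -> dict:
    """Derive a few CGI variables from a full URL and a known script name."""
    parts = urlsplit(url)
    rest = parts.path[len(script_name):]
    if not parts.path.startswith(script_name) or (rest and not rest.startswith("/")):
        raise ValueError("URL path does not begin with the script name")
    request_uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "HTTP_HOST": parts.netloc,
        "SCRIPT_NAME": script_name,
        "PATH_INFO": rest,
        # QUERY_STRING holds the text after the "?", not the "?" itself.
        "QUERY_STRING": parts.query,
        "REQUEST_URI": request_uri,
    }


url = "https://example.com/cgis/example2/subdir/three/timeline?c=55d7e1"
for name, value in cgi_variables(url, "/cgis/example2").items():
    print(f"{name}={value}")
```
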
<h2>Additional CGI Script Options</h2>

The CGI script can have additional options used to fine-tune
Fossil's behavior. See the [./cgi.wiki|CGI script documentation]
for details.

<h2>Additional Observations</h2>
<ol type="I">
<li><p>
Fossil does not distinguish between the various HTTP methods (GET, PUT,
DELETE, etc). Fossil figures out what it needs to do purely from the
webpage term of the URI.</p></li>
<li><p>
Fossil does not distinguish between query parameters that are part of the
URI, application/x-www-form-urlencoded or multipart/form-data encoded
parameters that are part of the POST content, and cookies. Each information
source is seen as a space of key/value pairs which are loaded into an
internal property hash table. The code that runs to generate the reply
can then reference various property values.
Fossil does not care where the value of each property comes from (POST
content, cookies, or query parameters) only that the property exists
and has a value.</p></li>
<li><p>
The "[/help/ui|fossil ui]" and "[/help/server|fossil server]" commands
are implemented using a simple built-in web server that accepts incoming HTTP
requests, translates each request into a CGI invocation, then creates a
separate child Fossil process to handle each request. In other words, CGI
is used internally to implement "fossil ui/server".
<br><br>
SCGI is processed using the same built-in web server, just modified
to parse SCGI requests instead of HTTP requests. Each SCGI request is
converted into CGI, then Fossil creates a separate child Fossil
process to handle each CGI request.</p></li>
<li><p>
Fossil is itself often launched using CGI. But Fossil can also then
turn around and launch [./serverext.wiki|sub-CGI scripts to implement
extensions].</p></li>
</ol>
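
Observation II, that all parameter sources feed one table of key/value pairs,
can be illustrated with the following Python sketch. It is not Fossil's
implementation. The function name <code>collect_properties</code> is
hypothetical, only application/x-www-form-urlencoded POST content is handled
(multipart/form-data is left out for brevity), and letting later sources
overwrite earlier ones is an arbitrary choice of this sketch.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl


def collect_properties(
    query_string: str = "", post_body: str = "", cookie_header: str = ""
) -> dict:
    """Merge query parameters, urlencoded POST fields, and cookies into a
    single dictionary, without recording where each value came from."""
    properties = {}
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        properties[key] = value
    for key, value in parse_qsl(post_body, keep_blank_values=True):
        properties[key] = value
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    for key, morsel in cookies.items():
        properties[key] = morsel.value
    return properties


print(collect_properties("name=trunk", "comment=hello+world", "theme=dark"))
# {'name': 'trunk', 'comment': 'hello world', 'theme': 'dark'}
```

Code that generates a reply would then look up a property by key alone, which
mirrors the point that the origin of each value does not matter.
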