<title>How CGI Works In Fossil</title>

<h2>Introduction</h2>

CGI or "Common Gateway Interface" is a venerable yet reliable technique for
generating dynamic web content. This article gives a quick background on how
CGI works and describes how Fossil can act as a CGI service.

This is a "how it works" guide. It provides background
information on the CGI protocol so that you can better understand what
is going on behind the scenes. If you just want to set up Fossil
as a CGI server, see the [./server/ | Fossil Server Setup] page. Or
if you want to develop CGI-based extensions to Fossil, see
the [./serverext.wiki|CGI Server Extensions] page.

<h2>A Quick Review Of CGI</h2>

An HTTP request is a block of text that is sent by a client application
(usually a web browser) and arrives at the web server over a network
connection. The HTTP request contains a URL that describes the information
being requested. The URL in the HTTP request is typically the same URL
that appears in the URL bar at the top of the web browser that is making
the request. The URL might contain a "?" character followed by
query parameters. The HTTP request will usually also contain other information
such as the name of the application that made the request, whether or
not the requesting application can accept a compressed reply, POST
parameters from forms, and so forth.

The job of the web server is to interpret the HTTP request and formulate
an appropriate reply.
The web server is free to interpret the HTTP request in any way it wants,
but most web servers follow a similar pattern, described below.
(Note: details may vary from one web server to another.)

Suppose the filename component of the URL in the HTTP request looks like this:

<pre>/one/two/timeline/four</pre>

Most web servers will search their content area for files that match
some prefix of the URL. The search checks <b>/one</b> first, then
<b>/one/two</b>, then <b>/one/two/timeline</b>, and finally
<b>/one/two/timeline/four</b>. The search stops at the first
match.

Suppose the first match is <b>/one/two</b>. If <b>/one/two</b> is an
ordinary file in the content area, then that file is returned as static
content. The "<b>/timeline/four</b>" suffix is silently ignored.

If <b>/one/two</b> is a CGI script (or program), then the web server
executes the <b>/one/two</b> script. The output generated by
the script is collected and repackaged as the HTTP reply.

Before executing the CGI script, the web server will set up various
environment variables with information useful to the CGI script:
<table>
<tr><th>Variable<th>Meaning
<tr><td>GATEWAY_INTERFACE<td>Always set to "CGI/1.0"
<tr><td>REQUEST_URI
<td>The input URL from the HTTP request.
<tr><td>SCRIPT_NAME
<td>The prefix of the input URL that matches the CGI script name.
In this example: "/one/two".
<tr><td>PATH_INFO
<td>The suffix of the URL beyond the name of the CGI script.
In this example: "timeline/four".
<tr><td>QUERY_STRING
<td>The query string that follows the "?" in the URL, if there is one.
</table>
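
The prefix search and the resulting variables can be sketched in a few lines of Python. This is only an illustration, not how any particular web server is implemented: the function name <code>split_request</code> and the <code>scripts</code> set (standing in for the server's content area) are assumptions made for the example. The sketch keeps the leading "/" on PATH_INFO, which is what most servers pass along.

```python
from urllib.parse import urlsplit


def split_request(url, scripts):
    """Split a request URL into the CGI variables from the table above.

    `scripts` is a hypothetical set of URL paths known to be CGI scripts;
    a real web server would look in its content area instead.
    """
    parts = urlsplit(url)
    terms = [t for t in parts.path.split("/") if t]
    # Try successively longer prefixes: /one, then /one/two, and so on.
    for n in range(1, len(terms) + 1):
        prefix = "/" + "/".join(terms[:n])
        if prefix in scripts:
            rest = "/".join(terms[n:])
            query = parts.query
            return {
                "REQUEST_URI": parts.path + ("?" + query if query else ""),
                "SCRIPT_NAME": prefix,
                "PATH_INFO": "/" + rest if rest else "",
                "QUERY_STRING": query,
            }
    return None  # no prefix of the URL names a CGI script


env = split_request(
    "http://example.com/one/two/timeline/four?abc=xyz", {"/one/two"}
)
print(env)
```

The search stops at the first (shortest) matching prefix, mirroring the description above.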

There are other CGI environment variables beyond those listed above.
Many Fossil servers implement the
[https://fossil-scm.org/home/test-env/two/three?abc=xyz|test-env]
webpage that shows some of the CGI environment
variables that Fossil pays attention to.

In addition to setting various CGI environment variables, if the HTTP
request contains POST content, then the web server relays the POST content
to standard input of the CGI script.

In summary, the task of the
CGI script is to read the various CGI environment variables and
the POST content on standard input (if any), figure out an appropriate
reply, then write that reply on standard output.
The web server will read the output from the CGI script, reformat it
into an appropriate HTTP reply, and relay the result back to the
requesting application.
The CGI script exits as soon as it generates a single reply.
The web server will (usually) persist and handle multiple HTTP requests,
but a CGI script handles just one HTTP request and then exits.

The above is a rough outline of how CGI works.
There are many details omitted from this brief discussion.
See other on-line CGI tutorials for further information.
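
To make the request cycle summarized in this section concrete, here is a minimal sketch of a CGI program in Python. It is not taken from Fossil. The function name <code>handle_request</code> is hypothetical, and the sketch relies on CONTENT_LENGTH, a standard CGI variable that is not listed in the table above, to know how much POST content to read.

```python
import os
import sys


def handle_request(environ, stdin):
    """Build one CGI reply from the environment and the POST content."""
    # CONTENT_LENGTH says how much POST content is waiting on stdin.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length > 0 else ""
    lines = [
        "Content-Type: text/plain",
        "",  # a blank line separates the headers from the content
        "SCRIPT_NAME=" + environ.get("SCRIPT_NAME", ""),
        "PATH_INFO=" + environ.get("PATH_INFO", ""),
        "QUERY_STRING=" + environ.get("QUERY_STRING", ""),
        "POST content: " + body,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One request in, one reply out, then the process exits.
    sys.stdout.write(handle_request(os.environ, sys.stdin))
```

The web server, not the script, turns this output into a complete HTTP reply.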

<h2>How Fossil Acts As A CGI Program</h2>

An appropriate CGI script for running Fossil will look something
like the following:

<pre>
#!/usr/bin/fossil
repository: /home/www/repos/project.fossil
</pre>

The first line of the script is a
"[https://en.wikipedia.org/wiki/Shebang_%28Unix%29|shebang]"
that tells the operating system what program to use as the interpreter
for this script. On unix, when you execute a script that starts with
a shebang, the operating system runs the program identified by the
shebang with a single argument that is the full pathname of the script
itself.
In our example, the interpreter is Fossil, and the argument might
be something like "/var/www/cgi-bin/one/two" (depending on how your
particular web server is configured).

The Fossil program that is run as the script interpreter
is the same Fossil that runs when
you type ordinary Fossil commands like "fossil sync" or "fossil commit".
But in this case, as soon as it launches, the Fossil program
recognizes that the GATEWAY_INTERFACE environment variable is
set to "CGI/1.0" and it therefore knows that it is being used as
CGI rather than as an ordinary command-line tool, and behaves accordingly.

When Fossil recognizes that it is being run as CGI, it opens and reads
the file identified by its sole argument (the file named by
<code>argv&#91;1&#93;</code>). In our example, the second line of that file
tells Fossil the location of the repository it will be serving.
Fossil then starts looking at the CGI environment variables to figure
out what web page is being requested, generates that one web page,
then exits.
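
Reading the control file can be pictured with the following Python sketch. It is a simplified stand-in, not Fossil's actual parser: the function name <code>parse_cgi_control</code> is hypothetical, and the rule that lines starting with "#" (including the shebang) are skipped is an assumption made to keep the example short.

```python
def parse_cgi_control(text):
    """Parse "keyword: value" lines from a Fossil-style CGI script.

    Simplified sketch: blank lines and lines starting with "#" (which
    covers the shebang) are skipped; every other line is split at the
    first colon.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


script = """#!/usr/bin/fossil
repository: /home/www/repos/project.fossil
"""
print(parse_cgi_control(script))
```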

Usually, the webpage being requested is the first term of the
PATH_INFO environment variable. (Exceptions to this rule are noted
below.) For our example, the first term of PATH_INFO
is "timeline", which means that Fossil will generate
the [/help/www/timeline|/timeline] webpage.

With Fossil, terms of PATH_INFO beyond the webpage name are converted into
the "name" query parameter. Hence, the following two URLs mean
exactly the same thing to Fossil:
<ol type='A'>
<li> [https://fossil-scm.org/home/info/c14ecc43]
<li> [https://fossil-scm.org/home/info?name=c14ecc43]
</ol>

In both cases, the CGI script is called "/home". For case (A),
the PATH_INFO variable will be "info/c14ecc43" and so the
"[/help/www/info|/info]" webpage will be generated and the suffix of
PATH_INFO will be converted into the "name" query parameter, which
identifies the artifact about which information is requested.
In case (B), the PATH_INFO is just "info", but the same "name"
query parameter is set explicitly by the URL itself.
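
The equivalence of the two URLs can be sketched as follows. This is an illustration of the rule just described, not Fossil's code. The function name <code>route</code> is hypothetical, and letting an explicit "name" in the query string take precedence over the PATH_INFO suffix is an assumption of the sketch.

```python
from urllib.parse import parse_qs


def route(path_info, query_string=""):
    """Return (page, params) for a request, following the rule above."""
    terms = [t for t in path_info.split("/") if t]
    page = terms[0] if terms else ""
    # parse_qs returns lists of values; keep the first one for each key.
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    if len(terms) > 1:
        # Assumption of this sketch: an explicit name= in the query wins.
        params.setdefault("name", "/".join(terms[1:]))
    return page, params


print(route("info/c14ecc43"))          # case (A)
print(route("info", "name=c14ecc43"))  # case (B)
```

Both calls yield the same page name and the same "name" parameter.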

<h2>Serving Multiple Fossil Repositories From One CGI Script</h2>

The previous example showed how to serve a single Fossil repository
using a single CGI script.
On a website that wants to serve multiple repositories, one could
simply create multiple CGI scripts, one script for each repository.
But it is also possible to serve multiple Fossil repositories from
a single CGI script.

If the CGI script for Fossil contains a "directory:" line instead of
a "repository:" line, then the argument to "directory:" is the name
of a directory that contains multiple repository files, each ending
with ".fossil". For example:

<pre>
#!/usr/bin/fossil
directory: /home/www/repos
</pre>

Suppose the /home/www/repos directory contains files named
<b>one.fossil</b>, <b>two.fossil</b>, and <b>subdir/three.fossil</b>.
Further suppose that the name of the CGI script (relative to the root
of the webserver document area) is "cgis/example2". Then to
see the timeline for the "three.fossil" repository, the URL would be:

<pre>
http://example.com/cgis/example2/subdir/three/timeline
</pre>

Here is what happens:
<ol>
<li> The input URI on the HTTP request is
<b>/cgis/example2/subdir/three/timeline</b>
<li> The web server searches prefixes of the input URI until it finds
the "cgis/example2" script. The web server then sets
PATH_INFO to the "subdir/three/timeline" suffix and invokes the
"cgis/example2" script.
<li> Fossil runs and sees the "directory:" line pointing to
"/home/www/repos". Fossil then starts pulling terms off the
front of the PATH_INFO looking for a repository. It first looks
at "/home/www/repos/subdir.fossil" but there is no such repository.
So then it looks at "/home/www/repos/subdir/three.fossil" and finds
a repository. The PATH_INFO is shortened by removing
"subdir/three/" leaving it at just "timeline".
<li> Fossil looks at the rest of PATH_INFO to see that the webpage
requested is "timeline".
</ol>
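
The repository search in the steps above can be sketched in Python as shown here. This is a rough model of the described behavior, not Fossil's implementation. The function name <code>find_repository</code> and its <code>exists</code> parameter are hypothetical; the latter lets the example run against a set of file names rather than a real directory. A real implementation would also need to reject unsafe path terms such as "..".

```python
import os
import posixpath


def find_repository(directory, path_info, exists=os.path.isfile):
    """Pull terms off the front of PATH_INFO until a repository is found.

    Returns (repository_path, remaining_path_info), or (None, path_info)
    when no prefix of PATH_INFO names a repository file.
    """
    terms = [t for t in path_info.split("/") if t]
    for n in range(1, len(terms) + 1):
        # Try subdir.fossil, then subdir/three.fossil, and so on.
        candidate = posixpath.join(directory, *terms[:n]) + ".fossil"
        if exists(candidate):
            return candidate, "/".join(terms[n:])
    return None, path_info


# Stand-in for the files in /home/www/repos from the example above.
repos = {
    "/home/www/repos/one.fossil",
    "/home/www/repos/two.fossil",
    "/home/www/repos/subdir/three.fossil",
}
print(find_repository("/home/www/repos", "subdir/three/timeline",
                      exists=repos.__contains__))
```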
<a id="cgivar"></a>

The web server sets many environment variables in step 2 in addition
to just PATH_INFO. The following diagram shows a few of these variables
and their relationship to the request URL:

<verbatim type="pikchr">
charwid = 0.075
thickness = 0

SCHEME: box "https://" mono fit
DOMAIN: box "example.com" mono fit
SCRIPT: box "/cgis/example2" mono fit
PATH: box "/subdir/three/timeline" mono fit
QUERY: box "?c=55d7e1" mono fit

thickness = 0.01

DB: box at 0.3 below DOMAIN "HTTP_HOST" mono fit invis
SB: box at 0.3 below SCRIPT "SCRIPT_NAME" mono fit invis
PB: box at 0.3 below PATH "PATH_INFO" mono fit invis
QB: box at 0.3 below QUERY "QUERY_STRING" mono fit invis
RB: box at 0.5 above PATH "REQUEST_URI" mono fit invis

color = lightgray

box at SCHEME width SCHEME.width height SCHEME.height
line fill 0x7799CC behind QUERY \
  from SCRIPT.nw \
  to RB.sw \
  to RB.se \
  to QUERY.ne \
  close
line fill 0x99CCFF behind DOMAIN \
  from DOMAIN.nw \
  to DOMAIN.sw \
  to DB.n \
  to DOMAIN.se \
  to DOMAIN.ne \
  close
line fill 0xCCEEFF behind SCRIPT \
  from SCRIPT.nw \
  to SCRIPT.sw \
  to SB.n \
  to SCRIPT.se \
  to SCRIPT.ne \
  close
line fill 0x99CCFF behind PATH \
  from PATH.nw \
  to PATH.sw \
  to PB.n \
  to PATH.se \
  to PATH.ne \
  close
line fill 0xCCEEFF behind QUERY \
  from QUERY.nw \
  to QUERY.sw \
  to QB.n \
  to QUERY.se \
  to QUERY.ne \
  close
</verbatim>

<h2>Additional CGI Script Options</h2>

The CGI script can have additional options used to fine-tune
Fossil's behavior. See the [./cgi.wiki|CGI script documentation]
for details.

<h2>Additional Observations</h2>
<ol type="I">
<li><p>
Fossil does not distinguish between the various HTTP methods (GET, PUT,
DELETE, etc). Fossil figures out what it needs to do purely from the
webpage term of the URI.</p></li>
<li><p>
Fossil does not distinguish between query parameters that are part of the
URI, application/x-www-form-urlencoded or multipart/form-data encoded
parameters that are part of the POST content, and cookies. Each information
source is seen as a space of key/value pairs which are loaded into an
internal property hash table. The code that runs to generate the reply
can then reference various property values.
Fossil does not care where the value of each property comes from (POST
content, cookies, or query parameters), only that the property exists
and has a value.</p></li>
<li><p>
The "[/help/ui|fossil ui]" and "[/help/server|fossil server]" commands
are implemented using a simple built-in web server that accepts incoming HTTP
requests, translates each request into a CGI invocation, then creates a
separate child Fossil process to handle each request. In other words, CGI
is used internally to implement "fossil ui/server".
<br><br>
SCGI is processed using the same built-in web server, just modified
to parse SCGI requests instead of HTTP requests. Each SCGI request is
converted into CGI, then Fossil creates a separate child Fossil
process to handle each CGI request.</p></li>
<li><p>
Fossil is itself often launched using CGI. But Fossil can also then
turn around and launch [./serverext.wiki|sub-CGI scripts to implement
extensions].</p></li>
</ol>
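
The single property table described in observation II can be modeled with the short Python sketch below. It is only an illustration. The function name <code>load_properties</code> is hypothetical, only urlencoded POST content is handled (multipart/form-data is omitted), and the choice that the first source to define a key wins is an assumption, since the text above does not say how Fossil resolves duplicates.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl


def load_properties(query_string="", post_body="", cookie_header=""):
    """Merge query parameters, POST content, and cookies into one table."""
    cookies = SimpleCookie(cookie_header)
    sources = [
        parse_qsl(query_string),                    # parameters in the URI
        parse_qsl(post_body),                       # urlencoded POST content
        [(k, m.value) for k, m in cookies.items()], # cookies
    ]
    props = {}
    for pairs in sources:
        for key, value in pairs:
            # Assumption of this sketch: the first source to set a key wins.
            props.setdefault(key, value)
    return props


props = load_properties("name=c14ecc43", "comment=hi+there", "login=xyz")
print(props)
```

Code that generates the reply can then look up <code>props["name"]</code> without knowing which source supplied it.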