Discussion:
[fossil-users] Support for HTTP Proxies
D. Richard Hipp
2008-05-05 23:38:51 UTC
The latest version of fossil now supports HTTP proxies so that you can
use it from behind restrictive firewalls. There are two ways to
enable a proxy:

fossil setting proxy http://proxy.firewall.bigcorp.com:8123/

or

export http_proxy=http://proxy.firewall.bigcorp.com:8123/

If the two settings conflict, the "fossil setting" overrides the
environment variable. To turn the proxy back off and begin using
direct HTTP again, you do:

fossil setting proxy off
unset http_proxy

QUESTION:

The above all works great as long as you are *always* going thru the
proxy. But sometimes you want to be able to use the proxy for external
servers but you need to go direct for internal servers. The problem
is that I don't know how to code fossil to tell the difference.
I've thought about adding command-line options:

fossil sync http://internal/bigproject --no-proxy

That would, of course, require the user to type the "--no-proxy"
option every time they visit an internal server. But I suppose that
is easier than:

fossil setting proxy off
fossil sync http://internal/bigproject
fossil setting proxy http://proxy.firewall.bigcorp.com:8123/

Does anybody have any ideas on how I can make accessing an internal
server easier when there is an HTTP proxy configured?

The "settings" configured by the "fossil setting" command can be
either "global" or "local". "Global" settings apply everywhere.
"Local" settings apply only to a single local repository that contains
the setting. "Local" settings override "global" settings. So one
possible solution is that a user could specify different proxies for
use by different local repositories. If one project always synced
against an internal server and another project always synced against
an external server, then the first project could specify "proxy off"
and the second could specify the real proxy and things would always
work. But if a single project sometimes syncs against both internal
and external servers, it could get to be irritating to have to
constantly flip the proxy setting. Ideas on how to better deal with
this are appreciated.
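
The local-over-global lookup described above can be sketched roughly as follows. This is only an illustration in Python, not fossil's actual code: the function name lookup_setting and the dictionaries standing in for the two settings stores are hypothetical.

```python
from typing import Mapping, Optional


def lookup_setting(name: str,
                   local: Mapping[str, str],
                   global_: Mapping[str, str]) -> Optional[str]:
    """Return the local value if the repository defines one, else the global one."""
    if name in local:
        return local[name]
    return global_.get(name)


# Hypothetical stores: one global table plus one table per local repository.
global_settings = {"proxy": "http://proxy.firewall.bigcorp.com:8123/"}
internal_repo = {"proxy": "off"}  # project that always syncs with an internal server
external_repo = {}                # project that simply inherits the global proxy

print(lookup_setting("proxy", internal_repo, global_settings))
print(lookup_setting("proxy", external_repo, global_settings))
```

A project that syncs against both kinds of server gets no help from this scheme, which is the irritation described above.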

D. Richard Hipp
***@hwaci.com
Kevin Kenny
2008-05-06 02:13:26 UTC
Post by D. Richard Hipp
fossil setting proxy off
fossil sync http://internal/bigproject
fossil setting proxy http://proxy.firewall.bigcorp.com:8123/
Does anybody have any ideas on how I can make accessing an internal
server easier when there is an HTTP proxy configured?
The usual next step is to have an environment variable, 'no_proxy',
which consists of a set of glob patterns (separated by semicolons,
commas, or blanks) matching the names of the 'internal' hosts that
are not to be passed to the proxy server.
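
A minimal sketch of such a 'no_proxy' check, assuming the patterns are plain shell-style globs matched case-insensitively against the host name of the URL. The function names and the sample host names are made up for illustration, and real tools differ in the exact matching rules they apply.

```python
import re
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlsplit


def parse_no_proxy(value: str) -> List[str]:
    """Split a no_proxy string on semicolons, commas, or blanks."""
    return [p for p in re.split(r"[;,\s]+", value.strip()) if p]


def bypass_proxy(url: str, no_proxy: str) -> bool:
    """Return True if the URL's host matches any no_proxy glob pattern."""
    host = (urlsplit(url).hostname or "").lower()
    return any(fnmatchcase(host, pat.lower())
               for pat in parse_no_proxy(no_proxy))


# Hypothetical value for the environment variable.
no_proxy = "internal; *.bigcorp.com, localhost"

print(bypass_proxy("http://internal/bigproject", no_proxy))
print(bypass_proxy("http://www.example.org/repo", no_proxy))
```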

The next step after that (implemented in Tcllib's file,
'modules/http/autoproxy.tcl') is to store these things in the
Registry on Windows; there are defined keys that IE uses, and
a quick look at autoproxy.tcl will show what they are.

All of this seems to be leading up to reinventing the
'proxy automatic configurator' file,
http://en.wikipedia.org/wiki/Proxy_auto-config
As you can see, this is a nasty file format, since
it's actually Javascript code that accepts a hostname
and tells you what to do with it. Nevertheless, both
IE and Mozilla support it, and it's the usual thing in
use on corporate networks (which may have internal
firewalls as well as walls between them and the outside
world).

Some sites also use (and most browsers attempt) WPAD -
Web Proxy AutoDiscovery -
http://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol
It's even nastier, and like PAC it has security implications.

My guess is that 'no_proxy' plus the Windows registry
keys will suffice for "almost all" users (I routinely
switch my browser back and forth between manual and
automatic configuration several times in a typical day).
PAC might be doable using some open-source JavaScript
interpreter like SpiderMonkey.

I really wish I had something nicer to offer.
The 'pacparser' library (http://code.google.com/p/pacparser/)
provides a nice C and Python API, but it may have
license incompatibilities (it's LGPL) and it has a
dependency on SpiderMonkey.

--
73 de ke9tv/2, Kevin
Stephan Beal
2008-05-06 15:43:02 UTC
Post by D. Richard Hipp
The "settings" configured by the "fossil setting" command can be
either "global" or "local". "Global" settings apply everywhere.
"Local" settings apply only to a single local repository that contains
the setting. "Local" settings override "global" settings. So one
possible solution is that a user could specify different proxies for
use by different local repositories.
IMO that is the "most correct" decision, philosophically speaking. i would
personally expect any proxy-related settings to be per-sandbox (per local
Post by D. Richard Hipp
if a single project sometimes syncs against both internal
and external servers, it could get to be irritating to have to
constantly flip the proxy setting.
i think that's a corner case which probably won't happen too often. And when
it does it's a simple (though admittedly annoying) matter to drop a few
scripts in your source tree which pass on the proper proxy params.

i may be off base here, but my prediction is that fossil is in a good
position to sweep the market for smaller/personal projects (mainly because
of its ability to run as a CGI (that's why i use it)), but that it won't be
used much for huge projects (i can't see it scaling to a project the size of
OpenOffice, Mozilla, or an OS kernel). Normally only the largest projects
have multiple SCM servers, and the developers on such projects tend to be
technically capable enough to deal with minor annoyances such as HTTP
proxying.
Post by Kevin Kenny
PAC might be doable using some open-source
JavaScript interpreter like SpiderMonkey.
Unfortunately, there are only 2 open source JS engines which are currently
usable: SpiderMonkey and Rhino. Rhino is Java, so it can't be used directly
with Fossil, but SpiderMonkey is doable. That said, it's a pretty big
dependency but it could also be packaged inside of the Fossil source
tree (i do that for the SpiderApe project: http://SpiderApe.sf.net). There's
another JS engine coming up (can't remember the name at the moment, but IIRC
it's from Adobe?), but last i checked (a few months ago) it wasn't ready for
use (or the source wasn't available for d/l... can't remember).

All of that said, adding yet another interpreter to Fossil is probably
overkill, especially if its only use is to support HTTP proxying. It might
be interesting to consider Monkey for other scripting purposes, however. For
example, as part of the SpiderApe project i have implemented an essentially
complete sqlite3 wrapper, and sqlite3 is of course the basis for Fossil.
Such a fusion with Fossil could allow implementing much of the Fossil
functionality in JavaScript.
(See http://spiderape.sourceforge.net/plugins/sqlite/)

Then again, if Fossil were refactored to be a library it would probably be
very little work to generate all kinds of script bindings with SWIG (
http://www.swig.org/).
Post by Kevin Kenny
The 'pacparser' library (http://code.google.com/p/pacparser/)
provides a nice C and Python API, but it may have
license incompatibilities (it's LGPL) and it has a
dependency on SpiderMonkey.
SpiderMonkey itself has a triple license: MPL/GPL2/LGPL2.1, so it poses no
licensing problem vis-a-vis Fossil.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
Kevin Kenny
2008-05-06 23:52:10 UTC
Post by D. Richard Hipp
if a single project sometimes syncs against both internal
and external servers, it could get to be irritating to have to
constantly flip the proxy setting.
i think that's a corner case which probably won't happen too often. And
when it does it's a simple (though admittedly annoying) matter to drop a
few scripts in your source tree which pass on the proper proxy params.
As long as we don't lose the ability to flip proxies reasonably simply,
I'm fine with this. But I do flip proxies regularly, because I develop
on a laptop (it's nice to be able to work aboard a train), and I
sometimes need to sync with a different proxy, or with no proxy at all.
Simply supporting 'http_proxy' or 'fossil settings proxy' solves 90+%
of my problem - because I can script things from there; I was posting
my original message chiefly to reveal how big a can of worms Richard
was opening.
--
73 de ke9tv/2, Kevin
D. Richard Hipp
2008-05-07 12:07:33 UTC
Post by Kevin Kenny
Post by D. Richard Hipp
if a single project sometimes syncs against both internal
and external servers, it could get to be irritating to have to
constantly flip the proxy setting.
i think that's a corner case which probably won't happen too often. And
when it does it's a simple (though admittedly annoying) matter to drop a
few scripts in your source tree which pass on the proper proxy params.
As long as we don't lose the ability to flip proxies reasonably simply,
I'm fine with this....
I added the --proxy option to the sync, push, pull, and clone commands
while waiting on a flight yesterday. (But as I write this it occurs
to me that I probably also ought to add it to update and commit in
case the autosync option is enabled.) The command-line argument
overrides any "setting" value or environment variable. So, for
example, kbk might have a setting or environment variable for the GE
proxy server. But he can still sync while connected via starbucks
wifi by adding "--proxy off" to the command line.
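
The resulting precedence can be sketched like this. It is a rough model in Python rather than fossil's implementation, and effective_proxy is a made-up name. One assumption goes beyond the thread: a setting of "off" is treated as "no proxy configured at this level", so the environment variable still applies (which would explain why the first message both turns the setting off and unsets http_proxy), while "--proxy off" on the command line forces a direct connection.

```python
from typing import Mapping, Optional


def effective_proxy(cli_proxy: Optional[str],
                    setting_proxy: Optional[str],
                    env: Mapping[str, str]) -> Optional[str]:
    """Return the proxy URL to use, or None for a direct connection."""
    # 1. A --proxy argument overrides everything else; "off" means go direct.
    if cli_proxy is not None:
        return None if cli_proxy == "off" else cli_proxy
    # 2. The "proxy" setting wins over the environment variable.
    #    Assumption: "off" here means "not configured", so fall through.
    if setting_proxy and setting_proxy != "off":
        return setting_proxy
    # 3. Finally fall back to http_proxy, if it is set and non-empty.
    return env.get("http_proxy") or None


corp = "http://proxy.firewall.bigcorp.com:8123/"
print(effective_proxy(None, corp, {}))   # setting applies
print(effective_proxy("off", corp, {}))  # command line forces direct HTTP
```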

D. Richard Hipp
***@hwaci.com

Andreas Kupries
2008-05-06 15:44:16 UTC
Post by D. Richard Hipp
The above all works great as long as you are *always* going thru the
proxy. But sometimes you want to be able to use the proxy for external
servers but you need to go direct for internal servers. The problem
is that I don't know how to code fossil to tell the difference.
fossil sync http://internal/bigproject --no-proxy
That would, of course, require the user to type the "--no-proxy"
option every time they visit an internal server. But I suppose that
fossil setting proxy off
fossil sync http://internal/bigproject
fossil setting proxy http://proxy.firewall.bigcorp.com:8123/
Does anybody have any ideas on how I can make accessing an internal
server easier when there is an HTTP proxy configured?
Well, not quite the same as what Kevin said, but how about associating
proxy settings with server URLs:

fossil setting proxy http://proxy.firewall.bigcorp.com:8123/
fossil setting proxy off for http://internal/bigproject

In this example we have a general catch-all proxy setting that activates a
proxy for the pattern *. However, should your sync partner be
http://internal/bigproject, no proxy is used at all.

In essence each proxy setting has a pattern associated with it which tells
fossil for which servers to apply the setting. This needs some kind of
ordering, and a rule that either the first or the last match is taken, but
that can be sorted out.
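
A rough sketch of that idea in Python, assuming glob patterns matched against the full server URL and a first-match-wins rule with the more specific rules listed first. The rule table, the proxy_for name, and the choice of first match over last match are illustrative assumptions; no such "for" syntax exists in fossil.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (URL glob pattern, proxy URL or "off")


def proxy_for(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the proxy for url from the first matching rule, or None for direct."""
    for pattern, proxy in rules:
        if fnmatchcase(url, pattern):
            return None if proxy == "off" else proxy
    return None  # no rule matched: connect directly


# Specific rules first, catch-all last.
rules = [
    ("http://internal/bigproject*", "off"),
    ("*", "http://proxy.firewall.bigcorp.com:8123/"),
]

print(proxy_for("http://internal/bigproject", rules))
print(proxy_for("http://www.example.org/repo", rules))
```

With last-match-wins the table would simply be written in the opposite order; either way the ordering has to be documented so that the catch-all pattern does not shadow the specific ones.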

--
Andreas Kupries <***@ActiveState.com>
Developer @ http://www.ActiveState.com
Tel: +1 778-786-1122