"I'm not proud of being a congenital pain in the ass. But I will take money for it."

Selective HTTP proxy through a firewall

Tue 02 April 2019

Somewhat common situation: your $DAYJOB has a lab, that lab is behind a firewall, there's some internal web service in that lab you need to get to, your normal work environment is a laptop thousands of kilometers away from that lab, the company's preferred VPN software is a piece of junk that breaks half the applications you need to run on your laptop, and you never much liked VPNs anyway. You've managed to negotiate a locked-down path that lets you ssh into the lab, which is fine for terminal windows, git, and so forth, but how do you get to that web service? Some kind of HTTP proxy, but how do you do it without breaking your browser's ability to get to anything else?

Well, one obvious and simple possibility is just to ignore the issue and route all web traffic through a web proxy in the lab. But maybe the lab's external connectivity is often worse than your own. So you really only want to proxy traffic that's intended for the lab.

Some browsers include a selective proxy rule mechanism, but unfortunately the envisioned use case is backwards -- they assume that you're on the inside looking out, and that you want to proxy everything except some small enumerable set of sites. Oops.

Fortunately, there's a mechanism that does work, and Firefox (at least) supports it.

Choose your own components, but the solution I came up with here involves three pieces:

  • A tinyproxy instance running inside the lab;
  • An ssh tunnel from laptop through the firewall into the lab; and
  • A "proxy autoconfiguration" script running locally on the laptop.

See https://en.wikipedia.org/wiki/Proxy_auto-config for an introduction to proxy autoconfiguration. The mechanism is general enough that you could do all sorts of dangerous things with it, but a local proxy.pac on your laptop should be safe enough if you're careful.

First, install tinyproxy on some server inside the lab. We'll call that server proxy.lab.example.com. The default port of 8888 is fine unless you have some reason to avoid it, but if you're paranoid you probably want to change the listener address in /etc/tinyproxy.conf to 127.0.0.1 so that tinyproxy is only listening for loopback traffic:

Listen 127.0.0.1

Next, set up an ssh tunnel. If you always want this to come up automatically you can use some tool like autossh, but for the moment we'll use plain old ssh and assume that you start the tunnel manually when you want it:

ssh -f -N -L8888:127.0.0.1:8888 proxy.lab.example.com

Finally, write a proxy.pac and configure Firefox to use it (this probably works with other browsers, but I haven't investigated). We'll assume here that you can easily identify everything you want to go through the proxy simply by DNS name:

function FindProxyForURL(url, host)
{
  if (shExpMatch(host, "*.lab.example.com"))
    return "PROXY 127.0.0.1:8888; DIRECT";
  else
    return "DIRECT";
}

The intent here is that Firefox tries to use the proxy for anything with a DNS name ending in .lab.example.com, falling back to attempting a direct connection if using the proxy fails; for anything else, Firefox just uses a direct connection immediately, bypassing the proxy entirely.
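If you want to sanity-check that logic before pointing Firefox at it, here's a minimal sketch that runs the same function under Node.js. The shExpMatch() stand-in is my own rough approximation (it only understands * and ?), not the browser's implementation, and the hostnames are made up for illustration:

```javascript
// Rough stand-in for the browser-provided shExpMatch(); an assumption for
// testing outside the browser. Handles only the "*" and "?" wildcards.
function shExpMatch(str, pattern) {
  // Escape regex metacharacters other than * and ?
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(
    "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$"
  );
  return re.test(str);
}

// Same function as in proxy.pac
function FindProxyForURL(url, host) {
  if (shExpMatch(host, "*.lab.example.com"))
    return "PROXY 127.0.0.1:8888; DIRECT";
  else
    return "DIRECT";
}

// Hypothetical hosts
console.log(FindProxyForURL("http://wiki.lab.example.com/", "wiki.lab.example.com"));
// prints "PROXY 127.0.0.1:8888; DIRECT"
console.log(FindProxyForURL("https://www.example.org/", "www.example.org"));
// prints "DIRECT"
console.log(FindProxyForURL("http://lab.example.com/", "lab.example.com"));
// prints "DIRECT"
```

Note the last case: the pattern requires a dot in front of lab, so the bare name lab.example.com goes direct. If you need that name proxied too, add a second test for it.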

See https://support.mozilla.org/en-US/kb/connection-settings-firefox for details on setting proxy configuration in Firefox. For this hack, you want to set an "automatic proxy configuration URL": assuming you've put the above code in /home/myname/proxy.pac, that URL would be file:///home/myname/proxy.pac. You can change the name if you like (e.g., if you don't like visible clutter you might prefer .proxy.pac, or .proxy-past-annoying-lab-firewall-bwahaha.js); just make sure you use an absolute file path URL (file:///...).

Assuming you got all of this right, it should now Just Work.

Footnote, three years later: tinyproxy, quite reasonably, restricts use of the proxy protocol's CONNECT verb, but for this particular use case, once you've locked down the proxy so it's only reachable via authorized tunnels, you might want to remove the CONNECT restrictions so that you can use $DAYJOB web applications that want to open HTTPS connections on funny ports. See the ConnectPort directive in the tinyproxy manual for details.
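A sketch of what that might look like, assuming your copy of tinyproxy.conf ships with a couple of ConnectPort lines (the 443 and 563 shown here are what I'd expect from a stock config; check your own file). As I read the manual, when no ConnectPort line is present CONNECT is allowed to any port, so commenting them all out lifts the restriction:

```
# Hypothetical excerpt from tinyproxy.conf
Listen 127.0.0.1
Port 8888

# Commented out so CONNECT is no longer limited to these ports
#ConnectPort 443
#ConnectPort 563
```

Restart tinyproxy after changing the file, and only do this with the loopback-only Listen setting in place.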