Privoxy Overview

By: Scott Doenges - Revised: 2014-01-23 richard


Introduction

Privoxy is a web proxy with advanced filtering capabilities for protecting privacy, filtering web page content, managing cookies, controlling access, and removing ads, banners, pop-ups and other obnoxious Internet junk. Privoxy has a very flexible configuration and can be customized to suit individual needs and tastes. Privoxy has application for both stand-alone systems and multi-user networks.

Scott will give an overview of the software and how his department has implemented it for public access kiosks to control users' web access.


What Is Privoxy?

Privoxy is a free, multi-platform, open-source web proxy server with advanced filtering capabilities for protecting privacy, filtering web page content, managing cookies, controlling access, and removing ads, banners, pop-ups and other obnoxious junk. Privoxy has a very flexible configuration and can be customized to suit individual needs and tastes, and can be used for both stand-alone systems and multi-user networks. It is based on the Internet Junkbuster, which was discontinued in 1998. It can run on Mac OS X, many Linux variants (including Debian), Windows, Solaris, OS/2, and AmigaOS.

Privoxy has far too many options and special capabilities to go into here, so I will just give a brief overview of its features and configuration details, and discuss some of the problems/workarounds involved. A much more detailed overview can be found in the official documentation.

Configuration & Features

We began using Privoxy in the Library as a way to limit internet access to specific sites on the kiosk Macs throughout the building, since all other public computers began requiring users to log in to get full internet access. Privoxy is very straightforward to install, although it takes a lot of time to tweak it just the way you want it. The default installation enables tons of filters, ad and cookie blockers, etc., and you will need to adjust many of these to make it do what you want. But once you've got it customized, it is very stable and easy to maintain.

Installation: Privoxy comes as a .pkg installer. It installs into /Library/Privoxy and adds an item to /Library/StartupItems to ensure it is turned on at system startup. You need to enter a few things in its primary settings file (the config file discussed below) before you can start using it as a proxy server.

Web Interface: You can access the web interface on a proxy-enabled machine using http://config.privoxy.org or the shortcut http://p.p. There you can view all of your action/filter files and your config file (and edit some of them), toggle options on and off, and look up what actions/filters will be applied to a specific URL. Also note that you can simply edit the settings files with a text editor if you disable the web interface.


Settings Files

config: Privoxy's primary settings file. Here you configure the location of your Privoxy install, as well as where to keep the logfile, and which action and filter files to use. You can set different levels of debug to help troubleshoot problems using the logfile. You can also turn the web interface toggle on or off; turning it off is a good security measure in a multi-user environment such as ours.

You also set the IP address and port number (default: 8118) that Privoxy will listen on: either the machine's actual IP address for a multi-user proxy server, or simply the localhost address (127.0.0.1) for a local proxy server. Once you have configured the IP and port, you just put that info into the Proxies tab of your Network preferences.
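
If you want to check the proxy from a script rather than a browser, something like the following Python sketch should work. It only assumes that Privoxy is listening on the localhost address and default port mentioned above; the function names proxy_url and build_opener_via_privoxy are made up for this example and are not part of Privoxy.

```python
import urllib.request


def proxy_url(host: str = "127.0.0.1", port: int = 8118) -> str:
    """Return the proxy URL for a Privoxy listener (8118 is the default port)."""
    return f"http://{host}:{port}"


def build_opener_via_privoxy(host: str = "127.0.0.1", port: int = 8118):
    """Build a urllib opener that routes HTTP and HTTPS requests through the proxy."""
    url = proxy_url(host, port)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    # Passing our own ProxyHandler replaces urllib's default proxy detection.
    return urllib.request.build_opener(handler)


# Building the opener does not touch the network; requests made with it will.
opener = build_opener_via_privoxy()
```

With Privoxy running, calling opener.open("http://config.privoxy.org/") should bring back Privoxy's own web interface page, which is a quick way to confirm that requests really are going through the proxy.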


And most importantly, the config file defines which URLs are allowed and denied, using the following format (use separate entries for each subnet):

To permit access to a site:
permit-access 155.97.16.0/24 www.database.com
permit-access 155.97.16.0/24 search.database.com
permit-access 155.97.16.0/24 images.database.com

To deny access to a site:
deny-access 155.97.16.0/24 www.dirty-porn.com

Since we need to allow access not only to Library pages, but also hundreds of database sites that the U subscribes to, our config file has gotten enormous. Some problems we continue to experience are databases that use redirects to other servers, or store images on different servers, or use a different server for searches, etc. We have to figure out every server that is hit for an individual database and add it to our config file for each of our subnets (a great utility for finding all servers used on a web page is DeepVacuum [link dead]). Privoxy will re-read the config file when it notices that changes have been made, which will cause a brief proxy outage while it looks up each of the addresses in the config file.
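
Since every server has to be listed once per subnet, one way to keep the list manageable is to generate the permit-access lines instead of typing them. This is only a sketch of that idea in Python, not something we ship with Privoxy: the helper name permit_access_lines and the second subnet are hypothetical, and the hosts are the example ones from above.

```python
from itertools import product


def permit_access_lines(subnets, hosts):
    """Yield one permit-access line per (subnet, host) pair, grouped by subnet."""
    for subnet, host in product(subnets, hosts):
        yield f"permit-access {subnet} {host}"


subnets = ["155.97.16.0/24", "155.97.17.0/24"]  # the second subnet is hypothetical
hosts = ["www.database.com", "search.database.com", "images.database.com"]

lines = list(permit_access_lines(subnets, hosts))
print("\n".join(lines))
```

The printed lines can then be pasted into (or appended to) the config file, so adding a newly discovered image or search server means editing one list rather than one entry per subnet.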

default.action: the list of actions that Privoxy will apply to URLs. By default this file has tons of actions enabled that you will need to experiment with. One problem I had is that Privoxy is set by default to block images if they are a certain size - i.e. the standard size of an advertising banner. This was blocking a legitimate image that happened to be the right size, so I had to track down which filter was doing this and disable it (it was the {+handle-as-image} action).

By default Privoxy will replace blocked images with a transparent white/grey grid image that is stretched to the size of the blocked image. You can customize what image to use as the blocker by changing the path in the following action (you can use file:// or http:// URLs):

+set-image-blocker{file:///Users/scott/Pictures/adblocker.gif}

Other default actions include {+block}, which will block images, popups, and other annoying content from the servers or paths listed below the action name. Most of the actions come with a bunch of common advertising sites already entered:

{+block}
.hitbox.com
www.the-gadgeteer.com/cgi-bin/getimage.cgi/
www.stern.de/bilder/poweredby
images.gmx.net/images/bs/
www.gmx.de/promo
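
The layout above (an action name in braces, followed by the URL patterns it applies to) is simple enough to read with a short script, which can help when auditing a large actions file. The Python sketch below is a simplified reader and not Privoxy's own parser: parse_actions is a hypothetical helper, and it assumes each header sits on one line and that action parameters contain no spaces.

```python
def parse_actions(text):
    """Group URL patterns under the action header that precedes them."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("{") and line.endswith("}"):
            # "{+block}" becomes ["+block"]; several actions are split on spaces
            sections.append((line[1:-1].split(), []))
        elif sections:
            # any other line is a URL pattern for the most recent header
            sections[-1][1].append(line)
    return sections


sample = """
{+block}
.hitbox.com
www.gmx.de/promo
"""

sections = parse_actions(sample)
for actions, patterns in sections:
    for action in actions:
        state = "enabled" if action.startswith("+") else "disabled"
        print(action[1:], state, patterns)
```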

If you try to go to a site that is specifically blocked in an action, Privoxy will display an error page that lets you "see why" (see what actions are applied) or "go there anyway" (bypass the actions and view the site).


Similarly, there is a {-block} action that will unblock any content from specified sites. Other examples of filters and actions are {+crunch-all-cookies} and {+kill-popups}, to name just a few of the many. You toggle the effect of an action or filter by prefixing it with a + (enabled) or a - (disabled).

default.filter: this is where you tell Privoxy to rewrite actual content on web sites. For example, some of the databases that the Library subscribes to will not work with Privoxy via the standard means of entering URLs in the config file. In these cases we have Privoxy rewrite specific HTML before providing it to the users in order to route them through a transparent proxy server that the Library uses for off-campus access to the databases.

So the following filter simply says "replace any occurrence of the HTML between the 1st and 2nd pipes with whatever is between the 2nd and 3rd pipes". The HTML is then intercepted by Privoxy and rewritten when displayed to the client, so they are pointed to the correct address:

s|a href="http://isiknowledge.com"|a href="http://t-proxy.lib.utah.edu/login?url=http://isiknowledge.com"|sigU
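
To see what a substitution like this does to a page, you can mimic it outside of Privoxy. The following is a rough Python equivalent, not how Privoxy applies filters internally. It assumes the i and g flags have their usual Perl meanings (case-insensitive, replace every occurrence); the function name rewrite_links and the sample HTML are made up for illustration.

```python
import re

# Same search and replacement text as the filter line above.
# Note the dots are unescaped, so they match any character, as in the original.
PATTERN = r'a href="http://isiknowledge.com"'
REPLACEMENT = 'a href="http://t-proxy.lib.utah.edu/login?url=http://isiknowledge.com"'


def rewrite_links(html: str) -> str:
    """Replace every occurrence of PATTERN, ignoring case."""
    # re.sub replaces all matches by default, which mirrors the "g" flag;
    # IGNORECASE mirrors "i" and DOTALL mirrors "s".
    return re.sub(PATTERN, REPLACEMENT, html, flags=re.IGNORECASE | re.DOTALL)


sample = '<A HREF="http://isiknowledge.com">Web of Knowledge</A>'
rewritten = rewrite_links(sample)
print(rewritten)
```

Because the match is case-insensitive, the upper-case A HREF in the sample is caught too, and the link comes out pointing at the t-proxy login address.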

logfile: Privoxy will put error and activity information here, depending on what debug levels you have set in your config file. Very useful to help troubleshoot problems with particular sites, and it's a good idea to look through it once in a while to see if users are trying to access valid sites that are not yet permitted in the config file.

templates: this folder contains all of the HTML templates that Privoxy uses to display errors and for the web interface. For example there are templates for: blocked, cgi-error-404, connect-failed, no-such-domain, etc. As a localhost proxy server these can be a very useful tool to figure out why your proxy server blocked a page (using the "look up actions applied to a URL" function).

However, you can customize these templates as you desire. For our purposes I simply customized one of them and replaced all of the other templates with my new one, so that no matter what error our users receive they will see the same message.


StartPrivoxy.command and StopPrivoxy.command: Terminal shell scripts that do exactly what their names suggest. Opening them will start or stop Privoxy via a Terminal window. Both will require you to type your admin password.

etc: There are tons more options, actions and filters that you will need to experiment with yourself to understand how Privoxy works. After 6 months I am still finding new combinations of options that show how flexible Privoxy is as a proxy server.

Conclusion

As you can see, Privoxy is not terribly easy to use and configure. However, it is a highly flexible and stable proxy server that solved our problem nicely.

Being an open-source project means that it depends on the developers having enough time and interest to fix bugs, add new features, and respond to questions on the SourceForge forums. Over the past 6 months I have posted a few questions to the forums. One or two of them were answered pretty quickly, either by a developer or another user, but I never got responses on the others, so I had to figure them out for myself.

Privoxy has not been updated on SourceForge for over a year (the current version 3.0.3 was released in Jan. 2004), which may not be a good sign for the future. Hopefully Apple will not change something in a future OS release that breaks Privoxy...

But there aren't really any other full-featured proxy solutions out there for Mac OS X, so we will take what we can get (as long as it continues to work).

Links

www.privoxy.org - home page with FAQ, documentation, etc.
sourceforge.net - project home page with support forum, etc.