Caching authenticated downloads with Varnish and VCL

Imagine the following scenario: you have a web application that delivers data and receives a huge number of requests per second. The data is identical for all users, but you want to limit access to authenticated users. Your back-end application is slow and does not cache the data it generates. You also want to lighten the load on the authentication service. Unfortunately, fixing the back-end application or the authentication service is not an option. I wanted to solve this problem with Varnish and a VCL-only solution, so my approach requires no inline C or third-party Varnish modules (the bundled std module is used only for logging).

In my example the authenticated URLs have the following basic form:

/protected/file.ext?auth=5283f514f51511ca3e73f76ab7e8e96

It's also possible that there are other query string parameters, e.g.:

/protected/file.ext?id=xyz&auth=5283f514f51511ca3e73f76ab7e8e96&p=12345

The important bit is the auth query parameter, which contains an authentication token.

First I define a backend in the VCL for the host that runs the authentication service:

backend auth {
  .host = "auth.neovatar.org";
  .port = "80";
}
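
The later snippets also call std.log and switch back to a backend named default, neither of which is declared anywhere in this post. A minimal sketch of the missing declarations, assuming Varnish 3.x and a content application listening on 127.0.0.1:8080 (host and port are placeholders, not taken from the original setup):

```vcl
# std ships with Varnish 3.x and provides std.log()
import std;

# hypothetical content backend; adjust host and port to your setup
backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
```

Which backend Varnish picks when req.backend is never set depends on declaration order in the versions I know, so it is worth checking where this block sits relative to the auth backend in a complete VCL file.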

In vcl_recv I need to extract the auth parameter value. I also need to create the URL used for retrieving the requested content. This is basically the original URL with the auth parameter removed:

sub vcl_recv {
  if (req.http.host == "blog.neovatar.org") {
    unset req.http.cookie;
    if (req.url ~ "^/protected/") {
      if (req.restarts == 0) {

        # std.log requires `import std;` at the top of the VCL file
        std.log("org.req.url:" + req.url);

        # get auth parameter value from query string
        set req.http.x-auth-token = regsub(req.url,
          "^.*(\?|&)auth=([A-Fa-f0-9]*)(&{0,1}.*|$)",
          "\2");

        # remove `auth` parameter from query string and
        # if there is no query string after removal, remove trailing `?`
        set req.http.x-orig-url = regsub(req.url,
          "(\?{0,1})(&{0,1})auth=[A-Fa-f0-9]*(&.*|$)",
          "\1\3");
        set req.http.x-orig-url = regsub(req.http.x-orig-url, "\?$", "");

        # set url to query auth backend
        set req.url = "/auth.php?auth=" + req.http.x-auth-token;
        set req.backend = auth;
        set req.http.host = "auth.neovatar.org";
        unset req.http.cookie;
      }
    }
  }
}

The interesting parts of this VCL code are the regular expressions. This snippet extracts the value of the auth parameter and stores it in the HTTP header x-auth-token:

        set req.http.x-auth-token = regsub(req.url,
          "^.*(\?|&)auth=([A-Fa-f0-9]*)(&{0,1}.*|$)",
          "\2");

In the same way, the following expression stores the URL that points to the actual content in the HTTP header x-orig-url for later use:

        set req.http.x-orig-url = regsub(req.url,
          "(\?{0,1})(&{0,1})auth=[A-Fa-f0-9]*(&.*|$)",
          "\1\3");
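
To see what this expression does, it helps to trace it on a few URLs. The comments below are worked by hand against the pattern above, with abc123 standing in for a real token. "strip" is the regsub shown above and "trim" is the follow-up regsub from the full vcl_recv that removes a trailing `?`. The final regsub is an optional extra of mine, not part of the original VCL:

```vcl
# /protected/file.ext?auth=abc123
#   strip -> /protected/file.ext?
#   trim  -> /protected/file.ext
#
# /protected/file.ext?id=xyz&auth=abc123&p=12345
#   strip -> /protected/file.ext?id=xyz&p=12345
#
# /protected/file.ext?auth=abc123&p=12345
#   strip -> /protected/file.ext?&p=12345
#
# optional cleanup for the last case, placed after the two
# existing regsub calls in vcl_recv: turn the leftover "?&" into "?"
set req.http.x-orig-url = regsub(req.http.x-orig-url, "\?&", "?");
```

Most backends accept the `?&` form anyway, but normalising it keeps the cache from storing the same content under two different URLs.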

Then I set the actual request URL to the authentication URL and append the auth token:

        set req.url = "/auth.php?auth=" + req.http.x-auth-token;
        set req.backend = auth;
        set req.http.host = "auth.neovatar.org";
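
One caveat with the token extraction: regsub returns its input unchanged when the pattern does not match, so a request without an auth parameter ends up with the whole URL in x-auth-token. A possible guard, assuming Varnish 3.x syntax and that such requests should simply be rejected (the 403 response here is my choice, not part of the original setup):

```vcl
sub vcl_recv {
  # reject protected URLs that carry no hex auth token at all,
  # before any backend is contacted
  if (req.url ~ "^/protected/" && req.restarts == 0 &&
      req.url !~ "(\?|&)auth=[A-Fa-f0-9]+(&|$)") {
    error 403 "Forbidden";
  }
}
```

Varnish concatenates multiple definitions of the same subroutine in the order they appear, so this block would have to come before the main vcl_recv; alternatively the check can be folded into it.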

Now Varnish sends a backend request to the authentication service URL instead of the originally requested URL. The service returns an HTTP 200 response if the auth token is valid and an HTTP 403 if it is invalid. On successful authentication, I want to deliver the requested content and not the response of the authentication service, so I have to check for this in vcl_deliver:

sub vcl_deliver {
  if (req.backend == auth) {
    if (resp.status != 200) {
      # token rejected: fall through and deliver the auth response (e.g. 403)
    }
    else {
      set req.url = req.http.x-orig-url;
      set req.backend = default;
      set req.http.host = "blog.neovatar.org";
      return(restart);
    }
  }
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT:" + obj.hits;
  }
  else {
    set resp.http.X-Cache = "MISS";
  }

  unset resp.http.server;
  unset resp.http.Age;
  unset resp.http.X-Varnish;
  unset resp.http.Via;

  return(deliver);
}

If the auth request returned an HTTP 200 status, I know the authentication was successful. I then set the request URL to the content URL, which I saved in req.http.x-orig-url earlier, and switch the backend and Host header back to the content server. After that I call restart, which tells Varnish to process the request again from the beginning:

  if (req.backend == auth) {
    if (resp.status != 200) {
      # token rejected: fall through and deliver the auth response (e.g. 403)
    }
    else {
      set req.url = req.http.x-orig-url;
      set req.backend = default;
      set req.http.host = "blog.neovatar.org";
      return(restart);
    }
  }

Now take another look at my vcl_recv code snippet. There is one important bit I did not mention earlier:

    if (req.url ~ "^/protected/") {
      if (req.restarts == 0) {
        set req.backend = auth;

I check the number of restarts, so the authentication request is only made on the first pass. After the restart, req.restarts is 1, the rewrite is skipped, and the request goes to the content URL that I restored in vcl_deliver, so the content is fetched and delivered. With the right code in vcl_fetch I can cache both the content and the responses of the auth service:

sub vcl_fetch {
  if (req.http.host == "blog.neovatar.org") {
    # remove cookies, so that blog is cacheable
    unset beresp.http.set-cookie;

    # enforce a minimum cache time of 5m
    if (beresp.ttl <= 5m ) {
      set beresp.ttl = 5m;
    }
  }
  else if (req.http.host == "auth.neovatar.org") {
    set beresp.ttl = 1m;
  }
}

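This will cache content for at least 5 minutes and authentication responses for 1 minute. Note that a cached authentication response keeps being served until it expires, so a revoked token can still be accepted for up to a minute.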
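
The reason a single cached copy can serve every authenticated user is the cache key. No vcl_hash appears in the VCL above, so presumably the built-in one applies. In Varnish 3.x it looks roughly like the sketch below (reproduced from memory of the default VCL, so check it against your version):

```vcl
sub vcl_hash {
  # the key is built from the URL and Host header as they are at lookup time
  hash_data(req.url);
  if (req.http.host) {
    hash_data(req.http.host);
  } else {
    hash_data(server.ip);
  }
  return (hash);
}
```

After the restart, req.url is the token-free URL from x-orig-url, so all users share one content object. The auth lookups are keyed on /auth.php?auth= plus the token, so they are cached once per token.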