Directory traversal attacks

As part of programming a Gemini server, I’m dealing with some classic problems, such as directory traversal attacks:

A directory traversal (or path traversal) attack exploits insufficient security validation or sanitization of user-supplied file names, such that characters representing “traverse to parent directory” are passed through to the operating system’s file system API.

For example, if the content we are serving is in /var/gemini/, our server should only serve content from that directory. In other words, the following request should be illegal:

gemini://server.local/../../etc/passwd

That file should never be served, of course.

There are different ways of preventing this type of attack. The most common is:

1. Get the root of our directory, in this example /var/gemini/.
2. Add the requested path and normalize the result, which basically means removing the relative path components (for example, the ..). In this example we would have:

/var/gemini/../../etc/passwd -> /etc/passwd

3. If the resulting path doesn’t begin with the root directory, then the path is invalid.
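The steps above can be sketched in Scala with java.nio.file. This is only a minimal illustration, assuming a Unix-like file system; resolveUnderRoot is a hypothetical helper name, not something taken from an actual server.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper: resolves a request path against the content root and
// returns the resolved path only when it stays inside that root.
def resolveUnderRoot(root: Path, requestPath: String): Option[Path] = {
  val base = root.toAbsolutePath.normalize
  // Strip leading slashes so the request path is resolved relative to the root.
  val relative = requestPath.dropWhile(_ == '/')
  // normalize removes "." and resolves ".." components lexically.
  val resolved = base.resolve(relative).normalize
  // Path.startsWith compares whole path components, not raw string prefixes.
  if (resolved.startsWith(base)) Some(resolved) else None
}
```

Comparing Path values instead of strings avoids accepting a sibling directory such as /var/gemini-private just because its name starts with the same characters as the root.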

It is very straightforward, but there’s one thing I don’t like: it leaks information about the directory layout outside the root directory.

For example, with this approach the following URL is accepted as valid:

gemini://server.local/../../var/gemini/

It translates to exactly the root directory, so it passes the check. That may not be very important, but the server is confirming the path of its root directory to anyone who guesses it. On top of that, the URL can’t be easily normalized, so we could have multiple valid URLs for the same directory.

So I thought of an alternative that rejects that URL as invalid and makes every valid URL easy to normalize via a redirect.

The basic idea is to split the path by the / separator. Then walk the path components in order, keeping a running value that starts at 0 and changes with each component:

.. goes back one directory: subtract 1.
. is the same directory, so no change: add 0.
Any other component: add 1 (as we go forward one directory).

If the running value is less than zero after any path component, the path has gone above the root directory at that point, so we can say that the path is illegal and happily return a bad request response.

Let’s show some Scala code:

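// Returns true when the path never goes above the root directory.
def validPath(path: String): Boolean =
    !path
      .split('/')
      .drop(1) // skip the empty component before the leading '/'
      // Keep the running depth after each component, starting at 0.
      .foldLeft(List(0)) {
        case (acc, "..")     => acc.appended(acc.last - 1)
        // "." and empty components (as in "//") stay in the same directory;
        // counting "" as a directory would let "//.." pass the check.
        case (acc, "." | "") => acc.appended(acc.last)
        case (acc, _)        => acc.appended(acc.last + 1)
      }
      .exists(_ < 0)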

This code is called after the path has been decoded from the URL and its maximum length has been checked.
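To see why every intermediate value matters and not just the final one, here is a small trace of the running value. It is only a sketch: depths is a hypothetical helper written for this illustration, and it assumes that empty components (as in //) change nothing.

```scala
// Hypothetical helper: the running depth after each path component.
def depths(path: String): List[Int] =
  path
    .split('/')
    .toList
    .filter(_.nonEmpty) // assumption: empty components ("//") change nothing
    .scanLeft(0) {
      case (depth, "..") => depth - 1 // back one directory
      case (depth, ".")  => depth     // same directory
      case (depth, _)    => depth + 1 // forward one directory
    }

// depths("/directory/another/../../") gives List(0, 1, 2, 1, 0)
// depths("/../../var/gemini/")        gives List(0, -1, -2, -1, 0)
```

Both paths end at depth 0, but only the second one dips below zero on the way, which is what marks it as illegal.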

If the URL is valid, meaning we never “went out of the root directory”, we can continue: check whether the URL is normalized and, if it isn’t, redirect to the normalized version.

Now we can safely normalize and redirect. For example, all the following requests are redirected to the same path (the root document):

gemini://server.local/./
gemini://server.local/directory/../
gemini://server.local/directory/another/../../
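One possible way to compute the normalized path is sketched below. It assumes the path has already passed the validation step; normalizePath is a hypothetical name, and keeping a trailing slash for directory-like requests is my own assumption about what the canonical form should look like.

```scala
// Hypothetical helper: canonical form of a path that already passed validation.
def normalizePath(path: String): String = {
  val kept = path.split('/').foldLeft(List.empty[String]) {
    case (stack, "" | ".") => stack         // nothing to add
    case (stack, "..")     => stack.drop(1) // go back one directory
    case (stack, name)     => name :: stack // go forward one directory
  }
  val joined = kept.reverse.mkString("/", "/", "")
  // Assumption: directory-like requests keep a trailing slash.
  val wantsDirectory =
    path.endsWith("/") || path.endsWith("/.") || path.endsWith("/..")
  if (wantsDirectory && joined != "/") joined + "/" else joined
}
```

A server could then compare normalizePath(path) with the requested path and send a redirect only when they differ. For the three requests above the result is / in every case.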

I found it all very interesting, despite being an old problem to solve.