Security 101: Securing File Downloads - Timo Zimmermann
Serving content back to the user (or others) is often handled by returning the URL to the file. What is frequently missing is proper authentication and authorization, as engineers seem to believe no one will leak URLs, run enumeration attacks or simply try random strings. This is not just a data breach waiting to happen, it is one that happens far too often.
In this post we will look at three options for solving this. The examples, which you can find in the demo repository, are written in Python, using Django. All three should work just fine in basically any modern language and framework used for web development, and with most web servers and reverse proxies such as Nginx. I am using Caddy, as the configuration is concise and simple to follow.
For all examples you can upload a file via Django Admin and browse and download the files by visiting /.
All examples only check if the user is authenticated. In a real system you will most likely want to extend the check a bit, to make sure the user is actually authorised to view the file they are trying to download.
Option 1 - Application Server
This is by far the easiest implementation, with the fewest moving parts. The downside is that you will block one of your application server's workers while the client is downloading the file. For extremely small setups, running enough workers (via gunicorn or something similar) might be fine, but the moment you get more than a dozen users, this approach might bring your system to a halt.
The basic idea is you read the content of the file, assign it to HttpResponse.content and set the file name. That is it, your users can now download files.
Your application server logs will show you the request and response for browsing files, and the one for downloading data. In this case, a 46 KB JPEG.
Option 2 - X-Accel
It might be a shock to some of you, but people still run applications on servers or single virtual servers. They even store data locally on a disk. Not everyone has migrated to AWS, GCP or Azure yet. This is actually a shockingly cheap and performant solution.
If you go this route, you likely have a reverse proxy set up, terminating SSL and forwarding requests to your application server. Web servers are remarkably efficient at serving files and do a far better job than single-threaded application servers. Even for applications that could theoretically handle the load, letting your web server or proxy deal with files, keeping resources free to run your actual business logic, is surely a welcome idea.
To make this happen we build a response as before. But this time, instead of adding the actual data to the response, we set the X-Accel-Redirect header, pointing to the file relative to our media directory. Our proxy will recognise the header and serve the specified file.
As a reverse proxy, I am becoming a big fan of Caddy. It is fast to configure, scales well enough, and I can actually write a configuration without studying the documentation for nearly every line I type.
We run the proxy on port 8080 and forward requests to 8000, Django's default port for the development server. If the response contains the X-Accel-Redirect header, we serve the file from our media directory.
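A Caddyfile sketch matching that description, using Caddy's handle_response feature; the site root is a placeholder and the demo repository's exact configuration may differ:

```caddyfile
:8080 {
	reverse_proxy localhost:8000 {
		# Matches upstream responses that carry the header.
		@accel header X-Accel-Redirect *
		handle_response @accel {
			# Directory containing the media folder; placeholder path.
			root * /path/to/project
			# Rewrite to the path the application put in the header.
			rewrite * {rp.header.X-Accel-Redirect}
			file_server
		}
	}
}
```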
Our application server shows zero bytes for the response body.
Our proxy, on the other hand, shows the actual content being served for the path /1.
If we call the URL without being signed in, we get the 403 permission error we would expect.
Option 3 - Pre-signed URLs
Using some form of object storage like AWS S3 is an easy and cheap way to store lots of data. What I have seen far too often, though, is people configuring their object storage to be world readable in order to serve files from it directly. For truly public content this might be okay, but most data stored in world-readable buckets is not actually meant for public consumption.
Configuring your bucket as private or "not world readable" is a good start, but also means you have to get the data out of it somehow when you need to download it. You could proxy all data through your application server. Or you generate pre-signed URLs with a short expiration time.
Our example allows initiating a download for five seconds after the URL was generated. While this is still not peak security, it is far better than having your bucket world readable. Depending on the object storage, you might be able to generate true one time use URLs, or might be able to generate temporary users with access to exactly one specific file.
For this example you might want to create a new IAM user with GetObject and PutObject permissions for the S3 bucket you are going to use.
You might go one step further and let your application server handle the redirect to mask the S3 domain, but I will leave this exercise to those of you who truly care about the download URLs shown to your users.
We end up with lots of data in the URL we are redirected to. I removed some data in the example to make the important parts easier to spot.
If we visit the same URL a bit later we get an error telling us the request expired and we do not get access to the data.
There is no excuse to expose uploaded content without authorization and authentication checks to the world wide web. Your users expect their data to be kept safe and only be viewed by people authorised to do so.
All three options we looked at should work across most tech stacks and cloud / hosting providers. Implementation details might vary, but equipped with the correct terminology, most documentation will guide you through the steps necessary to make this work.