Returning 301 Redirects for URL Changes with Nginx

Since my move to Ghost from WordPress, I've been playing catch-up with my server admin skills. In that... I have none.

So I've had to find solutions for a few problems on my droplet that weren't entirely easy to figure out, but hey, I've solved them!

Play the victory montage.

So here's the situation: my old WordPress blog had posts with a date stamp in the URL. It basically looked like this: /YYYY/MM/post-slug/

And that's how it was for a long time. Google thought it would stay that way forever. Except things changed.

When it came time to switch to Ghost, I decided to drop the year and month from the URL, because I felt like it. However, since there were no real rules set up in nginx, which is my HTTP server on my DigitalOcean droplet, I had to figure out how to do 301 redirects.

If you're unfamiliar with 301 redirects, they're basically big signposts that let browsers and search engines know that your site or page has permanently moved elsewhere. They're also the best way to take the SEO work you've already done and carry it over to wherever the content now lives, in this case, the new URLs on my blog.

Now, bear with me if you've never had to deal with a server before, but inside of my droplet was a file called ghost that stores all of the nginx configuration for my Ghost install (this blog), and that's where I was pointed if I wanted to get 301 redirects accomplished. It looked like this:

server {  
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;

    server_name; # Replace with your domain

    root /usr/share/nginx/html;
    index index.html index.htm;

    client_max_body_size 10G;

    location / {
        proxy_pass http://localhost:2368;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
    }
}

Ain't that pretty?

In addition, a block like that location / {} one at the bottom is basically how I accomplished my 301 redirects... but I am getting ahead of myself.

Now, since I know that my URLs used to have a year and month before the post slug, we can use a regular expression, or regex, to pick out the parts we want to drop, and serve up a URL without them.

Here's what that regex would look like: /([0-9]{4}\/[0-9]{2})\/(.*)\/

Or, if you don't read nonsense: it looks for URLs that start with a /, then four digits {4} between 0 and 9, followed by another (escaped) /, then two digits {2} between 0 and 9, another (escaped) /, and then whatever string comes next (plus one more (escaped) /).

Fun, right?

Anyway, after an hour or so of rebooting my server and seeing unknown directive "4}\/[0-9]" pop up in my error logs over and over again, I realized that the location block in the nginx file uses curly braces {} to denote the actions of locations that match my regex.

No problem, we'll escape the curly braces too!

Dat regex: /([0-9]\{4\}\/[0-9]\{2\})\/(.*)\/

...except this fails too. And honestly, I have no idea why, other than it was too darn clever for its own good. Which is fine, I can respect something that wants to keep it simple.
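For what it's worth, there is a likely explanation. In the PCRE syntax that nginx uses, a backslash-escaped brace is a literal brace, so [0-9]\{4\} looks for one digit followed by the characters {4} rather than four digits in a row. The nginx documentation describes a different way out: wrap the whole pattern in quotes, so the config parser stops reading { as the start of a block. Here is a minimal, untested sketch of that variant. The ^ and $ anchors and the relative target /$2/ are assumptions for illustration, not the rule I ended up using.

```nginx
# Sketch only: quoting the pattern lets {4} and {2} work as regex quantifiers.
# The relative target /$2/ is an assumption; swap in a full URL if you need one.
location ~ "^/([0-9]{4}/[0-9]{2})/(.*)/$" {
    return 301 /$2/;
}
```

Inside the quotes the forward slashes don't need escaping either, which makes the pattern a little easier on the eyes.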

In the end, I just manually matched digits in regex inside of the location block, and I added this to the bottom of my ghost file inside of the server {} block:

    location ~ /([0-9][0-9][0-9][0-9]\/[0-9][0-9])\/(.*)\/ {
        return 301$2;
    }

In case you want to walk through the regex with me again, here you go:

In the location block, we look for a section of the URL that starts with the forward slash /, followed by four digits between 0-9, then another /, followed by two digits between 0-9, and finally whatever string rests between the last two forward slashes /.

The () around parts of the regex form groups, which also act as variables for the actions we take in the location block. In this case, the first group matches the numerical year and month, the second group captures the slug that follows them, and I return just that slug ($2), without the year and month parts of the URL.
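To make the groups concrete, here is a small annotated sketch of what each capture would hold for a made-up request. The URL /2014/05/my-post/ is hypothetical, and the redirect target /$2/ assumes the new post lives at the slug on the same site.

```nginx
# Hypothetical request: /2014/05/my-post/
#   $1 = "2014/05"   (first group: year and month)
#   $2 = "my-post"   (second group: the slug)
location ~ /([0-9][0-9][0-9][0-9]/[0-9][0-9])/(.*)/ {
    # Assumed target: slug only, same site. With a local URI like this,
    # nginx fills in the scheme and host of the redirect itself.
    return 301 /$2/;
}
```

With that rule in place, a request for /2014/05/my-post/ should come back as a 301 pointing at /my-post/.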

What a journey.