On short URLs

Let me tell you about my URL schema.

The more alert of my readers might have noticed that qntm.org has very short URLs. http://qntm.org/destroy, for example, is a much shorter URL than, to pluck an example out of clear air, http://uk.kotaku.com/5800473/portal-2s-erik-wolpaw-lecture-now-online-for-your-viewing-pleasure. You will also notice that unlike the URLs which fall out of a typical URL shortening service, qntm.org URLs are at least a little bit descriptive. This is one hundred percent deliberate.

Why

Extremely short URLs have several distinct advantages over verbose, highly expository URLs.

There are some disadvantages:

How I did it

Going further

(This is more of that part that I said I hadn't done yet.)

Qntm.org has pages and every page has a slug. But qntm.org also has comments and, although you wouldn't know it, every comment has a slug too. Since the slug "comments" was already taken by a page (see what I mean?), each comment instead has an auto-generated slug of the form "kommentXXX" where XXX is an integer. (In the case of rapidly-user-generated content, I think auto-generating slugs is legitimate. No user should be forced (or permitted) to generate their own URLs on your site. That's crazy talk.)
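
A minimal sketch of how auto-generated comment slugs of the form "kommentXXX" could be handed out without colliding with slugs that are already taken. This is an illustration in Python, not the site's actual code; the function name `make_comment_slugger` and the example set of taken slugs are assumptions made up for the example.

```python
import itertools


def make_comment_slugger(taken, start=1):
    """Return a function that hands out unused slugs of the form 'kommentN'.

    `taken` is a set of slugs already in use (by pages or comments).
    It is updated in place each time a new slug is issued.
    """
    counter = itertools.count(start)

    def next_slug():
        # Keep counting upwards until we hit a number whose slug is free.
        for n in counter:
            slug = f"komment{n}"
            if slug not in taken:
                taken.add(slug)
                return slug

    return next_slug


# Hypothetical existing slugs; "komment2" is deliberately already in use.
taken_slugs = {"comments", "destroy", "komment2"}
next_slug = make_comment_slugger(taken_slugs)

first = next_slug()   # "komment1"
second = next_slug()  # "komment3", because "komment2" was taken
```

The counter only ever moves forward, so a slug is never reissued even if the comment it belonged to is later deleted.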

Using the same slug schema site-wide is also legitimate. A naive schema would use "http://qntm.org/pages/*" for pages and "http://qntm.org/comments/*" for comments, but that's wordy (or, at least, charactery). Once I've moved slugs into a distinct table, which both pages and comments (and possibly other future classes of objects in my object-relational wossname) will refer to without treading on each other, the almighty slug will become a key to any object of any class on my site. A time will hopefully come when you can go to "http://qntm.org/komment1003" and either see the detail on that specific comment or be redirected to the page upon which that comment was originally made. You'll be able to send HTTP requests to that endpoint to modify or delete your comment.
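
The single-namespace idea can be sketched roughly as follows. This is a hedged illustration in Python, not how qntm.org is implemented: the `SlugTable`, `Page` and `Comment` classes, the `handle_get` function and the example page title are all hypothetical names invented here. It shows one table keyed by slug, shared by every class of object, with a comment's slug redirecting to the page the comment was made on.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str


@dataclass
class Comment:
    page_slug: str  # slug of the page this comment was made on
    body: str


class SlugTable:
    """One namespace of slugs shared by every class of object."""

    def __init__(self):
        self._objects = {}

    def register(self, slug, obj):
        # Pages and comments cannot tread on each other: a slug is
        # claimed exactly once, whatever kind of object claims it.
        if slug in self._objects:
            raise ValueError(f"slug already taken: {slug}")
        self._objects[slug] = obj

    def resolve(self, slug):
        return self._objects.get(slug)


def handle_get(table, slug):
    """Return a (status, payload) pair for a GET on /<slug>."""
    obj = table.resolve(slug)
    if obj is None:
        return (404, None)
    if isinstance(obj, Comment):
        # Redirect to the page upon which the comment was made.
        return (302, "/" + obj.page_slug)
    return (200, obj.title)


table = SlugTable()
table.register("destroy", Page("Example page"))  # title is a placeholder
table.register("komment1003", Comment("destroy", "Nice article"))

page_response = handle_get(table, "destroy")
comment_response = handle_get(table, "komment1003")
missing_response = handle_get(table, "nosuchslug")
```

Because the lookup does not care what kind of object sits behind a slug, a new class of object needs no new URL prefix, only a new branch in the handler.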

Credit where it's due: this last idea owes a lot to the original concept behind Everything2. On E2, "everything is a node", be it a user, a writeup, an "e2node" (glob of writeups under the same title), a usergroup, a stylesheet, a nodelet (UI component) or even an htmlcode (chunk of Perl code which makes the site go). However, E2's original implementation fell short of that ideal. For example, several types of object, such as votes and private messages, were not nodes, making "everything is a node" an overstatement at best. Also, E2 focused on node ID numbers rather than titles (which were permitted to collide) or slugs (which E2 lacked).

Anyway. Somebody asked, so there it all is.

Discussion (21)

2011-05-11 01:12:04 by Michael:

And this sort of article is why I have your site's RSS feed in my browser. Oh, and the fiction.

Maybe I'm a nerd, but I find it interesting how other people create website schemes such as this. Personally, I agree with you on wondering about the need for the date, or the full page title in the URL.

One thing that I do on my own website is to categorize things. So writing is put in /writing and then political writing is in /writing/politics and the article I wrote about why everything sucks is in /writing/politics/sucks. I prefer this scheme because the categories are obvious in the URL, and it makes navigation easier (just strip everything to the right of one of the slashes to get a category). I think this gives me memorable paths (at least for me), and I believe its advantages outweigh those of your system (for me). Of course that makes URLs longer, but my website domain isn't short anyway.

2011-05-11 03:40:35 by David:

I'd like to bring up a point similar to Michael's and say that a folder structure is nice in a website.

While I don't advise you to pick up a scheme similar to Kotaku's, I wonder why you chose not to use folders. For example, this article could have had a URL such as qntm.org/blog/urls or your comments article could have been qntm.org/blog/comments. While this does add a whole 5 characters to the URL, perhaps you can shorten these things down. qntm.org/b/urls is still descriptive and is certainly short.

I know with the site's last two iterations you've put a lot of importance on the "breadcrumbs" that you put at the top of each page, and URLs are a way of having a built in trail of crumbs. If you decided to implement something like qntm.org/b/urls, you could have qntm.org/b redirect to qntm.org/blog.

With proper implementation this could correct your code/src problem. You could give "regarding code" the URL qntm.org/b/code and give your Code directory the URL qntm.org/code or, as a redirect, qntm.org/c.

The only problem with this is that at a certain point you will run out of short redirects. For example, if you start a directory called "Business" you'll have to have the shortened slug be /bu/ (or /bz/) instead of /b/.

Suddenly the design of most imageboards makes more sense.

2011-05-11 08:24:43 by Artanis:

Personally, though, I'd avoid making anything reference /b/. Even indirectly.

2011-05-11 09:27:40 by Sam:

Folders would be counterproductive for me.

First, as you mentioned, it makes the URL longer.

Secondly, I move stuff around all the time. Like, incessantly. If I move something from one folder to another, does that mean I have to change its URL? That's bad. Or do I need to set up a redirect? That's bad too. I originally ran into this problem way back when I used physical files.

There is a third structure here: the apparent arrangement of my site into directories and subdirectories. This is independent from both my URL schema and my directory structure. Using a single slug, I can put anything as a child of anything else without worrying about how or where users enter the site.

2011-05-11 15:23:13 by David:

Ah, I see. Perhaps these new reasons are actually more important to your decision-making process than the other ones.

To me, it seems that now we've gotten to the root of the problem and the original pros and cons you wrote about were merely justifications. :)

Either way, it's your site and you do a darn fine job of running it.

Kinda off topic, but do you have plans to incorporate more of your articles from the Sam's Archive iteration of the site? I personally remember a "bounce house games" article and your idea for an Ocean's 11 TV show article that aren't around anymore.

2011-05-11 15:53:51 by Sam:

Those are still in the blog, although you may need to search for them. I'm meaning to make the site more fluid and well-connected so you can find stuff more easily now that most things are in one directory.

2011-05-12 20:50:28 by aaroncrane:

@Michael: for sites with different needs — I’m thinking particularly of, as Sam puts it, those with “more than one-fifth of a full-time contributor” — one large advantage of putting some representation of the date into the URL is that slugs have to be unique only within the smallest time period you can indicate in the URL, rather than globally.

For example, The Register http://www.theregister.co.uk/ puts the year, month, and day into article URLs, and uses a slug within that. This has some of the advantages Sam points out for his use of slugs, while mitigating the bad effects if (when) a journalist initially picks a terrible slug (given that slugs are expected to last forever, and that journalists tend to worry a lot more about their copy than about URL quality). With the Reg typically publishing several dozen articles per day, the requirement for uniqueness within a single day is in practice easy for journalists to live with.

The Reg also arranges that manual URL editing works as you might expect: if you remove the slug from an article URL, you get the URL of a daily archive page; if you remove the day-of-month from a daily archive URL, you get a monthly archive; and so on. I believe this is a valuable service to its readers. (Technically, there are one or two places where that isn’t true; http://www.theregister.co.uk/Wrap/playmobil/ exists, but http://www.theregister.co.uk/Wrap/ doesn’t, for example. But the principle mostly holds.)

2011-05-12 23:30:41 by YarKramer:

For the record, the TinyURL link to http://qntm.org/destroy (http://tinyurl.com/zu228 ) is actually one character *longer* than the original. I just thought some people would find that amusing.

2011-05-13 12:21:16 by Adam:

I particularly like date-less URLs for content like qntm.org/uk (I love this, as it makes it so easy to link people to), but how do you reconcile date-less URLs with content that might become out-dated? As well as that, what about the informative nature of having the date there in the first place? qntm.org/11/05/urls immediately informs you that it might in fact be out-of-date. Is this something we should be caring about at all, and instead, for time-sensitive content (what content isn't time-sensitive, on a long enough scale?), put the date prominently with the title on the page? What if you really, really want to write about URLs again 3 years from now? With a date in the URL, you can publish at qntm.org/16/05/urls easily.

I spent days considering whether to include dates in my URLs when I was writing my custom super-minimal CMS. I'm still sitting right on the fence, because I can't decide.

2011-05-13 14:21:54 by Naleh:

aaroncrane has a point - having dates in their urls helps news websites deal with the slug limitation, and also gives an easy way to access time-period archives, which is something a news website may wish to offer.

So, reasonable for a news website. In most cases, though...

@Adam:
I wouldn't bother putting a date in. Put it at the top of the page if it's going to be important - which, depending on your content, it most often won't be. I generally don't bother to read the clutter in the middle of a url anyway, and I suspect I'm not alone.

2011-05-13 16:30:09 by Sam:

Adam: firstly, I have no idea what date schema allows you to start at "11/05" and add three years and get "16/05". If by "11/05" you mean "May 2011" then already the schema is nonsensical: it could just as easily mean "November 2005", "5th November" or "11th May". As you can see, it is very easy to get this stuff wrong.

Secondly, as you will notice, qntm.org articles do not have prominent dates, even in the body of the web page. They are dated, but you'll have to look at the footer. I have been considering changing this policy, but on the other hand, sometimes the only thing that a prominent date achieves is to announce "This content is out of date-- do not read it!" to the whole world.

2011-05-13 21:26:07 by Adam:

Well, yes, you could pick /2011/05, then.

I think one good influence of date-less URLs described here is the reminder that you shouldn't just write ephemeral junk.

2011-05-14 04:51:30 by ebenezer:

I cannot see why David's single-character category idea would not work for sites other than qntm.org, particularly if they have fixed categories and do not mind adding two characters to their URLs. If you start your site knowing that you intend to write about rugs, science, and politics, for example, you could write about those three under /r/, /s/, and /p/. In doing so you not only include the context of the article (to a reader who regularly visits /p/, that bit in the URL tells him he's getting sent to an article relating to politics), but you also open any slug imaginable to be used in each category once.

And @David, in case you haven't found them already: the articles you remembered are at http://qntm.org/bouncy and http://qntm.org/o11.

2011-05-16 02:44:17 by Mike:

@ebenezer Point, but single-word slugs only work in sites with distinct alphabetical categories of content, like 4chan. Otherwise? Confusing, as they could stand for whatever you want them to.

2011-05-16 04:37:53 by ebenezer:

@Mike: If by "single-word slugs" you meant the use of single-character categories (as /p/), then I should clarify. I said that this would work provided that the site's categories are fixed -- i.e., they are chosen once and do not change. It is true that a single letter could stand for any category, but if these categories were chosen and remained the same, then an article's URL would still provide some degree of context. This would be most useful for returning visitors who knew that a given character stood for a particular category, but it would be in this sense preferable to having every article directly off the site root.

For a sort of example of this, see the URLs for Mark Reid's two blogs (at http://mark.reid.name). The URL for his blog Inductio ex Machina is http://mark.reid.name/iem/, and the URL for his blog Structure & Process is http://mark.reid.name/sap/. To someone who has never seen either site and has only come across the URL, these three-letter abbreviations mean nothing. But to a returning visitor, they specify the blog on which an article was posted (each blog's articles are accessed as http://mark.reid.name/xxx/article).

(If your objection was other than the one I answered, please explain.)

2011-05-17 04:03:08 by LabrynianRebel:

Well all of that work was worth it because I have really enjoyed your website's layout and structure compared to other websites. Great job!

2011-05-23 17:18:57 by John:

testing test to see if commenter is real

2011-05-23 17:21:43 by me:

just testing

2011-05-23 18:15:19 by Sam:

Yes, you have successfully figured out how to use the commenting system. Now start saying something other than "just testing" or I'm making the question harder.

2011-05-27 00:32:32 by ebenezer:

But before you do that, I have a quick question: You mention that if you had thought more clearly you "would have ruled out digits, underscores and hyphens entirely." I understand why you would want to leave underscores and hyphens out of your URLs. But what problem do you have with digits?

2011-08-10 01:59:08 by Jeremy:

"Each comment instead has an auto-generated slug of the form "kommentXXX" where XXX is an integer."

хорошо, Komrade!
