On short URLs

Let me tell you about my URL schema.

The more alert of my readers might have noticed that qntm.org has very short URLs. http://qntm.org/destroy, for example, is a much shorter URL than, to pluck an example out of thin air, http://uk.kotaku.com/5800473/portal-2s-erik-wolpaw-lecture-now-online-for-your-viewing-pleasure. You will also notice that unlike the URLs which fall out of a typical URL shortening service, qntm.org URLs are at least a little bit descriptive. This is one hundred percent deliberate.

Why

Extremely short URLs have several distinct advantages over verbose, highly expository URLs.

  • They're memorable. I don't know about my readers, but I know for a fact that "How To Destroy The Earth" is located at http://qntm.org/destroy. If I want to see that page, I can just type that into my address bar. I don't have to load up my own index page and browse the site until I find HTDTE manually and I don't have to do a Google search, not even a site-specific Google search. Anybody who has more than a flicker of admiration for command lines as a concept will appreciate my appreciation for this angle.
  • I can type them faster. In the majority of cases it's faster than even a copy and paste.
  • I can type them at all, which is more than can be said for the majority of URLs in the world today, including even the extremely short ones coming from URL shorteners. Notice that for the most part, qntm.org URLs use solely lower-case letters (with the occasional digit, underscore or hyphen-- had I thought more clearly I would have ruled out digits, underscores and hyphens entirely, but it's too late for that). And they're usually based on English words. As anybody who's had to invent a secure password will know, actual words are much quicker and easier to remember and type than mere conflations of random characters.
  • I can make sense of my site usage statistics (entry pages, popular pages, visitor paths) more easily. I don't have to have the URLs expanded out to page titles, nor do I have to look up content from some random page ID to remember exactly what links to what.
  • Visitors do not need to use URL shorteners to link to me. When you click a qntm.org link, you know what you are getting, unlike a bit.ly link or similar. There is an extra layer of trust, or, to put it another way, one less layer of uncertainty. Most qntm.org URLs will fit alongside a decent-sized tweet. The longest URL on my site is http://qntm.org/news_stickmanstickman, 37 characters, and that's an outlier, referring to a very rarely-visited, unpopular page.
  • Because each page has a short, unique "slug" (e.g. in "http://qntm.org/destroy", the slug is "destroy"), each page has a unique URL. There is no redundancy in my URL schema - i.e., it is not possible to reach the same resource via multiple URLs. This keeps my site's complexity to a minimum and I'm told it's also good for SEO, although I honestly pay very little attention to the latter. (Note: actually I went through several more verbose schemata before I reached this one. But requests conforming to older schemata are 301 redirected to the new one.)
  • Because each page has a short, unique "slug", I can create internal links very easily. I just type [[destroy|something like this]] and it becomes something like this. If I change the schema (which seems unlikely right now, but is possible) then I can change that internal link expansion routine to match it.
  • Because each page has a short, unique "slug", each URL can serve as a web service endpoint for that page. I haven't actually implemented this yet, because there's no pressing need to do so (what I currently have works fine), but I'm hoping to turn qntm.org into a vaguely RESTful web service, whereby I can create new pages as children of existing pages, reparent and modify pages and so on using HTTP verbs.
  • It doesn't actually matter that much. Most people don't care about URL schemata. They don't remember and type, they copy and paste. Going for short URLs won't actually irritate those people.
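
For illustration, here is a minimal sketch of what an internal link expansion routine of the kind mentioned above could look like. This is not the site's actual code: it is written in Python, and the function name, the regular expression and the BASE_URL constant are all assumptions. The point is only that the URL schema lives in one place, so changing the schema means changing one line.

```python
import html
import re

# Assumed constant: the one place where the URL schema is spelled out.
BASE_URL = "http://qntm.org/"

# Matches [[slug|link text]]. The slug may use lower-case letters, digits,
# underscores and hyphens; an empty slug would point at the front page.
LINK_PATTERN = re.compile(r"\[\[([a-z0-9_-]*)\|([^\]]+)\]\]")


def expand_internal_links(text):
    """Replace every [[slug|text]] in `text` with an HTML anchor."""
    def replace(match):
        slug, label = match.group(1), match.group(2)
        return '<a href="{}{}">{}</a>'.format(BASE_URL, slug, html.escape(label))
    return LINK_PATTERN.sub(replace, text)


print(expand_internal_links("See [[destroy|something like this]]."))
# prints: See <a href="http://qntm.org/destroy">something like this</a>.
```

Text without any [[...]] markers passes through unchanged.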

There are some disadvantages:

  • The namespace tends to fill up quite fast. It turns out that I already have [[code|a page whose slug is "code"]], so the new "Code" directory that I recently created had to be given the slug "src". If this is a minor irritation for me, with only about 600 distinct pages and growing at about 4 pages per month on average, then it'll be a real problem for any site with more than one-fifth of a full-time contributor, and any kind of tight focus on a specific subject. (Hint: if your site is about rugs, the "rugs" slug may be in demand.)
  • You do actually have to think of something. Meaningful, super-brief slugs can't be generated mechanically. (Or can they? Probably, but that would take your content management system programmer quite a long way off-topic.)
  • A slug is for life. They have to be picked carefully. They are like IDs in database tables-- permanent. Sometime down the line you may regret giving up one particular memorable slug for what turned out to be a very boring and unpopular page/resource, when it would have been a really good fit for a new, more interesting resource that you are creating right now. (Hint: don't use "main", "index", "page" or "article". In my case, I've also had to avoid "timeline", "science" and "fiction".) It is not for nothing that "Naming things" is one of the Two Hard Problems Of Computer Science (the others are "cache invalidation" and "fencepost errors").
  • You can't see the whole headline or the date or anything else you previously included in your URLs. Although, why people include those things to begin with, I couldn't say.
  • It doesn't work for dynamic queries for obvious reasons. Sometimes a user needs to be able to put arbitrary search terms into a URL.
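
Choosing a good slug can't easily be automated, but policing the namespace can. The following is a rough sketch, not a description of any real CMS: the SlugRegistry class, its method name and the page descriptions are hypothetical. The reserved words are the ones from the hint above, and the allowed characters are the ones qntm.org slugs actually use.

```python
import re

# Lower-case letters, digits, underscores and hyphens only.
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")

# Generic names that are best never handed out (taken from the hint above).
RESERVED = {"main", "index", "page", "article"}


class SlugRegistry:
    """Hypothetical in-memory registry that enforces one resource per slug."""

    def __init__(self):
        self._taken = {}  # slug -> short description of the resource

    def claim(self, slug, description):
        """Reserve `slug` for a resource, or raise ValueError explaining why not."""
        if not SLUG_PATTERN.fullmatch(slug):
            raise ValueError("slug {!r} contains disallowed characters".format(slug))
        if slug in RESERVED:
            raise ValueError("slug {!r} is reserved".format(slug))
        if slug in self._taken:
            raise ValueError(
                "slug {!r} is already taken by {!r}".format(slug, self._taken[slug])
            )
        self._taken[slug] = description


registry = SlugRegistry()
registry.claim("code", "an existing page")
try:
    registry.claim("code", "the new Code directory")
except ValueError as error:
    print(error)  # the namespace collision described above
registry.claim("src", "the new Code directory")  # the fallback slug
```

A slug, once claimed, is never released in this sketch, which matches the "a slug is for life" rule.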

How I did it

  • I picked a short domain name. It didn't have to be amazingly short. Admittedly, "qntm.org" is nowhere near "u.nu" in the grand scheme of things, but it's still pretty good. In my defence, when I bought the domain five years ago URL shorteners were not so prevalent and Twitter did not exist.
  • I wrote my own content management system. Whether this is possible in other, "actual" CMSes, I couldn't say.
  • I made a clear mental break between (1) my URL schema and (2) the arrangement of files on my web server. These two structures need bear no relationship to one another whatsoever. You can do almost anything with mod_rewrite, and where mod_rewrite falters, PHP can pick up the slack.
  • I figured out slugs for all my existing pages. (As mentioned above, choose carefully. If you decide it's too much work and auto-generate, choose even more carefully.)
  • The front page has slug "". That's the empty string.
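
To make the separation between URL schema and file layout concrete, here is a small sketch of the routing idea. On the real site this job belongs to mod_rewrite and PHP; the Python below is only an illustration, and the file paths, the route function and the "/pages/<slug>.html" legacy pattern are all invented for the example (the actual older schemata aren't described here).

```python
import re

# slug -> file on disk. The paths are made up; the point is that they bear
# no relationship to the URLs. The empty slug is the front page.
PAGES = {
    "": "content/front.html",
    "destroy": "content/2003/earth/destruction.html",
}

# Hypothetical older, more verbose schema: /pages/<slug>.html
LEGACY = re.compile(r"/pages/([a-z0-9_-]+)\.html")


def route(path):
    """Map a request path to (status, target).

    Returns (200, file path) for a known slug, (301, new URL path) for a
    request in the legacy schema, and (404, None) otherwise.
    """
    legacy = LEGACY.fullmatch(path)
    if legacy:
        # Old-style URLs are permanently redirected to the short form.
        return 301, "/" + legacy.group(1)
    slug = path[1:] if path.startswith("/") else path
    if slug in PAGES:
        return 200, PAGES[slug]
    return 404, None


print(route("/destroy"))             # (200, 'content/2003/earth/destruction.html')
print(route("/pages/destroy.html"))  # (301, '/destroy')
print(route("/"))                    # (200, 'content/front.html')
```

Moving a file on disk only means editing its entry in the table; the URL stays put.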

Going further

(This is more of that part that I said I hadn't done yet.)

Qntm.org has pages and every page has a slug. But qntm.org also has comments and, although you wouldn't know it, every comment has a slug too. Since the slug "comments" was already taken by a page (see what I mean?), each comment instead has an auto-generated slug of the form "kommentXXX" where XXX is an integer. (In the case of rapidly-user-generated content, I think auto-generating slugs is legitimate. No user should be forced (or permitted) to generate their own URLs on your site. That's crazy talk.)

Using the same slug schema site-wide is also legitimate. A naive schema would use "http://qntm.org/pages/*" for pages and "http://qntm.org/comments/*" for comments, but that's wordy (or, at least, charactery). Once I've moved slugs into a distinct table, which both pages and comments (and possibly other future classes of objects in my object-relational wossname) will refer to without treading on each other, the almighty slug will become a key to any object of any class on my site. A time will hopefully come when you can go to "http://qntm.org/komment1003" and either see the detail on that specific comment or be redirected to the page upon which that comment was originally made. You'll be able to send HTTP requests to that endpoint and modify or delete your comment.
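
Since none of this is built yet, the following is only a sketch of how a shared slug table might behave, using an in-memory dictionary in place of a database table. The SlugTable, Page and Comment names, their fields and the starting comment number are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Page:
    title: str


@dataclass
class Comment:
    author: str
    body: str
    page_slug: str  # slug of the page the comment was made on


class SlugTable:
    """Hypothetical single namespace shared by pages and comments."""

    def __init__(self, first_comment_number=1):
        self._objects = {}  # slug -> Page or Comment
        self._comment_numbers = count(first_comment_number)

    def add_page(self, slug, page):
        """Pages get hand-picked slugs, which must be unused."""
        if slug in self._objects:
            raise ValueError("slug {!r} is already taken".format(slug))
        self._objects[slug] = page

    def add_comment(self, comment):
        """Comments get auto-generated slugs of the form kommentXXX."""
        slug = "komment{}".format(next(self._comment_numbers))
        while slug in self._objects:  # skip any number a page already grabbed
            slug = "komment{}".format(next(self._comment_numbers))
        self._objects[slug] = comment
        return slug

    def page_slug_for(self, slug):
        """Return the slug of the page to show: a page's own slug, or the
        parent page for a comment. Raises KeyError for unknown slugs."""
        obj = self._objects[slug]
        return obj.page_slug if isinstance(obj, Comment) else slug


table = SlugTable(first_comment_number=1003)
table.add_page("comments", Page("A page that happens to own this slug"))
slug = table.add_comment(Comment("someone", "Nice article.", "comments"))
print(slug)                       # komment1003
print(table.page_slug_for(slug))  # comments
```

Here a request for a comment's slug resolves to its parent page, which is the redirect option; showing the comment's own detail would mean returning the object instead.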

Credit where it's due: this last idea owes a lot to the original concept behind Everything2. On E2, "everything is a node", be it a user, a writeup, an "e2node" (glob of writeups under the same title), a usergroup, a stylesheet, a nodelet (UI component) or even an htmlcode (chunk of Perl code which makes the site go). However, E2's original implementation was different. For example, several types of object, such as votes and private messages, were not nodes, making "everything is a node" an outright lie at best. Also, E2 focused on node ID numbers rather than titles (which were permitted to collide) or slugs (which E2 lacked).

Anyway. Somebody asked, so there it all is.

Discussion (23)

2011-05-11 00:12:04 by Michael:

And *this* sort of article is why I have your site's RSS feed in my browser. Oh, and the fiction.

Maybe I'm a nerd, but I find it interesting how other people create website schemes such as this. Personally, I agree with you on wondering about the need for the date, or the full page title in the URL.

One thing that I do on my own website is to categorize things. So writing is put in "/writing" and then political writing is in "/writing/politics" and the article I wrote about why everything sucks is in "/writing/politics/sucks". I prefer this scheme because the categories are obvious in the URL, and it makes navigation easier (just strip everything to the right of one of the slashes to get a category). I think this gives me memorable paths (at least for me), and I believe the advantages outweigh the advantages of your system (for me). Of course that makes URLs longer, but my website domain isn't short anyway.

2011-05-11 02:40:35 by David:

I'd like to bring up a point similar to Michael's and say that a folder structure is nice in a website.

While I don't advise you to pick up a scheme similar to Kotaku's I wonder why you chose not to use folders. For example, this article could have had a URL such as qntm.org/blog/urls or your comments article could have been qntm.org/blog/comments. While this does add a whole 5 characters to the url, perhaps you can shorten these things down. qntm.org/b/urls is still descriptive and is certainly short.

I know with the site's last two iterations you've put a lot of importance on the "breadcrumbs" that you put at the top of each page, and URLs are a way of having a built in trail of crumbs. If you decided to implement something like qntm.org/b/urls, you could have qntm.org/b redirect to qntm.org/blog.

With proper implementation this could correct your code/src problem. If you had "regarding code" have the url qntm.org/b/code and your Code directory have the URL qntm.org/code or, as a redirect, qntm.org/c.

The only problem with this is that at a certain point you will run out of short redirects. For example, if you start a directory called "Business" you'll have to have the shortened slug be /bu/ (or /bz/) instead of /b/.

Suddenly the design of most imageboards makes more sense.

2011-05-11 07:24:43 by Artanis:

Personally, though, I'd avoid making anything reference /b/. Even indirectly.

2011-05-11 08:27:40 by qntm:

Folders would be counterproductive for me.

First, as you mentioned, it makes the URL longer.

Secondly, I move stuff around *all the time*. Like, incessantly. If I move something from one folder to another, does that mean I have to change its URL? That's bad. Or do I need to set up a redirect? That's bad too. I originally ran into this problem way back when I used physical files.

There is a third structure here: the apparent arrangement of my site into directories and subdirectories. This is independent from both my URL schema *and* my directory structure. Using a single slug, I can put anything as a child of anything else without worrying about how or where users enter the site.

2011-05-11 14:23:13 by David:

Ah, I see. Perhaps these new reasons are actually more important to your decision making process than those other ones.

To me, it seems that now we've gotten to the root of the problem and the original pros and cons you wrote about were merely justifications. :)

Either way, it's your site and you do a darn fine job of running it.

Kinda off topic, but do you have plans to incorporate more of your articles from the Sam's Archive iteration of the site? I personally remember a "bounce house games" article and your idea for an Ocean's 11 TV show article that aren't around anymore.

2011-05-11 14:53:51 by qntm:

Those are still in the blog, although you may need to search for them. I'm meaning to make the site more fluid and well-connected so you can find stuff more easily now that most things are in one directory.

2011-05-12 19:50:28 by aaroncrane:

@Michael: for sites with different needs — I’m thinking particularly of, as Sam puts it, those with “more than one-fifth of a full-time contributor” — one large advantage of putting some representation of the date into the URL is that slugs have to be unique only within the smallest time period you can indicate in the URL, rather than globally.

For example, The Register http://www.theregister.co.uk/ puts the year, month, and day into article URLs, and uses a slug within that. This has some of the advantages Sam points out for his use of slugs, while mitigating the bad effects if (when) a journalist initially picks a terrible slug (given that slugs are expected to last forever, and that journalists tend to worry a lot more about their copy than about URL quality). With the Reg typically publishing several dozen articles per day, the requirement for uniqueness within a single day is in practice easy for journalists to live with.

The Reg also arranges that manual URL editing works as you might expect: if you remove the slug from an article URL, you get the URL of a daily archive page; if you remove the day-of-month from a daily archive URL, you get a monthly archive; and so on. I believe this is a valuable service to its readers. (Technically, there are one or two places where that isn’t true; http://www.theregister.co.uk/Wrap/playmobil/ exists, but http://www.theregister.co.uk/Wrap/ doesn’t, for example. But the principle mostly holds.)

2011-05-12 22:30:41 by YarKramer:

For the record, the TinyURL link to http://qntm.org/destroy (http://tinyurl.com/zu228) is actually one character *longer* than the original. I just thought some people would find that amusing.

2011-05-13 11:21:16 by Adam:

I particularly like date-less URLs for content like qntm.org/uk (I love this, as it makes it so easy to link people to), but how do you reconcile date-less URLs with content that might become out-dated? As well as that, what about the informative nature of having the date there in the first place? qntm.org/11/05/urls immediately informs you that it might in fact be out-of-date. Is this something we should be caring about at all, and instead, for time-sensitive content (what content isn't time-sensitive, on a long enough scale?), put the date prominently with the title on the page? What if you really, really want to write about URLs again 3 years from now? With a date in the URL, you can publish at qntm.org/16/05/urls easily.

I spent days considering whether to include dates in my URLs when I was writing my custom super-minimal CMS. I'm still sitting right on the fence, because I can't decide.

2011-05-13 13:21:54 by Naleh:

aaroncrane has a point - having dates in their urls helps news websites deal with the slug limitation, and also gives an easy way to access time-period archives, which is something a news website may wish to offer.

So, reasonable for a news website. In most cases, though...

@Adam:
I wouldn't bother putting a date in. Put it at the top of the page if it's going to be important - which, depending on your content, it most often won't be. I generally don't bother to read the clutter in the middle of a url anyway, and I suspect I'm not alone.

2011-05-13 15:30:09 by qntm:

Adam: firstly, I have no idea what date schema allows you to start at "11/05" and add three years and get "16/05". If by "11/05" you mean "May 2011" then already the schema is nonsensical, it could just as easily mean "November 2005", "5th November" or "11th May". As you can see, it is very easy to get this stuff wrong.

Secondly, as you will notice, qntm.org articles do not have prominent dates, even in the body of the web page. They are dated, but you'll have to look at the footer. I have been considering changing this policy, but on the other hand, sometimes the only thing that a prominent date achieves is to announce "This content is out of date-- do not read it!" to the whole world.

2011-05-13 20:26:07 by Adam:

Well, yes, you could pick /2011/05, then.

I think one good influence of date-less URLs described here is the reminder that you shouldn't just write ephemeral junk.

2011-05-14 03:51:30 by ebenezer:

I cannot see why David's single-character category idea would not work for sites *other than* qntm.org, particularly if they have fixed categories and do not mind adding two characters to their URLs. If you start your site knowing that you intend to write about rugs, science, and politics, for example, you could write about those three under /r/, /s/, and /p/. In doing so you not only include the context of the article (to a reader who regularly visits /p/, that bit in the URL tells him he's getting sent to an article relating to politics), but you also open any slug imaginable to be used in each category once.

And @David, in case you haven't found them already: the articles you remembered are at http://qntm.org/bouncy and http://qntm.org/o11.

2011-05-16 01:44:17 by Mike:

@ebenezer Point, but single-word slugs only work in sites with distinct alphabetical categories of content, like 4chan. Otherwise? Confusing, as they could stand for whatever you want them to.

2011-05-16 03:37:53 by ebenezer:

@Mike: If by "single-word slugs" you meant the use of single-character categories (as /p/), then I should clarify. I said that this would work provided that the site's categories are *fixed* -- i.e., they are chosen once and do not change. It is true that a single letter could stand for any category, but if these categories were chosen and remained the same, then an article's URL would still provide some degree of context. This would be most useful for returning visitors who *knew* that a given character stood for a particular category, but it would be in this sense preferable to having every article directly off the site root.

For a sort of example of this, see the URLs for Mark Reid's two blogs (at http://mark.reid.name). The URL for his blog *Inductio ex Machina* is http://mark.reid.name/iem/, and the URL for his blog *Structure & Process* is http://mark.reid.name/sap/. To someone who has never seen either site and has only come across the URL, these three-letter abbreviations mean nothing. But to a returning visitor, they specify the blog on which an article was posted (each blog's articles are accessed as http://mark.reid.name/xxx/article).

(If your objection was other than the one I answered, please explain.)

2011-05-17 03:03:08 by LabrynianRebel:

Well all of that work was worth it because I have really enjoyed your website's layout and structure compared to other websites. Great job!

2011-05-23 16:18:57 by John:

testing test to see if commenter is real

2011-05-23 16:21:43 by me:

just testing

2011-05-23 17:15:19 by qntm:

Yes, you have successfully figured out how to use the commenting system. Now start saying something other than "just testing" or I'm making the question harder.

2011-05-26 23:32:32 by ebenezer:

But before you do that, I have a quick question: You mention that if you had thought more clearly you "would have ruled out digits, underscores and hyphens entirely." I understand why you would want to leave underscores and hyphens out of your URLs. But what problem do you have with digits?

2011-08-10 00:59:08 by Jeremy:

"Each comment instead has an auto-generated slug of the form "kommentXXX" where XXX is an integer."

хорошо, Komrade!

2012-09-15 21:22:55 by Connor:

Good website :)

2012-10-14 18:25:08 by Connor:

I've replied to you Sam ;)