Let me tell you about my URL schema.
The more alert of my readers might have noticed that qntm.org has very short URLs. http://qntm.org/destroy, for example, is a much shorter URL than, to pluck an example out of thin air, http://uk.kotaku.com/5800473/portal-2s-erik-wolpaw-lecture-now-online-for-your-viewing-pleasure. You will also notice that, unlike the URLs which fall out of a typical URL shortening service, qntm.org URLs are at least a little bit descriptive. This is one hundred percent deliberate.
Extremely short URLs have several distinct advantages over verbose, highly expository URLs.
- They're memorable. I don't know about my readers, but I know for a fact that "How To Destroy The Earth" is located at http://qntm.org/destroy. If I want to see that page, I can just type that into my address bar. I don't have to load up my own index page and browse the site until I find HTDTE manually and I don't have to do a Google search, not even a site-specific Google search. Anybody who has more than a flicker of admiration for command lines as a concept will appreciate my appreciation for this angle.
- I can type them faster. In the majority of cases it's faster than even a copy and paste.
- I can type them at all, which is more than can be said for the majority of URLs in the world today, including even the extremely short ones coming from URL shorteners. Notice that for the most part, qntm.org URLs use solely lower-case letters (with the occasional digit, underscore or hyphen; had I thought more clearly I would have ruled out digits, underscores and hyphens entirely, but it's too late for that). And they're usually based on English words. As anybody who's had to invent a secure password will know, actual words are much quicker and easier to remember and type than mere strings of random characters.
- I can make sense of my site usage statistics (entry pages, popular pages, visitor paths) more easily. I don't have to have the URLs expanded out to page titles, nor do I have to look up content from some random page ID to remember exactly what links to what.
- Visitors do not need to use URL shorteners to link to me. When you click a qntm.org link, you know what you are getting, unlike a bit.ly link or similar. There is an extra layer of trust, or, to put it another way, one fewer layer of uncertainty. Most qntm.org URLs will fit alongside a decent-sized tweet. The longest URL on my site is http://qntm.org/news_stickmanstickman, 37 characters, and that's an outlier, referring to a very rarely visited, unpopular page.
- Because each page has a short, unique "slug" (e.g. in "http://qntm.org/destroy", the slug is "destroy"), each page has a unique URL. There is no redundancy in my URL schema - i.e., it is not possible to reach the same resource via multiple URLs. This keeps my site's complexity to a minimum and I'm told it's also good for SEO, although I honestly pay very little attention to the latter. (Note: actually I went through several more verbose schemata before I reached this one. But requests conforming to older schemata are 301 redirected to the new one.)
- Because each page has a short, unique "slug", I can create internal links very easily. I just type [[destroy|something like this]] and it becomes something like this. If I change the schema (which seems unlikely right now, but is possible) then I can change that internal link expansion routine to match it.
- Because each page has a short, unique "slug", each URL can serve as a web service endpoint for that page. I haven't actually implemented this yet, because there's no pressing need to do so (what I currently have works fine), but I'm hoping to turn qntm.org into a vaguely RESTful web service, whereby I can create new pages as children of existing pages, reparent and modify pages and so on using HTTP verbs.
- It doesn't actually matter that much. Most people don't care about URL schemata. They don't remember and type, they copy and paste. Going for short URLs won't actually irritate those people.
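For illustration, the internal link expansion described above might look something like the following. This is a minimal Python sketch, not the site's actual code (which isn't shown here); the names `expand_internal_links`, `BASE_URL` and `LINK_PATTERN`, the allowed slug characters and the shape of the generated HTML are all assumptions.

```python
import html
import re

# Hypothetical base URL. If the schema ever changes, only this routine changes.
BASE_URL = "http://qntm.org/"

# Matches [[slug|link text]]. The slug character set is an assumption based on
# the characters described above: lower-case letters, digits, "_" and "-".
LINK_PATTERN = re.compile(r"\[\[([a-z0-9_-]*)\|([^\]]+)\]\]")


def expand_internal_links(text):
    """Replace every [[slug|text]] marker with an HTML anchor."""
    def replace(match):
        slug, label = match.group(1), match.group(2)
        # Escape the label so stray "<" or "&" cannot break the page.
        return '<a href="{}{}">{}</a>'.format(BASE_URL, slug, html.escape(label))
    return LINK_PATTERN.sub(replace, text)
```

Because the slug is the only thing an author ever types, every existing internal link follows automatically if the URL schema changes.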
There are some disadvantages:
- The namespace tends to fill up quite fast. It turns out that I already have a page whose slug is "code", so the new "Code" directory that I recently created had to be given the slug "src". If this is a minor irritation for me, with only about 600 distinct pages and growing at about 4 pages per month on average, then it'll be a real problem for any site with more than one-fifth of a full-time contributor, and any kind of tight focus on a specific subject. (Hint: if your site is about rugs, the "rugs" slug may be in demand.)
- You do actually have to think of something. Meaningful, super-brief slugs can't be generated mechanically. (Or can they? Probably, but that would take your content management system programmer quite a long way off-topic.)
- A slug is for life. Slugs have to be picked carefully. They are like IDs in database tables: permanent. Sometime down the line you may regret giving up one particular memorable slug for what turned out to be a very boring and unpopular page/resource, when it would have been a really good fit for a new, more interesting resource that you are creating right now. (Hint: don't use "main", "index", "page" or "article". In my case, I've also had to avoid "timeline", "science" and "fiction".) It is not for nothing that "Naming things" is one of the Two Hard Problems Of Computer Science (the others are "cache invalidation" and "fencepost errors").
- You can't see the whole headline or the date or anything else you previously included in your URLs. Although, why people include those things to begin with, I couldn't say.
- It doesn't work for dynamic queries for obvious reasons. Sometimes a user needs to be able to put arbitrary search terms into a URL.
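Some of the constraints in the lists above (a restricted character set, a namespace that fills up, generic names best avoided) could be checked mechanically when a page is created, even if thinking of the slug itself can't be. The following Python sketch is one hypothetical way to do that; `check_slug`, `RESERVED_SLUGS` and the exact rules are assumptions, not a description of any real CMS.

```python
import re

# Assumed rule: lower-case letters, digits, underscores and hyphens only.
# (The front page's empty slug would need to be handled separately.)
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")

# Generic names the text recommends avoiding; extend to taste.
RESERVED_SLUGS = {"main", "index", "page", "article"}


def check_slug(slug, taken):
    """Return a list of problems with a proposed slug (empty list means OK)."""
    problems = []
    if not SLUG_PATTERN.fullmatch(slug):
        problems.append("use only lower-case letters, digits, '_' and '-'")
    if slug in RESERVED_SLUGS:
        problems.append("too generic; save it for something that deserves it")
    if slug in taken:
        problems.append("already taken, and slugs are permanent")
    return problems
```

With `taken = {"code"}`, proposing "code" again is rejected while "src" passes, which mirrors the collision described above.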
How I did it
- I picked a short domain name. It didn't have to be amazingly short. Admittedly, "qntm.org" is nowhere near "u.nu" in the grand scheme of things, but it's still pretty good. In my defence, when I bought the domain five years ago URL shorteners were not so prevalent and Twitter did not exist.
- I wrote my own content management system. Whether this is possible in other, "actual" CMSes, I couldn't say.
- I made a clear mental break between (1) my URL schema and (2) the arrangement of files on my web server. These two structures need bear no relationship to one another whatsoever. You can do almost anything with mod_rewrite, and where mod_rewrite falters, PHP can pick up the slack.
- I figured out slugs for all my existing pages. (As mentioned above, choose carefully. If you decide it's too much work and auto-generate, choose even more carefully.)
- The front page has slug "". That's the empty string.
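The break between URL schema and file layout, plus the 301 redirects for older schemata, can be illustrated with a small routing function. The real site does this with mod_rewrite and PHP; what follows is only a Python sketch of the idea, and the `PAGES` table, the file paths and the legacy `/pages/<slug>.html` pattern are all invented for the example.

```python
import re

# Hypothetical slug table: URL slug -> wherever the content actually lives.
# The file paths are invented to show that they need not resemble the URLs.
PAGES = {
    "": "content/frontpage.txt",           # the front page has the empty slug
    "destroy": "content/2003/htdte.txt",
    "src": "content/code/index.txt",
}

# Hypothetical older, wordier schema that should still resolve.
LEGACY_PATTERN = re.compile(r"/pages/([a-z0-9_-]+)\.html")


def route(path):
    """Map a request path to a (status, target) pair."""
    legacy = LEGACY_PATTERN.fullmatch(path)
    if legacy:
        # Old schema: permanent redirect to the one canonical URL.
        return 301, "/" + legacy.group(1)
    slug = path[1:] if path.startswith("/") else path
    if slug in PAGES:
        return 200, PAGES[slug]
    return 404, None
```

Here `route("/destroy")` resolves to a file whose path shares nothing with the URL, and `route("/pages/destroy.html")` answers with a 301 pointing at `/destroy`, so each resource keeps exactly one canonical address.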
(This is more of that part that I said I hadn't done yet.)
Qntm.org has pages and every page has a slug. But qntm.org also has comments and, although you wouldn't know it, every comment has a slug too. Since the slug "comments" was already taken by a page (see what I mean?), each comment instead has an auto-generated slug of the form "kommentXXX" where XXX is an integer. (In the case of rapidly-user-generated content, I think auto-generating slugs is legitimate. No user should be forced (or permitted) to generate their own URLs on your site. That's crazy talk.)
Using the same slug schema site-wide is also legitimate. A naive schema would use "http://qntm.org/pages/*" for pages and "http://qntm.org/comments/*" for comments, but that's wordy (or, at least, charactery). Once I've moved slugs into a distinct table, which both pages and comments (and possibly other future classes of objects in my object-relational wossname) will refer to without treading on each other, the almighty slug will become a key to any object of any class on my site. A time will hopefully come when you can go to "http://qntm.org/komment1003" and either see the detail on that specific comment or be redirected to the page upon which that comment was originally made. You'll be able to send HTTP requests to that endpoint to modify or delete your comment.
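Since this part is not implemented yet, the following is purely a sketch of how a single slug namespace shared by pages and comments might behave. It uses an in-memory Python class in place of the planned database table; `SlugRegistry` and its methods are hypothetical names, and the counter-based numbering is an assumption about how "kommentXXX" slugs are generated.

```python
class SlugRegistry:
    """One namespace of slugs shared by every kind of object."""

    def __init__(self):
        self._objects = {}       # slug -> (kind, object)
        self._next_comment = 1   # counter for auto-generated comment slugs

    def register(self, slug, kind, obj):
        """Claim a hand-picked slug; refuse if anything already owns it."""
        if slug in self._objects:
            raise ValueError("slug already taken: " + slug)
        self._objects[slug] = (kind, obj)
        return slug

    def add_comment(self, comment):
        """Auto-generate a slug of the form kommentN for a new comment."""
        while True:
            slug = "komment{}".format(self._next_comment)
            self._next_comment += 1
            # Skip any number a hand-picked slug happens to have claimed.
            if slug not in self._objects:
                return self.register(slug, "comment", comment)

    def lookup(self, slug):
        """Return (kind, object) for a slug, or None if nothing owns it."""
        return self._objects.get(slug)
```

A request handler could then call `lookup` on whatever follows the first slash and decide, from the returned kind, whether to render a page or redirect to the page a comment belongs to.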
Credit where it's due: this last idea owes a lot to the original concept behind Everything2. On E2, "everything is a node", be it a user, a writeup, an "e2node" (glob of writeups under the same title), a usergroup, a stylesheet, a nodelet (UI component) or even an htmlcode (chunk of Perl code which makes the site go). However, E2's original implementation was different. For example, several types of object, such as votes and private messages, were not nodes, making "everything is a node" an outright lie at best. Also, E2 focused on node ID numbers rather than titles (which were permitted to collide) or slugs (which E2 lacked).
Anyway. Somebody asked, so there it all is.