You are familiar, of course, with Urban Dictionary, where users submit definitions of slang terms and the most popular (presumably the most accurate) definition gets voted up to the top of the stack under each term. That's a pretty smart way to build a dictionary of slang, which is, of course, the set of all words whose definitions are loose, fluid and pretty much whatever the world defines them as.
Let's extend this model from slang terms to all English language terms. And also, terms in other languages. And also, multi-word sentence fragments. And also, complete sentences. Instead of a definition, what you submit is a translation into some other language. And then you vote up or down on other translations of the same entry. That could work, couldn't it?
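Here is a rough sketch of what that submit-and-vote model might look like as a data structure. Everything in it is my own assumption for illustration: the names `Translation` and `TranslationStore`, the in-memory storage, and the simple net-vote ranking are not a real design, just the smallest thing that behaves as described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Translation:
    """One candidate translation and its net vote count."""
    text: str
    votes: int = 0


class TranslationStore:
    """Hypothetical in-memory store: (phrase, target language) -> candidates."""

    def __init__(self):
        self._entries = defaultdict(list)

    def submit(self, phrase, target_lang, text):
        """Add a candidate translation, or return the identical existing one."""
        candidates = self._entries[(phrase, target_lang)]
        for candidate in candidates:
            if candidate.text == text:
                return candidate
        candidate = Translation(text)
        candidates.append(candidate)
        return candidate

    def vote(self, phrase, target_lang, text, delta):
        """Vote a candidate up (+1) or down (-1); return its new vote count."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        for candidate in self._entries.get((phrase, target_lang), []):
            if candidate.text == text:
                candidate.votes += delta
                return candidate.votes
        raise KeyError(text)

    def ranked(self, phrase, target_lang):
        """Return the candidates for an entry, highest-voted first."""
        candidates = self._entries.get((phrase, target_lang), [])
        return sorted(candidates, key=lambda c: c.votes, reverse=True)


if __name__ == "__main__":
    store = TranslationStore()
    store.submit("good morning", "fr", "bon matin")
    store.submit("good morning", "fr", "bonjour")
    store.vote("good morning", "fr", "bonjour", +1)
    print([c.text for c in store.ranked("good morning", "fr")])
    # prints ['bonjour', 'bon matin']
```

Keying entries on the phrase plus the target language means the same phrase can collect separate stacks of candidates for every language people care to translate it into.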
There are a few more critical components to this idea.
One is an API which can be integrated into, among other things, an instant messaging client. This means that when somebody sends you a sentence in a language you don't understand, instead of copying and pasting that into Babel Fish, your client automatically sends out a request and receives a selection of possible translations in return. You then have the option of selecting from the list the translation which turned out to be the most accurate - or responding with a more finely-tuned translation of your own.
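To make the request-and-response cycle concrete, here is a minimal sketch of the two calls an instant messaging client might make. No such service exists: the JSON field names (`text`, `source`, `target`, `translation`), the handler names and the sample data are all hypothetical, and the handlers are plain functions rather than a real network API.

```python
import json

# Hypothetical data: (phrase, source language, target language) -> {translation: net votes}
DATABASE = {
    ("wie geht's", "de", "en"): {"how are you": 5, "how goes it": 2},
}


def _key(request):
    """Normalise a request into a database key (assumed: trimmed, lower-cased)."""
    return (request["text"].strip().lower(), request["source"], request["target"])


def handle_translate(request_body):
    """Answer a client's request with every known candidate, best-voted first."""
    request = json.loads(request_body)
    candidates = DATABASE.get(_key(request), {})
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return json.dumps({
        "text": request["text"],
        "candidates": [{"translation": t, "votes": v} for t, v in ranked],
    })


def handle_feedback(request_body):
    """Record the user's choice: an upvote for a listed candidate, or a new
    translation of their own (assumed here to start with one vote)."""
    request = json.loads(request_body)
    candidates = DATABASE.setdefault(_key(request), {})
    chosen = request["translation"]
    candidates[chosen] = candidates.get(chosen, 0) + 1
    return json.dumps({"translation": chosen, "votes": candidates[chosen]})


if __name__ == "__main__":
    incoming = {"text": "Wie geht's", "source": "de", "target": "en"}
    print(handle_translate(json.dumps(incoming)))
    print(handle_feedback(json.dumps({**incoming, "translation": "how are you"})))
```

The client would call the first handler when a message arrives and the second once the user has picked (or typed) the translation they think is best.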
Another (the slightly more questionable component of my idea) is what the machine does if the exact phrase requested isn't in the database. In a situation like that, we don't particularly want the machine to just go "urk, I got nothing". Even a wild guess pieced together from direct dictionary lookups would be better than nothing, because then the receiving user can go, "Well, that's just nonsensical, but if I alter a few words I can repair it, okay, here is a slightly better translation" and send it back. Thus, the translation stored in the database will iteratively migrate towards something which is more accurate. I hope.
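A minimal sketch of that fallback, assuming a word-level dictionary is available alongside the phrase database. The two tiny dictionaries and the function names are made up for illustration; a real guess would be exactly as bad as this one, which is rather the point.

```python
# Hypothetical word-level dictionary (Spanish -> English), for illustration only.
WORD_DICTIONARY = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}

# Hypothetical phrase database of translations that humans have already supplied.
PHRASES = {"buenos días": "good morning"}


def translate(phrase):
    """Return (translation, is_guess) for a phrase."""
    key = phrase.strip().lower()
    if key in PHRASES:
        return PHRASES[key], False
    # Fallback: look up each word on its own and leave unknown words untouched.
    guess = " ".join(WORD_DICTIONARY.get(word, word) for word in key.split())
    return guess, True


def submit_correction(phrase, corrected):
    """Store a human-repaired translation so the next lookup is not a guess."""
    PHRASES[phrase.strip().lower()] = corrected


if __name__ == "__main__":
    print(translate("El gato negro duerme"))
    # prints ('the cat black sleeps', True)
    submit_correction("El gato negro duerme", "the black cat sleeps")
    print(translate("El gato negro duerme"))
    # prints ('the black cat sleeps', False)
```

The word order in the first guess is wrong, but it is close enough for a human to repair, and the repaired version is what gets served from then on.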
The last hurdle is the problem of there being millions and squillions and quintillions of possible sentences, many of them differing from each other by only a very small, trivial "edit distance". I hope to solve this by drastically restricting the maximum length of a message in the database. I'm talking "atoms": five words, or 64 bytes, or something like that. This could be the fatal flaw in my plan, but the key point of my plan is that it must require no linguistic skill on my part, which means that it cannot require any kind of language-specific translation intelligence, only basic algorithms and raw data.
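Both ideas in that paragraph are easy to pin down in code. The sketch below checks whether a message is short enough to count as an "atom" and computes the edit distance between two strings. I am assuming that both limits apply at once (at most five words and at most 64 bytes of UTF-8), and I am using Levenshtein distance as the edit distance; neither choice is settled, and the function names are my own.

```python
def is_atom(text, max_words=5, max_bytes=64):
    """True if text is non-empty and within both the word and byte limits.

    Assumption: the byte limit is measured on the UTF-8 encoding.
    """
    words = text.split()
    return 0 < len(words) <= max_words and len(text.encode("utf-8")) <= max_bytes


def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute if different
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(is_atom("where is the train station"))
    # prints True
    print(is_atom("where is the nearest train station"))
    # prints False
    print(edit_distance("kitten", "sitting"))
    # prints 3
```

Measuring bytes rather than characters means languages with multi-byte characters get fewer characters per atom, which may or may not be what you want.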
My belief is this: it is easier to make a human being manually break up a complicated sentence into shorter, simpler, machine-translatable sentences than it is for a machine to accurately translate the original longer sentence. If we can train people to communicate with greater precision and terseness - and we're well on our way, we already have Twitter and "txtspk" - then we can effectively train ourselves to communicate in an unambiguous sub-language of English/French/Chinese/whatever, which a machine can translate perfectly.
Obviously, some boffin could build on this data set (once it's populated) and make an algorithm capable of translating longer sentences by referring to the various shorter sentences, but that's for another day.
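For what it's worth, the crudest possible version of such an algorithm might be a greedy longest-match: walk through the sentence, and at each position take the longest run of words that exists in the database as an atom. This is only one guess at how it could work, with a made-up `ATOMS` table and function name, and it makes no attempt to fix word order or grammar between atoms.

```python
# Hypothetical atom database (English -> French), for illustration only.
ATOMS = {
    "good morning": "bonjour",
    "thank you": "merci",
    "my friend": "mon ami",
}


def translate_by_atoms(sentence, atoms, max_words=5):
    """Greedily cover the sentence with the longest known atoms, left to right.

    Words not covered by any atom are passed through untranslated.
    """
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate chunk first, then shrink it one word at a time.
        for length in range(min(max_words, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + length])
            if chunk in atoms:
                output.append(atoms[chunk])
                i += length
                break
        else:
            output.append(words[i])  # no atom starts here; keep the word as-is
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    print(translate_by_atoms("Good morning my friend", ATOMS))
    # prints bonjour mon ami
    print(translate_by_atoms("Thank you very much", ATOMS))
    # prints merci very much
```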
This idea, like all of mine, is unrefined. Somebody want to attempt it?