Should illegal numbers be possible?

Disclaimer: I'm a mathematician, not a lawyer.

The problem

  1. Any electronic file is a stream of bits, 1s and 0s. For example, a 1MiB file contains 8,388,608 bits.
  2. Every stream of bits is a binary representation of a unique integer. For a 1MiB file, that integer has up to 2,525,223 decimal digits.
  3. So every electronic file corresponds to a single unique integer.
  4. It is, in many cases, illegal to copy and distribute electronic versions of movies, music, software etc., as well as things like DVD encryption keys.
  5. Therefore, it is illegal to copy and distribute certain integers.
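
Steps 1 to 3 can be made concrete with a minimal Python sketch. The helper names (bytes_to_int, int_to_bytes, max_decimal_digits) are my own illustrations, and reading the bytes in big-endian order is an assumption; any fixed convention would do. Note that the file length has to travel with the integer, since leading zero bytes would otherwise be lost.

```python
import math

def bytes_to_int(data: bytes) -> int:
    """Read a byte string as one big-endian unsigned integer."""
    return int.from_bytes(data, "big")

def int_to_bytes(n: int, length: int) -> bytes:
    """Recover the original bytes; `length` restores any leading zero bytes."""
    return n.to_bytes(length, "big")

def max_decimal_digits(num_bits: int) -> int:
    """Decimal digits in the largest integer that fits in num_bits bits."""
    return math.floor(num_bits * math.log10(2)) + 1

payload = b"any file at all"
number = bytes_to_int(payload)
assert int_to_bytes(number, len(payload)) == payload  # the mapping is reversible

print(number)
print(max_decimal_digits(8 * 1024 * 1024))  # digit count for a 1MiB file
```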

The problem is this: numbers are just concepts; they are not concrete. They exist independently of intelligent observers. How, then, can you possibly pass restrictions on the distribution of an uncontrollable, indestructible abstraction which nobody can own? If you started counting at 1, you'd get to an illegal number eventually. Should you be sued at that moment? Does that make any sense?

Actually, yes, it does.

You can't control mathematics. You can, however, control reality.

The Wastelands

The short answer is: if you started counting, the universe would end before you reached an illegal number. Regardless of how fast you counted. And regardless of where you started counting.

To elaborate:

  1. All electronic files represent, and are represented by, unique integers.
  2. However, those integers are invariably extremely large.
  3. As a proportion of all the possible integers of this size, the set of integers which actually represent real, sane, meaningful electronic files is minuscule.
  4. Thus, the vast majority of all integers in this range are completely useless.
  5. So the set of all the integers representing restricted content - indeed, the set of all "useful" numbers - is a very sparse collection of lonely integers, each one separated from its nearest neighbour by an incomprehensibly gigantic wasteland of meaningless, useless numbers.
  6. Once your electronic files grow beyond, say, a thousand bits in length, the size of the average "useless wasteland" grows to the point where not even all the computing power of the whole universe combined could generate enough processor cycles to cycle through the whole gap.
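
Point 6 can be checked on the back of an envelope. The sketch below is only that: both constants are assumptions rather than measurements. I take the age of the universe as roughly 4.35e17 seconds, and the counting rate of 1e50 per second is a hypothetical figure chosen to be absurdly generous, far beyond any real hardware.

```python
import math

AGE_OF_UNIVERSE_SECONDS = 4.35e17  # assumption: roughly 13.8 billion years
COUNTS_PER_SECOND = 1e50           # assumption: hypothetical, wildly generous rate

def bits_exhausted(counts_per_second: float, seconds: float) -> float:
    """Return b such that counting for `seconds` covers about 2**b integers."""
    return math.log2(counts_per_second * seconds)

covered_bits = bits_exhausted(COUNTS_PER_SECOND, AGE_OF_UNIVERSE_SECONDS)
print(f"Integers counted: about 2^{covered_bits:.0f}")
print(f"Share of all 1000-bit values: about 2^-{1000 - covered_bits:.0f}")
```

Even under these assumptions the counter covers well under 2^300 integers, a vanishing fraction of the 2^1000 possible 1000-bit values.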

Therefore, if somebody started counting at ANY integer, the chance of them reaching an illegal number before dying is negligible. So if a person does produce one, we can infer that they knew the illegal number in advance (or were fed it by somebody else who did know, or generated it using a procedure based on an illegal number, et cetera). Somewhere, there was intent (legitimate or illegitimate) to copy the original illegal number.

Likewise, if we took somebody's hard drive, read off a sequence of 1000 bits or more at random, and that stream of bits turned out to be meaningful content - a DVD encryption key or a textbook or the entirety of an MP3-encoded song - then we can draw the same conclusion: the HDD's owner deliberately downloaded that file (or somebody else put the file there, or some insecure/malicious piece of software downloaded it, or whatever).

Wait! What about different formats and compressed versions of the original illegal file? Doesn't that make the set of useful numbers more crowded?

Yes, but not significantly. The number of different "reversible transforms" of this type is strictly finite and MUCH smaller than the magnitude of the number any such transformed file represents. We can consider all transformed illegal numbers to have the same legal restrictions as the original file with no problems. This is simply an extension of the fact that the original number is effectively a transformed version of the actual song (physical pattern of sound waves), having the same distribution restrictions. Naturally, you would have to show that this transform exists, and that the accused had intent/means to use it. But this is moving towards the legal end of the problem, which, as I say, I can't be authoritative about.

So we can conclude that, while it may raise philosophical and practical objections to attempt to pass restrictive laws about a number, it is perfectly practical to pass restrictive laws about real-world representations of it... if, and only if, the number is sufficiently large and complex.

So where do we set our threshold?

Size

According to Wikipedia, a total of roughly 161 exabytes of digital information were created, captured and replicated worldwide over the course of 2006. Meanwhile, every possible 64-bit number put together would take up 128 exabytes of computer space. In this case, the random appearance of a valid 64-bit encryption key where nobody intended it to appear is unacceptably likely. (True, most of the "real" information in the world displays repetition and other patterns, which would lower the odds. As we've established, the vast majority of data streams are meaningless, while the vast majority of actual data in the world IS (believe it or not) meaningful. Nevertheless, the chances of a random collision here are clearly far too high to be worth ignoring.) And the more illegal 64-bit numbers there are, the worse those odds become.
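
The 128 exabyte figure is easy to verify. This is a small sketch in which the function name is my own; it assumes each number is stored in exactly as many bits as its width and uses binary units (1 exbibyte = 2^60 bytes).

```python
EXBIBYTE = 2 ** 60  # binary exabyte, in bytes

def exbibytes_for_all_values(bits: int) -> int:
    """Storage, in exbibytes, needed to list every `bits`-bit number once."""
    total_bytes = (2 ** bits) * (bits // 8)
    return total_bytes // EXBIBYTE

print(exbibytes_for_all_values(64))
```

In decimal units the same quantity is about 148 exabytes; either way it is comparable to the 161 exabytes quoted above.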

It is debatable whether that chance is "beyond all reasonable doubt"; frankly, I doubt a court of law would ever decide otherwise. But computers are getting faster and increasing in number. In a hundred years' time, "exabyte" may be such a common quantity of data as to be a household term. So to set the threshold this low would be short-sighted.

By comparison, the HD-DVD AACS encryption key which was - to the horror of the MPAA - released publicly onto the internet in early May 2007 is 32 hexadecimal digits long (well, 31 if we want to be pedantic, since it has a leading zero) - that's 128 bits. Each of those 64 extra bits makes a random collision half as likely, so together they cut the odds by a factor of 2^64, or about 1.8 x 10^19. This takes us a full nineteen orders of magnitude below the point of "certainty".

128 bits, therefore, is an excellent size for the shortest possible illegal number.

Complexity

However, obviously, it would be ridiculous to try to make it illegal to distribute (for example) the encryption key "0000:0000:0000:0000:0000:0000:0000:0000". This is because such numbers, though arbitrarily large, would very probably turn up all over the place, generated independently by multiple sources.

More technically, the remark above about compression can be carried over here. We can compress, say, a billion zero bytes from occupying almost an entire gibibyte of computer space to a mere six bytes as the ASCII string "1E9 0s". Other such simple patterns can likewise be compressed immensely to relatively short streams of ASCII using that famous compression routine, "do what the ASCII tells you to do". For example, "5E8 repetitions of 01" or "first 1E9 bits of pi". Since I argued above that a reversibly transformed version of a restricted file should hold the same distribution restrictions as the original, we cannot effectively enforce restrictions on these arbitrarily large but relatively simple bit streams, since they transform to something too short, meaning collisions are too likely.
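
To show what "do what the ASCII tells you to do" might look like, here is a toy decoder. It is purely hypothetical: the function name expand and the two description formats it accepts are my own inventions, modelled on the examples above, and the counts are scaled down so that it runs instantly.

```python
import re

def expand(description: str) -> str:
    """Expand a toy description into a string of '0' and '1' characters.

    Supports only two hypothetical forms:
      "<m>E<e> 0s"                     e.g. "1E3 0s"
      "<m>E<e> repetitions of <bits>"  e.g. "5E2 repetitions of 01"
    """
    zeros = re.fullmatch(r"(\d+)E(\d+) 0s", description)
    if zeros:
        mantissa, exponent = map(int, zeros.groups())
        return "0" * (mantissa * 10 ** exponent)
    repeats = re.fullmatch(r"(\d+)E(\d+) repetitions of ([01]+)", description)
    if repeats:
        mantissa, exponent = int(repeats.group(1)), int(repeats.group(2))
        return repeats.group(3) * (mantissa * 10 ** exponent)
    raise ValueError(f"unrecognised description: {description!r}")

short = "1E3 0s"
long_form = expand(short)
print(len(short), "characters describe", len(long_form), "digits")
```

The description stays a handful of characters however large the count becomes, which is exactly why such streams are too "short" to restrict.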

English, of course (like other languages using relatively small character sets), is an even more compressed format than raw ASCII, since the vast majority of possible streams of ASCII mean nothing in English. Thus, "Generate and concatenate the first million prime numbers" is a 56 byte string = a 448 bit number (which describes the construction of about 7.2 million decimal digits, or roughly 6.9MiB of ASCII) but is still probably immune to distribution restrictions.
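
For the curious, the instruction can actually be carried out. The following is a rough sketch, not a tuned implementation: the function names are my own, it uses a plain sieve of Eratosthenes, and it relies on the standard bound that the n-th prime is below n(ln n + ln ln n) for n >= 6. The demonstration uses the first thousand primes to keep it quick.

```python
import math

def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using a sieve of Eratosthenes."""
    if count < 1:
        return []
    if count < 6:
        limit = 13  # the first six primes are all <= 13
    else:
        # Upper bound on the count-th prime, valid for count >= 6.
        limit = int(count * (math.log(count) + math.log(math.log(count)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out every multiple of p, starting from p squared.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    primes = [n for n, is_prime in enumerate(sieve) if is_prime]
    return primes[:count]

def concatenated_digits(count: int) -> str:
    """Concatenate the decimal digits of the first `count` primes."""
    return "".join(str(p) for p in first_primes(count))

instruction = "Generate and concatenate the first million prime numbers"
print(len(instruction.encode("ascii")), "bytes of instruction")
print(len(concatenated_digits(1000)), "digits for the first thousand primes")
```

Calling concatenated_digits(1_000_000) works the same way; it simply sieves up to roughly 16.4 million and takes noticeably longer.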