Solving Tetris in C

Exactly one of the following statements is true.

1. In Tetris, you can always get at least one line before losing.
2. In Tetris, a sufficiently evil AI can always force you to lose without getting a single line, by providing bad pieces.

Which one?

Prove it.

The setup

I created HATETRIS last year because I wanted Tetris players to suffer. It was something I created for my own amusement and as a programming challenge and to investigate the use of JavaScript as a gaming platform.

HATETRIS explores only to a depth of 1 piece. It looks at the immediate future and chooses the single worst piece for the present situation, ignoring the possibility of the player using the current piece to set up a situation where the 2nd piece forces a line. After the first iteration of the game, I started using a bit mask to express each row of the playing field ("well"), which made testing the insertion and removal of possible pieces faster, to the extent that a new piece appeared more or less instantly instead of appearing after a second or two. This improvement allowed the algorithm to be improved to look a second piece into the future, while taking more time. This made the game sluggish, though, so I never left this modification in place.

The other reason I didn't leave HATETRIS to look at depth 2 was because the game was still too easy. Even looking two pieces into the future it was startlingly easy to contrive situations whereby no matter which pair of pieces was supplied, the player could always force a line. Consistent scores of 2 or 3 lines did not satisfy me. HATETRIS+2 wasn't difficult enough. I noted that I intended to return to the project just as soon as I could create a game which never permitted the player to get lines. Ever. I determined to create the evil AI from case 2.

About a year passed and I finally got around to thinking about the problem again. After some additional thought I decided that the chances were that case 1 was actually true. The player can always get a line. But how to prove it?

The problem

"Tetris", the mathematical structure, has to be well-defined before it can be solved.

The "standard" Tetris well has a width of 10 and a height of 20. The piece enters the well from above. In "standard" Tetris the piece spawns in rows 21 and 22 which are hidden off the top of the screen; also, the player loses when a piece spawns in collision with an existing structure. My version is a little different: rows 21, 22, 23 and 24 are clearly visible at the top of the screen, providing room for a piece to be moved laterally and rotated freely before entering the playing field; and the player loses when any landed structure protrudes into this box, even for an instant. HATETRIS is unforgiving: if the player makes a line and loses simultaneously, the player loses.

Tetris always has 7 pieces, I, J, L, O, S, T and Z, but the various orientations of each piece are also subject to variation. Exactly what happens when a piece is rotated? In the naive case, the piece could rotate around its centre of gravity, but that would result in pieces positioned off-centre in many cases, and require floating point numbers. In HATETRIS, pieces rotate around the central point of a 4x4 bounding box, which can be done with integers, but this is unlike any other Tetris game and additionally results in the I, S and Z pieces having not 2 but 4 orientations each. So, for the sake of simplicity and making the process faster, I opted instead for the Original Rotation System. This cannot be procedurally generated, but generating the orientations by hand as bitmasks was trivial.

More recent versions of Tetris implement advanced functionality such as wall and floor kicks, which I omitted for the sake of optimisation. If there is obstruction or the piece would end up partially outside the well, a move or rotation is simply forbidden. As in HATETRIS, it is assumed for the sake of argument that there is no gravity (or, to put it another way, the player is at liberty to move the piece as far horizontally and rotate it as many times as she or he desires before it falls another row). To move down, the player presses "down". A piece locks when the player moves down, but meets an obstruction. Thus, there is effectively infinite "lock delay". Given the "free area" above the playing field, described above, there is also effectively infinite "entry delay".

1. If the player can always force one line, we say that the player wins.
2. If the AI can always force the player to lose with zero lines, we say that the AI wins.

Since 1 line is all that is needed, we can thankfully omit any procedures for restructuring the playing field after a line is made.

Ruling out some cases

The question is still meaningful for Tetris wells as narrow as 4 units wide. Any narrower and the I piece can no longer spawn or rotate. A standard Tetris well has a width of 10 and wider wells are of less interest to me.

If the 4-row "free space" at the top is kept, and we only count the lines below this point, the question is also meaningful for wells as shallow as 0 units, although in the cases of height 0, 1 and 2 the solutions are trivial: the AI always wins. (Height 0: the player cannot land a single piece without losing. Height 1: the AI supplies an O block. Height 2: the AI supplies a series of S blocks.)

One quick observation will save us all a great deal of time. By supplying an endless stream of O blocks, the AI can always win any Tetris well with an odd-numbered width: 5, 7 and 9. So, the widths of interest are 4, 6, 8 and 10 only.

If the player can force a line without losing in a well of depth H, the player can do the same in any deeper well by following exactly the same procedure. So, the real question is this: given a well of width W, what is the minimum required depth for the player to win? Alternatively, prove that the AI can always win no matter the depth - or at least to depth 20, if possible.

C (slight detour)

I took this as an opportunity to become familiar with a programming language which I knew very little about, even when I used it in university for my computing projects.

There is nothing that I can say about the C programming language that hasn't already been said thirty years ago. But it's all new to me, so just let me have this.

C is a monumental culture shock to anybody familiar with modern programming languages. C is so old that the canonical reference (Kernighan & Ritchie's The C Programming Language) describes it in terms of languages with which I have literally no familiarity, such as FORTRAN and Pascal. For example, the book carefully explains that the notation a[i, j] for accessing elements of multidimensional arrays - notation which I have never seen anywhere - is illegal. At the same time, K&R singularly fail to make clear that in C it is impossible to choose the size of an array at run time. Once this fact dawned on me it made life a lot easier and, in retrospect, it is obvious that dynamic arrays would be a feature only of dynamic programming languages, which C isn't. But the book was written before dynamic programming languages existed, or the world even had a word for them. Dynamic array sizing just wasn't there at the time, in the same way that nowadays you would never stop to explain that no, this car cannot fly.

It is enlightening and relaxing to finally find out where things like printf() and fopen() are actually invariably cribbed from; to find out what malloc actually is and why (see above) it is even necessary. It's valuable to learn the true power and practical use of memory pointers. It's cool to be finally exposed - if briefly, like radioactivity - to the layer where things like binary trees and linked lists are actually important to the process and need to be built manually.

It's also aggravating and frustrating. C's strict type safety is wonderful, except when it isn't. Its type declarations are confusing and strange. (why are the square brackets on the label, not on the data type? int[10] foo makes more sense than int foo[10], doesn't it? I'm not crazy?) Handing pointers to functions around, yes! Null-terminated strings, erm, okay. Not checking array bounds? When would that ever not lead to a catastrophic error of some description? Here's another one: "Hey, what happens if you forget to free some of the memory you malloced?"

K&R deserve all the praise they have ever received for creating a language book which is so small that it is actually useful. It's the pocket Psalms and New Testament of system programming. Ever tried to use Programming Perl as a reference? "Dear Zarquon, Larry," I think to myself every time I try, "stop punning and get to the point!" But K&R also seem to have an unbecoming fetish for extreme terseness in their programs. Their persistent omission of braces is alarming. And I hope C isn't the first programming language you learn and K&R isn't the first programming language book you read, because unscrambling

```while((s[i++] = t[j++]) != '\0')
```

into something legible will take you a solid week, and it'll be a lifetime before anybody forces you to unlearn nonsense like

```isdigit(s[++i] = c = getch())
```

. Never in my life have I thought that I would have to write this down, but don't do more than three things simultaneously in one line of code! I never imagined that there was anybody in the world who needed to be told that, but here we are. It's not even faster code! It's just the idiotic idiomatic style!

Method

The basic pseudocode for the brute force algorithm is this:

```who_wins(well W) {
for each piece P {
for each landing state of piece P in well W {
well W2 = result
if(that_makes_a_line(W2)) {
next piece
}
if(who_wins(W2) == PLAYERWINS) {
next piece
}
}

// AI wins every landing state
return AIWINS
}

// Player wins every piece
return PLAYERWINS
}

print who_wins(empty_well)
```

Since it would be preferable to know who wins, and how, what I actually implemented is a little more complicated, and a full solution is returned in each case.

1. If the player wins, then the solution is a list of 7 "magic landing spots", one for each possible supplied piece. A magic landing spot need not immediately create a line. It may be part of a strategy to force a line later.
2. If the AI wins, then the solution is simply the identity of a single "killer piece". A killer piece need not immediately result in a player loss. It may be part of a strategy to force a player loss later.

The major optimisation breakthrough I made when planning this program was to observe that when you consider all 7 possible pieces in all 19 possible orientations, from left to right and top to bottom in the well, there are only approximately 4000 possible states which pieces can occupy during the course of any given game. These states form a network or directed graph, joined by edges representing player input. Starting from the original spawning state, pressing Left leads to one state, Right leads to another, as do Down and Rotate. If the wall is in the way, it is possible that pressing any given key will not move the piece, resulting in an edge leading from a state back to itself. There are in fact 7 disconnected graphs of states, one for each possible piece.

Having seen this, I wrote my program to generate every single possible state in a single procedure right at the start of execution, before beginning the brute force search. During this initial stage, all the tedious calculation of bit masks, piece rotation and piece dimensions and positions and offsets is handled and then cached in a single big array for future reference. Also cached are the edges connecting states together.

Pre-calculating all of this makes several things easier, such as detecting a collision between a piece and the contents of the well, inserting a piece into the well and removing the piece from the well afterwards. But fastest of all is locating new landing sites. By simply "painting" every state as "unreachable" and then starting from a single guaranteed "reachable" spawn point, I wrote what basically amounts to a "flood fill" routine which explores the whole state space looking for accessible states (and indirectly ruling out inaccessible states) and incidentally collecting a list of those states which happen to be landing points.

In the first few versions of this program, I ignored duplicate wells and solved every single possibility naively. This allowed me to calculate a solution for a large selection of wells. The longest solution (width 8, height 7, player wins) took some 9 hours to solve. However, I suspected that some wells were being solved over and over again using this method. At length, I rewrote the program to store the results for each well in turn. This resulted in a program which ran much faster. Unfortunately, it now rapidly rans out of memory when solving the larger wells. Even more unfortunately, the new program could not generate any new results.

Code

I used Fabrice Bellard's Tiny C Compiler to compile and run my program on Windows. I'm not in a position to compare this with other C compilers, but TCC is extremely small and fast and did the job, at a time when most of the people I asked were seriously suggesting that I switch to Linux just so that I could run "Hello world".

The program:

• first version - Uses a very small amount of memory, can theoretically handle arbitrary well sizes. Slow, and doesn't give a complete result
• second version - Caches well results, runs like the blazes, pretty code, pretty output. Uses all your memory

The only two options you should really worry about are the WIDTH and the HEIGHT of the well. These are set using #define statements at the top of the code. Possibly this doesn't provide a substantial performance improvement over taking those numbers from the command line, but it was just how I implemented the configuration to begin with and I never got around to changing it.

I'm not a C expert. I'm pretty sure, though, that this program strikes a good balance between being readable and being high-performance. The most scope for improvement would be in a re-written, more efficient algorithm, with a focus on intelligence and saving memory, and less emphasis on brute force.

Results so far

As mentioned, for odd-numbered widths, the AI wins by supplying an endless stream of O blocks.

Who wins? WIDTH
4 6 8 10 12+
HEIGHT 0 AI AI AI AI AI
1 AI AI AI AI AI
2 AI AI AI AI AI
3 AI AI AI AI ?
4 AI AI AI AI ?
5 PLAYER AI AI AI ?
6 PLAYER PLAYER AI AI ?
7 PLAYER PLAYER PLAYER ? ?
8+ PLAYER PLAYER PLAYER ? ?

Results for (width, height) = (8, 6), (8, 7) and (10, 5) were computed using tetris4.c. tetris6.c requires 13GB of memory to compute (8, 6), and runs out of memory in the other two cases.

All other results can be generated using either program - tetris6.c is recommended as it provides more informative output.

I am pleased to have completely solved Tetris for wells of width 4, 6, and 8 (and, trivially, 5, 7 and 9). I'm disappointed not to have been able to solve Tetris for width 10, but I remain firmly convinced that a player victory is inevitable there too.

What's next?

• I considered a method for avoiding having to "flood fill" the whole state space every time a new piece was added. Instead of painting every state as unreachable every time, we would keep the "reachability" attribute as a persistent attribute. First, we would also create and cache a list of "predecessor" states for each state. (In other words, the same digraph but with all the edge directions reversed.) For each state, we would also collect a list of other states overlapping that state.

Then, whenever a new piece is locked into the well, we would mark that lock state and any states listed overlapping it as unreachable. Then, we could do a sort of smaller "reverse flood fill", looking for all states which have now become unreachable because all their predecessors are unreachable too. When the new piece is removed from the well, we mark its lock state and all the states listed as overlapping it as reachable again and perform a conventional flood fill to revert everything to the way it was.

It's not clear whether this "flood fill / flood unfill" procedure would be any quicker than doing a conventional flood fill from the top every time. The "reverse flood fill" is much slower than a flood fill because every state needs checking multiple times. And just implementing the pre-caching of overlapping states and nothing else occupied much more memory and made my program take substantially longer to run.

• I realised that if the player can force a line at row 2 (for example) in a well of width 4, then the player can simply do this twice side by side in a well of width 8. This would result in a complete line at row 2 spanning the whole well, and this would in turn result in a general solution for wells of all widths divisible by 4: 4, 8, 12, 16 and so on. If the same could be done for a well of width 6, then 4+6=10, 4+4+6=14, and so on, so we would have a general solution for the whole game!

Unfortunately I ran some tests and proved that the AI, if so inclined, can deliberately prevent a line from forming at any specific row it so pleases, even while the player creates lines at other rows. This is achieved by supplying O and I blocks in particularly mean combinations. There might be some more meat in this approach but I can't get to it.

• There's limitless scope for smarter investigation of all the possibilities, and for being choosy about what to cache, freeing up solutions to solved wells when they're no longer needed in memory, and so on. But this isn't easily accomplished in C and it needs a better mathematician than me to implement. Frankly, I've lost the will to continue. If you discover any simple modification to the code which results in more results, particularly if you can prove that you can force a player victory for width 10 at any depth, please let me know. To say that Tetris has been "solved" would be awesomely cool.

• Although I remain convinced that the player can always force a win in a standard Tetris well, I'm strongly considering taking the optimised C code that was developed here and rolling it into a newer, more hate-filled version of HATETRIS at some point in the future. I need a break right now though, so that won't happen immediately.

• Constraining the AI to only O blocks solved the odd-numbered widths very quickly. Could constraining the choice to 6, 5, 4, 3, or 2 pieces make the process faster in the other cases?

• It might be possible to sort the list of possible landing states so that states advantageous to the user appear earlier. This would allow us to exit earlier from branches where the user wins and shorten processing time.

Discussion (32)

2011-07-21 01:33:03 by Jack:

Re: int[10] foo; vs. int foo[10]; :

There was an experiment in C to make declaration follow use, so if you write int *foo;, you're saying that *foo is an int. Similarly, int foo[10]; means that foo[0] through foo[9] are all int.

Most consider it a failed experiment.

2011-07-21 01:36:04 by Matt:

Even if you desperately wanted/needed to use GCC, there'd be no need to switch to Linux. I have it on windows via MinGW, and Eclipse set up with 'CDT' so I can have the compile/build happen with a press of Ctrl+B, and so I can program C or Java in the same IDE (those being the 2 languages I've seen most of in my course at uni).

Cool project... have yet to do anything on the same scale myself. Although, might just be comparing the coolness rather than the amount of coding.

As to C itself, I have a certain fondness for the way it lets you do things that are, officially speaking, kinda stupid. Y'know, just in case you ever had a legit reason to do them. Less in the way of "No, that might be wrong, I'm not doing it", more "Sure, I guess I can do that if you really want me to".

2011-07-21 02:39:43 by MWchase:

If you want to get a taste of that multidimensional indexing, mess around with Numpy some. The syntax is hybridized with Python's own slice notation, so you get to look at strings of commas interspersed with colons. Mmmm.

(This makes sense, of course, because Numpy relies on FORTRAN, and can manipulate its arrays in FORTRAN-specific formats.)

2011-07-21 03:38:25 by AndrewJenner:

I suspect there may be a bug in your program - I tried to solve tetris for width 4 a while ago (http://www.reenigne.org/blog/tet4-is-a-game-of-luck/) and came to the conclusion that the AI can always win for any height by repeating the sequence SSZ. What does your program say that the player should do in this situation?

2011-07-21 04:04:49 by RyanVoots:

for a width of 4, you can always win with SSZ

+----+
| S |
| SS|
| ZSS|
|ZZSS|
|Z S|
+----+

stack two S on one side, then put the Z on the other side.

2011-07-21 04:07:54 by RyanVoots:

Ok I figured it'd bork the ascii art. I also just realized that the other issue at stake here is that you two have two different definitions of winning. Here we're looking at, "Can the player always make at least one line" instead of "can the player keep the lines up indefinitely".

also the "captcha" doesn't understand electrical engineers

2011-07-21 04:08:24 by Marc:

Why ?

Put the two S vertically side by side, that's it, the row 2 is done.

The problem here is not "playing inedefinitely" (might be impossible in all cases in my opinion, it is an interesting problem) but "doing at least ONE line" :)

2011-07-21 04:33:58 by A:

Nice Article. Just one note Dynamic Languages did exist before C (Lisp for instance was(and its descendants still are) dynamic).

2011-07-21 04:53:18 by AndrewJenner:

Ah, sorry - misunderstood the problem.

2011-07-21 05:41:03 by EgyptUrnash:

> isdigit(s[++i] = c = getch())
> Never in my life have I thought that I would have to write this down, but don't do more than three things simultaneously in one line of code!

Here's an exercise: Shrink your text editor window to 80x50 characters. Do not use multiple windows - the only way you can see two chunks of code on-screen at once is to use the split mode of your text editor. Use a big font of course, we're not masochists here. Say, one that gives you the equivalent of an 8" screen will be fine.

Now try to make sense of a nice, verbose, linefeed-loving Python program. Try to fix a bug in it. Oh, and while you're at it, turn off your editor's autocomplete.

When you have to peer at your code through a tiny little screen like this, succinctness is a virtue - learn the idiomatic shorthands, and you can actually have a hope of fitting an entire routine that does something interesting on-screen at once.

2011-07-21 07:19:20 by Pete:

Regarding the 80x50 screen, try 80x24 next. Then try it with ed. Then, finally, try an 80xNothing terminal: send all output to a dot matrix printer to simulate a teletype. One-liners are actually more maintainable at this point, as you need not print much (incurring not just delays, but expense in the form of paper and ribbon).

In light of all that, I'm so glad that I can wander over to a consumer electronics site and get a 23" 1080p monitor for under \$200. Living in the future is awesome.

(Mostly; I still use ed as a commit message editor. It's nice to have something that doesn't take over the screen for a one-line comment about a typo cleanup.)

2011-07-21 08:39:21 by YarKramer:

I'm tempted to take tetris6.c and convert it to C#, which *can* handle all that dynamic business ...

2011-07-21 08:46:17 by riku:

"At the same time, K&R singularly fail to make clear that in C it is impossible to choose the size of an array at run time. Once this fact dawned on me it made life a lot easier and, in retrospect, it is obvious that dynamic arrays would be a feature only of dynamic programming languages, which C isn't. But the book was written before dynamic programming languages existed, or the world even had a word for them."

This is not quite true. In recent standards of C (and C++) it is possible to decide the size of an array at runtime and it's been possible within previous standard versions using alloca(). This only works for variables you declare in your code and is allocated from the stack. This is not possible for e.g. struct members or function parameters (where you should use pointers instead of arrays).

The reasons behind not being able to do Java-style variable sized arrays is that C uses a two-fold memory management scheme, where some memory (local variables) is allocated statically (at compile time) from the stack and the rest is allocated dynamically (at run time, with malloc) from the heap. This makes all the sense in the world if you're writing operating systems, but not so much for application programmers and that's why modern languages use a garbage collector (with the exception of (c)python that does reference counting) to manage a heap of memory.

Your statement about dynamic languages does not quite make sense either. "Dynamic programming languages" is not a very well defined term, but often it's used to describe languages like Ruby, Python... and Lisp. Lisp predates the C programming language by more than a decade and had "modern" features like garbage collection and homoiconicity. A key feature of dynamic languages is dynamic duck typing, which is largely unreleated to size of arrays.

A good article, but try to improve on your "background research".

2011-07-21 10:11:11 by ErikB:

Your memory problem can be easily solved with putting your data (especially the static one that will after generation never change again) into a database. Also the chance is high, that the database solves the datahandling issue faster, bigger and more reliable then anything a normal person like you and me could code over a weekend into our own code.

2011-07-21 12:37:51 by Toby:

"the book was written before dynamic programming languages existed, or the world even had a word for them"

Not true. There were dynamic languages before C, notably Lisp (1958).

2011-07-21 12:38:58 by mech:

tetris6 DOES love memory usage. I sat there watching it chew through all 12GB of my ram in about 10 seconds, then it tried to use up swap space; but by then it was fast approaching the point where the computer would have been operating too slowly to kill it.

2011-07-21 13:35:37 by Sam:

Hey, we're brute-forcing an NP-hard problem here! I found it quite entertaining to watch as it chewed up 2GB of RAM on my machine. Luckily when it runs out of RAM it does shut down and free all of that up. It shouldn't actually destroy your machine. I hope.

2011-07-21 17:20:07 by Val:

while((s[i++] = t[j++]) != '\0')

It didn't took a week for me to unscramble this, just a couple of seconds. Am I too far on the dark side, or is there still salvation for me?

2011-07-21 22:15:45 by Ross:

This is what happens when generations of programmers (or engineers, or scientists, or priests, etc) build on previous generations' knowledge. You consider C primitive. But just TRY to do the things that C could do -- on the machines it could do it on, like a PDP-7 -- with a "more powerful" language. You can't.

Modern programmers don't understand the hardware anymore. We write programs for abstract machines, which just by magic seem to run on this silicon and steel boxes.

2011-07-21 22:36:25 by Val:

C is still by all means not outdated and never will be unless/until when we completely redesign the concepts how computers and programming languages work. Most modern languages derive at least most of their syntax of it, and all the high level languages (except some highly specific functional ones) still have structured programming in the core of their classes.

Knowing C means being able to understand at first glance what any simple algorithm in nearly any existing programming language is about.

2011-07-23 04:20:14 by David:

Are you trying to solve this problem yourself, or do you just want to know the answer? If it's the latter, you should probably check out this paper: http://www.iam.ubc.ca/theses/Brzustowski/brzustowski.html

(no spoilers in the comments, please!)

2011-07-23 13:00:46 by Sam:

Those papers deal with winning Tetris forever. I'm just trying to get one line.

2011-07-24 19:47:52 by Repomancer:

C is by no means outdated. When you're looking for brute performance, it's nice to dispense with all the hidden overhead and just hammer on the hardware. You can make a run-time dimensioned array if you want, sort of; malloc() the memory, cast a pointer to it, and treat it as an array. You can usually look at a few lines of C and tell basically what assembler instructions it'll turn into after compilation.

Today's fast machines make it a lot less important, but for Life In Real Timeā¢ C (and its big brother C++) remain the overwhelming choices. Things like garbage collection, dynamic allocation, etc., come at a price -- the compiler / interpreter / run-time environment are busy behind the scenes in ways that incur overhead. Much of the time, it doesn't matter. Sometimes, it does.

You can certainly get into more trouble with a C compiler than with practically anything else. No sandbox 'round here. But look at what other languages are written in; I don't think anyone has written a Java interpreter in Java.

2011-07-27 08:19:54 by LeVentNoir:

I'm a university student now and I'm going to drop in here right now and say C is a powerful, fast, and highly suited language that has large amounts of relevance in modern computing.

Microcontrollers have C/C++ as the upper limit on their programming language due to hardware constraints. Java? Python? Almost useless when you are bitmasking out your registers.

int[10] foo makes perfect sense. I want 10 ints, and I want them all to be called foo.

A 1MHz 8 bit micro is pretty much the same today as in 1980. And so the code is the same.

Frankly, seeing how large software programs made by computer scientists run, I'm glad there are languages like C that say "no, do it the good way, not the easy to code / understand way"

2011-07-28 00:08:00 by OvermindDL:

It ate my 16 gigs well, but for note it did have a compile error (here is hoping that the comments does not eat the ascii art, it is already missing the pretty colors from the terminal after all):

tetris6.c:660:45: warning: conversion specifies type 'int' but the argument has type 'struct State *' [-Wformat]
printf("I am piece %c in orientation %d @ %d\n", pieceNames[statePtr->pieceId], statePtr->orientationId, statePtr);
~^ ~~~~~~~~
tetris6.c:665:28: warning: conversion specifies type 'int' but the argument has type 'struct State *' [-Wformat]
printf("nextPtrs[%d] is %d\n", keyId, statePtr->nextPtrs[keyId]);
~^ ~~~~~~~~~~~~~~~~~~~~~~~~~
tetris6.c:795:1: error: 'main' must return 'int'
void main(int argc, char ** argv) {
^
tetris6.c:818:27: warning: conversion specifies type 'int' but the argument has type 'long' [-Wformat]
printf("Processing took %d second(s)\n", end - start);
~^ ~~~~~~~~~~~
%ld
tetris6.c:819:28: warning: conversion specifies type 'int' but the argument has type 'long' [-Wformat]
printf("Number of wells: %d\n", countSolutions(rootNodePtr, 0));
~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%ld
tetris6.c:828:2: error: void function 'main' should not return a value [-Wreturn-type]
return 0;
^ ~
4 warnings and 2 errors generated.

Basically, your main returns int, not void, and you have a few warnings as well.
There are ways to potentially speed this up, probably a lot.
Some C++ template work would simplify a lot of it as well, while potentially making it faster.
And of course Haskell could lazilly guess at a lot of that too.

If you want, I could probably remake it in C++, more succinctly and potentially a lot faster as well. Need to find time to do that, hmm...

And yes, it did eat up most of my 16 gigs of ram quite well.

Oh, I used clang as the compiler, works on linux/bsd/mac/unix and Windows (with a little finagling) as a C compiler quite well, and it generates a lot faster code then tcc, and compiles quite fast too (and I am using the Assert-full version, which is imperceptibly slower too):

~/C++/tmp\$ time ../llvm/build/Release+Asserts/bin/clang tetris6.c -lm
tetris6.c:660:45: warning: conversion specifies type 'int' but the argument has type 'struct State *' [-Wformat]
printf("I am piece %c in orientation %d @ %d\n", pieceNames[statePtr->pieceId], statePtr->orientationId, statePtr);
~^ ~~~~~~~~
tetris6.c:665:28: warning: conversion specifies type 'int' but the argument has type 'struct State *' [-Wformat]
printf("nextPtrs[%d] is %d\n", keyId, statePtr->nextPtrs[keyId]);
~^ ~~~~~~~~~~~~~~~~~~~~~~~~~
tetris6.c:818:27: warning: conversion specifies type 'int' but the argument has type 'long' [-Wformat]
printf("Processing took %d second(s)\n", end - start);
~^ ~~~~~~~~~~~
%ld
tetris6.c:819:28: warning: conversion specifies type 'int' but the argument has type 'long' [-Wformat]
printf("Number of wells: %d\n", countSolutions(rootNodePtr, 0));
~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%ld
4 warnings generated.

real 0m2.605s
user 0m0.110s
sys 0m0.140s

2011-07-28 13:23:36 by Sam:

Those are some fairly trivial errors and trivial enough to fix. I don't particularly want to see this redone more succinctly. I think that the procedure is complicated enough from an algorithmic perspective as it is. I also fail to see how a C++ template would make any of this faster. I don't have the slightest interest in improving compilation time. If you have benchmarks for comparing actual run-time performance with tcc, share them.

2011-07-28 23:02:49 by OvermindDL:

C++ templates work can pre-calculate and optimize a lot of access. Think of C++ templates as haskell but with a different syntax, that is pretty exactly how it is.

Although to get truly faster you could use OpenCL or so and let it bulk parse a lot of that on multiple cores, video cards, etc... That would increase the speed rather substantially.

As for TCC, even its description states:
"""
Although the TCC compiler itself is exceptionally fast and produces very small executables, there is an inherent trade off between this size of the compiler and the performance of the code that TCC produces.
"""
With plenty of benchmarks around. The difference would be no where near as substantial as moving the calculations to a different framework like OpenCL or so though. FORTRAN would also do well as it consistently out-performs C in numerical code such as this, but still not by such a factor as a different framework would make.

2011-07-28 23:54:14 by OvernmindDL:

Never could get tcc to link, so I tested with (an older) clang and gcc, this was my result at 8:6 size, and I do have enough free/unused memory for it to allocate it all without ever swapping:
"""
clang
+--------+
|........|
|........|
|........|
|........|
+--------+
|........|
|........|
|........|
|........|
|........|
|........|
+--------+
An evil AI can always kill you with no lines from this position, if it gives you piece I

WIDTH = 8, HEIGHT = 6
Processing took 44 second(s)

Number of wells: 4906026

gcc
+--------+
|........|
|........|
|........|
|........|
+--------+
|........|
|........|
|........|
|........|
|........|
|........|
+--------+
An evil AI can always kill you with no lines from this position, if it gives you piece I

WIDTH = 8, HEIGHT = 6
Processing took 44 second(s)
Number of wells: 4906026
"""

I actually did not expect clang to do so well.

And something I noticed was that most of that time was taken by *just* allocating the memory. I went into the code and found a lot of little bits of malloc around the place. The program would run a *great* deal faster if the needed memory was pre-allocated first since once it does allocate the memory, the program seems to run within just a few seconds. If the data is rearranged so that parts that are accessed together are placed together would also likely reduce runtime noticeably as well.

2011-07-29 00:10:37 by OvermindDL:

It was also only pegging one out of the 6 cores on my server, multi-threading it would no doubt help too. Would want to set up separate heaps for each thread with the number of threads equaling the number of hardware cores (hyperthreading would slow it down in this case) and that would let them operate in their own memory on their own chunks for optimal caching (assuming that the memory of the structures are rearranged a bit for better caching), should communicate by message passing would probably be the most efficient (assuming there are a lot of solutions, else a semaphore would be fine).

Also perhaps accessing each 'state' of the grid as a bitarray instead of a byte array (as it is currently being done) would decrease memory usage *SUBSTANTIALLY*.

All the pointers are also not necessary, those could easily blow the cache repeatedly as well, affecting speed. If memory were to be rearranged better then you could also change all access to be more direct as well, with the states referencing indices of others.

Hmm, actually instead of that full state structure, maybe only storing the bitarray along with some bitflags of whether it is a contested state, win, or lose, and rather then have the 'pieces' and orientations and such be part of the state, have them be the edges of the directed graph. If moved to C++ and used Boost.Graph then that would help both memory and speed (although if done inline without boost then could potentially reduce the pointer count, would not be easy, not worth it really), and Boost.Graph has adapters for spreading the calculation amongst multiple nodes (MPI) so you could split the calculations across multiple computer with a little bit of work as well. That might be the way to go if you wanted to solve a full size Tetris board. Other methods to of course, hmm...

2012-06-14 22:54:32 by DSimon:

If your cache is taking up too much memory, then the classic solution is to fill the cache as it is needed rather than in advance. Whenever you're going to add a new entry to the cache, if the cache has too many entries in it, then delete the least frequently used entry. You can use a heap to keep track of usage efficiently.

2012-06-14 22:56:22 by DSimon:

Er, I should have said "least recently used", not "least frequently used". The latter could put you in a situation where you're throwing away cache entries right after you've added them.

2013-01-19 00:13:26 by Helena:

You might be interested in bastet. http://fph.altervista.org/prog/bastet.html