Why You Should (Almost) Always Be Using 32-Bit Variables in Your Code

TL;DR: Make it a rule to always use u32, s32, or bool32 types for variables declared inside your functions and for your function parameters/return types. At the very least, your loop control variables should ABSOLUTELY ALWAYS be a 32-bit datatype.

The GBA Processor and You

Every CPU has a native datatype, a type that it is designed to deal with better than any other. For the GBA CPU, that datatype is 32 bits wide, which is also referred to as a word. The GBA's instruction sets are optimized for word-sized chunks, so 32-bit values are what it handles best.

On some level, 32-bit integers are all the GBA understands. The other datatypes have to be converted into a 32-bit integer before they can be used, which incurs performance penalties. This happening every so often is not a big deal, but it quickly adds up, especially in loops.

But I Want to Save Space by Using the Smallest Datatypes That Can Fit My Data!

This is actually a trap in 99% of cases. Because the compiler has to emit shift instructions to convert your smaller datatypes into 32-bit integers every time they are referred to, your smaller datatypes actually take up more space in the ROM than if you were to just use u32, s32, and bool32. They also make your code run slower.

Replacing almost every loop control variable in pokeemerald with 32-bit versions of their respective datatypes saves over 4000 bytes in ROM, for example. This is because those shifts used to convert the datatypes are no longer there.
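As a rough illustration of the loop counter case, the sketch below puts a narrow counter next to a word-sized one. The function names and the array being summed are hypothetical, and the typedefs are only assumed to match the usual GBA names. Both functions return the same result; the difference is in the instructions the compiler has to emit, which depends on the compiler and its settings.

```c
#include <stdint.h>

/* Typedefs assumed to mirror the usual GBA names; a real project supplies its own. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical example. Narrow counter: after each i++ the compiler has to
   keep i within 0..255, which on the GBA typically costs extra shift
   instructions on every iteration. */
u32 SumNarrow(const u16 *values, u8 count)
{
    u8 i;
    u32 total = 0;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Hypothetical example. Word-sized counter: i already matches the register
   width, so no truncation is needed after the increment. */
u32 SumWide(const u16 *values, u32 count)
{
    u32 i;
    u32 total = 0;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Note that the array itself stays u16 in both versions, since it lives in memory. Only the counter and the parameter change.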

What Is the 1% Case Where I Would Use Smaller Datatypes?

You would only use smaller datatypes for values that are going to be stored in memory, such as the saveblocks. In situations like this, use the smallest datatype that will fit your data and absorb the performance penalty when the values are used in code. Alternatively, assign the value to a temporary 32-bit local variable first, so that you only take the conversion hit once.
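A minimal sketch of that pattern follows. The struct, its fields, and the function are all hypothetical and are not taken from the real saveblocks. The fields stay small because they are stored in memory, while the code that works with them widens each value once into a 32-bit local.

```c
#include <stdint.h>

/* Typedefs assumed to mirror the usual GBA names. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical save data: small fields, because this struct lives in memory. */
struct ExampleSaveData
{
    u8  badgeCount;
    u16 stepsPerBadge;
};

/* Hypothetical function: each field is read from memory once and widened
   into a 32-bit local, so the loop only works with word-sized values. */
u32 CountBonusSteps(const struct ExampleSaveData *save)
{
    u32 badges = save->badgeCount;
    u32 steps = save->stepsPerBadge;
    u32 i;
    u32 bonus = 0;

    for (i = 0; i < badges; i++)
        bonus += steps;
    return bonus;
}
```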

Generally, local variables and function parameters should be 32-bit in almost all situations. Arrays, globals, and structs go into memory, so they can use smaller datatypes. Anything stored in EWRAM or IWRAM should use smaller datatypes where possible. Data loaded from ROM or RAM can also be faster to load when it is in a smaller datatype, as there is a small performance hit to loading 32-bit values from those places as opposed to loading smaller ones.

The takeaway here is, any variables you are declaring in the body of your functions and all of their function parameters should probably always be 32-bit. Stick to this rule and you will be in good shape.

Can You Give Me an Example of Why This Is Important?

The following is an excerpt lifted directly from Tonc, which will be linked again for further reading in the next section.

Ints versus non-ints

Above, I noted that use of non-ints can be problematic. Because this bad habit is particularly common under GBA and NDS code (both homebrew and commercial), I'd like to show you an example of this.

// Force a number into range [min, max>
#define CLAMP(x, min, max)   \
    ( (x)>=(max) ? ((max)-1) : ( ((x)<(min)) ? (min) : (x) ) )

// Change brightness of a palette (kinda) (70)
void pal_brightness(u16 *pal, u16 size, s8 bright)
{
    u16 ii;
    s8 r, g, b;

    for(ii=0; ii<size; ii++)
    {
        r= (pal[ii]    )&31;
        g= (pal[ii] >>5)&31;
        b= (pal[ii]>>10)&31;

        r += bright;    r= CLAMP(r, 0, 32);
        g += bright;    g= CLAMP(g, 0, 32);
        b += bright;    b= CLAMP(b, 0, 32);

        pal[ii]= r |(g<<5) | (b<<10);
    }
}

This routine brightens or darkens a palette by adding a brightness-factor to the color components, each of which is then clamped to the range [0,31⟩ to avoid funky errors. The basic algorithm is sound, even the implementation is, IMHO, pretty good. What isn't good, however is the datatypes used. Using s8 and u16 here adds an extra shift-pair practically every time any variable is used! The loop itself compiles to about 90 Thumb instructions. In contrast, when using ints for everything except pal the loop is only 45 instructions long. Of course the increase in size means an increase in time as well: the int-only version is 78% faster than the one given above. To repeat that: the code has doubled in size and slowed down by 78% just by using the wrong datatype!

I'll admit that this example is particularly nasty because there is a lot of arithmetic in it. Most functions would incur a smaller penalty. However, there is no reason for losing that performance in the first place. There is no benefit of using s8 and u16; it does not increase readability – all it does is cause bloat and slow-down. Use 32-bit variables when you can, the others only when you have to.

Now, before this becomes another goto issue, non-ints do have their place. Variables can be divided into two groups: worker variables (things in registers) and memory variables. Local variables and function parameters are worker variables. These should be 32-bit. Items that are in memory (arrays, globals, structs, and what not) could benefit from being as small as possible. Of course, memory variables still have to be loaded into registers before you can do anything with them. An explicit local variable may be useful here, but it depends on the case at hand.
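For comparison, here is a sketch of what the "ints for everything except pal" version described in the excerpt might look like. This is not part of the Tonc excerpt. The name pal_brightness_int is made up for this example, and the only intended change from the routine above is the datatypes of the worker variables and parameters.

```c
#include <stdint.h>

/* Typedef assumed to mirror the usual GBA name. */
typedef uint16_t u16;

// Force a number into range [min, max>
#define CLAMP(x, min, max)   \
    ( (x)>=(max) ? ((max)-1) : ( ((x)<(min)) ? (min) : (x) ) )

// Hypothetical int-only variant: pal stays u16 because it points at memory,
// while the counter, the parameters, and the color components are ints.
void pal_brightness_int(u16 *pal, int size, int bright)
{
    int ii;
    int r, g, b;

    for(ii=0; ii<size; ii++)
    {
        // Split the 15-bit color into its 5-bit components.
        r= (pal[ii]    )&31;
        g= (pal[ii] >>5)&31;
        b= (pal[ii]>>10)&31;

        // Adjust each component, then clamp it to 0..31.
        r += bright;    r= CLAMP(r, 0, 32);
        g += bright;    g= CLAMP(g, 0, 32);
        b += bright;    b= CLAMP(b, 0, 32);

        pal[ii]= (u16)(r | (g<<5) | (b<<10));
    }
}
```

The palette data keeps its 16-bit type because it is a memory variable. Everything that only ever lives in a register is an int.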
