Leveraging build to tune for size, performance on DSPs

I wrote earlier about steps you can take in writing your code and designing data structures to optimize for code size, performance and power in embedded systems. The need to squeeze software into limited memory was common on early computers but is now almost forgotten on today’s 64-bit systems with gigabytes of memory. Embedded systems take us back to the future, requiring us to resurrect those skills to balance code capabilities against tightly constrained memory capacities.

Source: CEVA

Quite a lot of the optimization that is possible will depend on your careful design and tuning of the code. But the build tools, particularly the compiler and linker, can also help. In this blog I’ll talk about options you can use with these tools in the CEVA-Toolbox. In all cases I will concentrate on optimizing for code size, since that will be your primary constraint.

Compiler options

When you’re designing and debugging your code, you’ll almost certainly be building with the -g option, which tells the compiler to generate debug information. With this option, the compiler avoids any optimization that would distort the code in ways that complicate debugging. When you get serious about optimizing code size, you’re going to have to remove that option.

The next consideration is how the compiler chooses to optimize. By default, it optimizes for performance using several techniques. One is to unroll short for-loops by replicating the loop body once for each iteration. This avoids the overhead of updating and testing the loop index on every iteration, but it obviously consumes more memory. The -Oz option prevents unrolling, preferring the smaller and slightly slower implementation.
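As an illustration of that trade-off (hand-written C, not actual output from the CEVA compiler), here is a short fixed-count dot product in its rolled form, followed by the unrolled form a compiler optimizing for speed might produce. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one copy of the multiply-accumulate, plus loop control
   (index update, compare, branch) executed on every iteration. */
int32_t dot4_rolled(const int16_t a[4], const int16_t b[4])
{
    int32_t acc = 0;
    for (size_t i = 0; i < 4; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Unrolled form, written out by hand to show what unrolling produces:
   no loop control at all, but the body appears once per iteration,
   so it occupies more program memory. */
int32_t dot4_unrolled(const int16_t a[4], const int16_t b[4])
{
    int32_t acc = 0;
    acc += (int32_t)a[0] * b[0];
    acc += (int32_t)a[1] * b[1];
    acc += (int32_t)a[2] * b[2];
    acc += (int32_t)a[3] * b[3];
    return acc;
}
```

Both compute the same result; with taps {1, 2, 3, 4} and samples {5, 6, 7, 8}, each returns 70. When building for size, the rolled form is the one you would hope to see.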

Another technique the compiler can use to improve performance is to inline certain functions, particularly small ones. This eliminates the overhead of pushing arguments onto the stack and popping them off again, and of jumping to and returning from the called function. But again, it increases code size when the function is called from more than one place, since each call site gets its own copy of the body. Use the -INLINE=no option to suppress this automatic inlining.
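The -INLINE=no switch applies to the whole compilation. For finer control, GCC and Clang provide __attribute__((noinline)) to keep one particular function out of line; whether the CEVA compiler accepts this attribute is an assumption to check against its documentation. The saturating helper and its three callers below are hypothetical examples of a small function that is tempting to inline but is called from several places.

```c
#include <stdint.h>

/* Saturate a 32-bit intermediate to the signed 16-bit range.
   __attribute__((noinline)) is a GCC/Clang extension asking the compiler
   to keep a single out-of-line copy of this function, however many call
   sites there are. Support in your DSP compiler is an assumption. */
__attribute__((noinline))
int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX)
        return INT16_MAX;
    if (x < INT16_MIN)
        return INT16_MIN;
    return (int16_t)x;
}

/* Three call sites. Inlined, each would carry its own copy of the
   compare-and-clamp code; kept out of line, each site costs only a call,
   at the price of the call and return overhead. */
int16_t add_sat16(int16_t a, int16_t b) { return saturate16((int32_t)a + b); }
int16_t sub_sat16(int16_t a, int16_t b) { return saturate16((int32_t)a - b); }
int16_t mul_q15(int16_t a, int16_t b)   { return saturate16(((int32_t)a * b) >> 15); }
```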

Another optimization may seem hardly worth the effort on conventional platforms but can make an important difference to code size on DSPs: disabling (where appropriate) the compiler’s protections against pointer aliasing. When the compiler parallelizes a set of instructions on a VLIW machine such as a DSP, these protections ensure that loads and stores through different pointers cannot interfere with each other if two or more of those pointers refer to the same data. This can limit the extent to which some instructions can run in parallel. The -alias=restrict option tells the compiler to assume that no such cases occur, which should allow it to infer more parallelism. Naturally, you should check carefully and run a full regression to make sure this assumption is safe.
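The -alias=restrict option applies the no-aliasing assumption across the build. Standard C99 offers a narrower tool for the same promise: the restrict qualifier on individual pointers, which lets you limit the assumption to functions you have actually verified. Whether the CEVA option is exactly equivalent to qualifying every pointer with restrict is an assumption; the mac_block function below is a hypothetical example.

```c
#include <stddef.h>
#include <stdint.h>

/* Without restrict, the compiler must assume dst might overlap a or b,
   so a store to dst[i] could change a later a[i + 1] or b[i + 1]. Each
   load then has to wait for the preceding store, which limits how many
   operations can be packed into one VLIW instruction.

   restrict promises, for this function only, that the three arrays do
   not overlap. Calling it with overlapping buffers is undefined
   behavior, which is why the promise must be checked. */
void mac_block(int32_t *restrict dst,
               const int16_t *restrict a,
               const int16_t *restrict b,
               size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += (int32_t)a[i] * b[i];
}
```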

Linker

The linker can also perform size-related optimizations; one of these is removing unreferenced functions. This requires some care. Some functions may be called only through a function pointer stored in data, or even by a direct jump to a hard-coded address. And interrupt service routines are typically reached through an interrupt vector rather than through the conventional call protocol. The optimization must therefore account for all of these ways a function can be reached. It runs automatically and can be disabled with the -keepUnrefFuncs option.
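A small C sketch of the first case: command handlers that no code calls by name, reachable only through a table of function pointers. All names are hypothetical. A linker that tracks references through relocations will see the table entries and keep these handlers. A function reached only through a hard-coded address leaves no such reference, which is the kind of case where disabling the optimization with -keepUnrefFuncs may be necessary.

```c
#include <stddef.h>
#include <stdint.h>

/* No code calls these handlers directly; they are reachable only
   through cmd_table. A linker stripping unreferenced functions must
   treat the table entries as references, or the handlers would be lost. */
int32_t cmd_gain(int32_t x)   { return x * 2; }
int32_t cmd_invert(int32_t x) { return -x; }
int32_t cmd_bypass(int32_t x) { return x; }

typedef int32_t (*cmd_fn)(int32_t);

/* The table lives in data, not code: it holds the only references. */
const cmd_fn cmd_table[] = { cmd_gain, cmd_invert, cmd_bypass };

int32_t dispatch(size_t id, int32_t x)
{
    if (id >= sizeof cmd_table / sizeof cmd_table[0])
        return x;  /* unknown command: pass the sample through */
    return cmd_table[id](x);
}
```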

Another linker optimization can further reduce code size where the assembler has left certain symbols unresolved, to be resolved at link time. Because these symbols are unresolved, the assembler has to reserve the largest possible address encoding for each reference, which may turn out to be wasteful once the linker resolves them. If no special action is taken, many such references may resolve to small addresses or short offsets yet still occupy the maximum address width. Shrinking these locations can significantly reduce total code size. This is another delicate task: shrinking any one reference moves everything after it, so every reference anywhere in the code to locations following that point must be adjusted. Data alignment requirements, which are sometimes processor-specific, must also be taken into account. Each reference has to be given the smallest encoding that works, weighing not only the space saved but the implications for the rest of the code. Fortunately for you, these optimizations are performed by default.
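The technique described above is often called linker relaxation. The sketch below shows its core loop in C under invented assumptions: a hypothetical encoding in which a reference takes 4 bytes in long form and 2 bytes when its displacement fits in a signed 8-bit field. The item type and relax function are made up for illustration, not taken from the CEVA linker, and alignment is ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical encodings: a reference needs 4 bytes in its full-width
   form, or 2 bytes if its displacement fits in a signed 8-bit field. */
enum { LONG_REF = 4, SHORT_REF = 2, SHORT_MIN = -128, SHORT_MAX = 127 };

typedef struct {
    int is_ref;     /* 1 if this item references another item */
    size_t target;  /* index of the referenced item (refs only) */
    int size;       /* current encoded size in bytes */
} item;

/* Assign each item an address from the current sizes. */
static void layout(const item *items, size_t n, long *addr)
{
    long pc = 0;
    for (size_t i = 0; i < n; i++) {
        addr[i] = pc;
        pc += items[i].size;
    }
}

/* Give every reference the long form, as the assembler must, then shrink
   any whose displacement fits until a pass changes nothing. Sizes only
   decrease, so distances never grow: addresses computed at the start of
   a pass can only overstate a distance, and the loop must terminate.
   Returns the total size in bytes, or -1 if allocation fails. */
long relax(item *items, size_t n)
{
    long *addr = malloc((n + 1) * sizeof *addr);
    if (addr == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        if (items[i].is_ref)
            items[i].size = LONG_REF;

    int changed = 1;
    while (changed) {
        changed = 0;
        layout(items, n, addr);  /* re-place everything after any shrink */
        for (size_t i = 0; i < n; i++) {
            if (!items[i].is_ref || items[i].size == SHORT_REF)
                continue;
            long disp = addr[items[i].target] - addr[i];
            if (disp >= SHORT_MIN && disp <= SHORT_MAX) {
                items[i].size = SHORT_REF;  /* later items move down */
                changed = 1;
            }
        }
    }

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += items[i].size;
    free(addr);
    return total;
}
```

Starting from the long form and only ever shrinking is the conservative choice: a reference that fits once keeps fitting. The sketch also shows why the work is iterative. With items {reference to item 3, reference to item 0, 121 bytes of code, 2 bytes of code}, the first reference is 129 bytes from its target until the second reference shrinks; it is then 127 bytes away and can shrink too, for a total of 127 bytes.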

Taken together with good coding practices, these compiler and linker options, used with care, can help you shrink code and data size further, so your software fits your embedded system as cost-effectively as possible. That, in turn, increases your value as an experienced embedded systems programmer, always a desirable goal!

Published on Embedded.com.

Ariel Hershkovitz
