Retargetability of the Orange C toolchain

When one talks about retargetability, one is often talking about retargeting a compiler so that it can handle either a different front end (for example, the difference between C and Pascal) or a different back end (for example, the difference between running on an Intel processor and a Motorola processor).

But there is more to it than that if one starts considering how to retarget an entire toolchain.

Retargeting the compiler is still important, and it turns out to be one of the more difficult parts, but other tools in the toolchain need to be retargetable as well. For example, the assembler's mapping between human-readable code and binary data needs to be retargetable.

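The following items need to be retargetable to make a truly retargetable toolchain:

- the mapping from the source code language, such as C or C++, to an intermediate code representation
- optimizations and code improvements, which should be done in the intermediate code as much as possible
- the mapping between the intermediate code and the assembly language
- the mapping between the assembly language and the binary output
- the mapping from the binary output to the data file an OS requires of its executables

These items are elaborated in the following sections.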

the mapping from the source code language, such as C or C++, to some intermediate code representation

In practice this means that the source language parser and the intermediate code should share as few data structures as possible. In Orange C it also means that aliasing calculations are moved out of the front end and into the intermediate code, which makes the process language-independent. There is an interesting research paper showing that little is actually lost from the loss of source-level context this implies.
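As a minimal sketch of what an aliasing calculation that lives entirely in the intermediate code can look like, assume a hypothetical IR memory operand that records only a base symbol, a byte offset and an access size. The `MemRef` type, `BASE_UNKNOWN` and `may_alias` are illustrative assumptions, not Orange C's actual data structures; the point is that the query needs nothing from the C or C++ parser.

```c
#include <stdbool.h>

/* Hypothetical language-neutral memory operand: the IR records only a base
   symbol id, a byte offset and an access size, with no C or Pascal types. */
typedef struct {
    int base;      /* symbol id, or BASE_UNKNOWN for an access through a pointer */
    long offset;   /* byte offset from the start of the base object */
    long size;     /* number of bytes accessed */
} MemRef;

#define BASE_UNKNOWN (-1)

/* Conservative alias query computed on the intermediate code alone. */
bool may_alias(MemRef a, MemRef b)
{
    /* Without source-level type information, a pointer access may touch anything. */
    if (a.base == BASE_UNKNOWN || b.base == BASE_UNKNOWN)
        return true;
    /* Distinct named objects never overlap. */
    if (a.base != b.base)
        return false;
    /* Same object: the accesses alias only if their byte ranges overlap. */
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Because the answer is computed from offsets and sizes rather than from source types, the same routine serves any front end that can lower its variables to this form; the price is that pointer accesses are treated conservatively.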

optimizations and code improvements need to be done as much as possible in the intermediate code

The idea is to minimize the work done in the front end and back end, which makes retargeting easier. In practice this is problematic, however, because the back end can sometimes generate better code if it has, for example, live register information. To ease retargeting, Orange C performs register allocation in the intermediate code phase, with some guidance from the back end about which registers are dedicated to particular purposes.
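The division of labour described above can be sketched as a generic allocator that knows nothing about the target except a small descriptor supplied by the back end. The `TargetRegs` descriptor, its reserved-register bitmask and the `linear_scan` routine are hypothetical; linear scan is swapped in here only because it is short, and this is not a description of Orange C's actual allocator.

```c
#include <assert.h>

/* Hypothetical description a back end hands to the generic allocator. */
typedef struct {
    int num_regs;            /* physical registers on the target (at most 32 here) */
    unsigned reserved_mask;  /* bit i set: register i is dedicated, e.g. stack pointer */
} TargetRegs;

/* Live range of one temporary in the intermediate code, in instruction numbers. */
typedef struct {
    int start, end;  /* inclusive */
    int reg;         /* assigned register, or -1 if spilled */
} Interval;

/* Simplified linear scan: intervals must be sorted by start. A register can be
   reused once the last interval holding it ended before the next one starts.
   There are no spill heuristics: an interval that finds no free register is
   simply marked as spilled. */
void linear_scan(const TargetRegs *t, Interval *iv, int n)
{
    int busy_until[32];
    assert(t->num_regs <= 32);
    for (int r = 0; r < t->num_regs; r++)
        busy_until[r] = -1;

    for (int i = 0; i < n; i++) {
        iv[i].reg = -1;
        for (int r = 0; r < t->num_regs; r++) {
            if (t->reserved_mask & (1u << r))
                continue;            /* the back end dedicated this register */
            if (busy_until[r] < iv[i].start) {
                iv[i].reg = r;
                busy_until[r] = iv[i].end;
                break;
            }
        }
    }
}
```

For a four-register target that reserves registers 0 and 3 (mask `0x9`), intervals `[0,5]`, `[1,3]`, `[2,7]`, `[4,6]` receive registers 1, 2, spilled, and 2 respectively. Retargeting changes only the descriptor, not the allocator.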

the mapping between the intermediate code and the assembly language

This is where some future work in Orange C lies. There is an architecture description under development that currently guides the translation of assembly language into binary code; eventually it will also guide the translation of intermediate code into assembly language.
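To illustrate what a description-driven translation from intermediate code to assembly language might look like, here is a minimal sketch assuming a hypothetical table of per-opcode text patterns, where `$d` and `$s` stand for the destination and source operands. The `IrOp`, `AsmRule` and `emit_asm` names are invented for this example; Orange C's architecture description format is not reproduced here. Two tables are shown, one with Intel-style operand order and one with Motorola 68000-style order, echoing the retargeting example at the start of this article.

```c
#include <stddef.h>
#include <string.h>

typedef enum { IR_ADD, IR_SUB, IR_MOVE } IrOp;

/* Hypothetical description entry: how one IR opcode is spelled on a target. */
typedef struct {
    IrOp op;
    const char *pattern;   /* "$d" = destination operand, "$s" = source operand */
} AsmRule;

/* Intel-style syntax: destination first. */
static const AsmRule x86_rules[] = {
    { IR_ADD, "add $d, $s" }, { IR_SUB, "sub $d, $s" }, { IR_MOVE, "mov $d, $s" },
};

/* Motorola 68000-style syntax: source first. */
static const AsmRule m68k_rules[] = {
    { IR_ADD, "add.l $s,$d" }, { IR_SUB, "sub.l $s,$d" }, { IR_MOVE, "move.l $s,$d" },
};

/* Expand the rule for `op` into `out`. Returns 0 on success, or -1 if the
   description has no rule for the opcode or the buffer is too small. */
int emit_asm(const AsmRule *rules, size_t nrules, IrOp op,
             const char *dst, const char *src, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].op != op)
            continue;
        size_t len = 0;
        for (const char *p = rules[i].pattern; *p; p++) {
            const char *piece;
            char one[2] = { *p, '\0' };
            if (p[0] == '$' && (p[1] == 'd' || p[1] == 's')) {
                piece = (p[1] == 'd') ? dst : src;
                p++;                      /* skip the placeholder letter */
            } else {
                piece = one;              /* copy literal text unchanged */
            }
            size_t n = strlen(piece);
            if (len + n + 1 > outsz)
                return -1;
            memcpy(out + len, piece, n);
            len += n;
        }
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

With these tables, `IR_ADD` on `eax`, `ebx` becomes `add eax, ebx`, while `IR_MOVE` on `d0`, `d1` for the 68000 table becomes `move.l d1,d0`. The emitter itself stays the same; only the table changes per target.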

the mapping between the assembly language and the binary output

As it turns out, the assembler is already highly retargetable: past work has left the project with an architecture description file that should be suitable for guiding an arbitrary translation from assembly language mnemonics to binary codes.
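A table-driven encoder shows the idea of an assembler guided by an architecture description: the encoding logic is generic, and the target-specific knowledge sits in data. This is a minimal sketch under stated assumptions; the `EncodingRule` layout and the `encode` function are hypothetical and are not the format of Orange C's description file. The opcode values are the real single-byte 32-bit x86 encodings for `nop`, `ret`, `push r32` and `pop r32`.

```c
#include <stddef.h>
#include <string.h>

typedef enum { OPND_NONE, OPND_REG32 } OperandKind;

/* Hypothetical description entry: mnemonic, operand shape, base opcode. */
typedef struct {
    const char *mnemonic;
    OperandKind kind;
    unsigned char base;   /* for OPND_REG32 the register number is added to this */
} EncodingRule;

static const EncodingRule x86_32_rules[] = {
    { "nop",  OPND_NONE,  0x90 },
    { "ret",  OPND_NONE,  0xC3 },
    { "push", OPND_REG32, 0x50 },
    { "pop",  OPND_REG32, 0x58 },
};

/* Register names in x86 encoding order (eax = 0 ... edi = 7). */
static const char *const x86_regs[] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Encode "mnemonic" or "mnemonic reg" as one opcode byte.
   Returns the byte value (0..255), or -1 if no rule matches. */
int encode(const EncodingRule *rules, size_t nrules,
           const char *const *regs, size_t nregs,
           const char *mnemonic, const char *operand)
{
    for (size_t i = 0; i < nrules; i++) {
        if (strcmp(rules[i].mnemonic, mnemonic) != 0)
            continue;
        if (rules[i].kind == OPND_NONE)
            return operand == NULL ? rules[i].base : -1;
        /* OPND_REG32: fold the register number into the low three bits. */
        if (operand == NULL)
            return -1;
        for (size_t r = 0; r < nregs; r++)
            if (strcmp(regs[r], operand) == 0)
                return rules[i].base + (int)r;
        return -1;
    }
    return -1;
}
```

For example, `push ebp` encodes as `0x55` and `pop ecx` as `0x59`. Supporting another processor would mean supplying new tables, and, for a real instruction set, richer operand kinds than this sketch models.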

the mapping from the binary output to the actual data file an OS requires of its executables

The linker has a generic output file format. However, it supports something similar to 'plugins', which are configured in an XML file and then selected from the command line. Each plugin comes with a specification file that guides the formation of sections such as code and data sections, and that may also specify constants used by a post-link step to generate the executable output file. These constants may be further tailored from the linker command line, much as macros are defined on a C compiler command line; for example, WIN32 PE files have internal alignment values that can be adjusted this way.
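The way spec-file constants can be overridden from the command line, in the spirit of `-DNAME=VALUE` on a C compiler, can be sketched as follows. The constant names `FILEALIGN` and `OBJECTALIGN`, the `LinkConstant` type and the `apply_define` helper are all hypothetical; the actual Orange C spec file syntax and linker switches are not shown here. The defaults used in the test (0x200 and 0x1000) are the customary PE file and section alignments.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant declared in a plugin's specification file. */
typedef struct {
    const char *name;     /* constant name as it appears in the spec file */
    unsigned long value;  /* default from the spec file, possibly overridden */
} LinkConstant;

/* Apply one "NAME=VALUE" override. VALUE may be decimal or 0x-prefixed hex.
   Returns 0 on success, or -1 if the text is malformed or NAME is unknown. */
int apply_define(LinkConstant *consts, size_t n, const char *def)
{
    const char *eq = strchr(def, '=');
    if (eq == NULL || eq == def || eq[1] == '\0')
        return -1;
    size_t namelen = (size_t)(eq - def);

    char *end;
    unsigned long v = strtoul(eq + 1, &end, 0);  /* base 0 accepts 0x prefixes */
    if (*end != '\0')
        return -1;                               /* trailing junk such as "4k" */

    for (size_t i = 0; i < n; i++) {
        if (strlen(consts[i].name) == namelen &&
            strncmp(consts[i].name, def, namelen) == 0) {
            consts[i].value = v;
            return 0;
        }
    }
    return -1;
}
```

Keeping the defaults in the specification file and applying overrides by name means the post-link step only ever reads the final table, whichever output format the selected plugin describes.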