Source-to-source compiler
A source-to-source compiler, transcompiler, or transpiler is a type of compiler that takes the source code of a program written in one programming language as its input and produces equivalent source code in another programming language. A source-to-source compiler translates between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher-level programming language to a lower-level programming language. For example, it may translate a program from Pascal to C. An automatic parallelizing compiler will frequently take a high-level language program as input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's DOALL statements).[1]
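As a minimal sketch of translation between two languages at a similar level of abstraction, the following Python code converts a small subset of Python arithmetic expressions into JavaScript source text. The function names `to_js` and `python_expr_to_js` are hypothetical, and only names, numeric literals, and a handful of binary operators are handled; it is an illustration under those assumptions, not a description of any real transpiler.

```python
import ast

# Python binary operators that correspond directly to a JavaScript operator.
_SIMPLE_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_js(node):
    """Translate a small subset of Python expression nodes to JavaScript text."""
    if isinstance(node, ast.Expression):
        return to_js(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        left, right = to_js(node.left), to_js(node.right)
        # Operators without a direct JavaScript counterpart become calls.
        if isinstance(node.op, ast.Pow):
            return f"Math.pow({left}, {right})"
        if isinstance(node.op, ast.FloorDiv):
            return f"Math.floor({left} / {right})"
        op = _SIMPLE_OPS.get(type(node.op))
        if op is not None:
            # Parenthesize so the source tree's grouping is preserved.
            return f"({left} {op} {right})"
    # Anything else is outside the supported subset.
    raise NotImplementedError(f"unsupported construct: {type(node).__name__}")


def python_expr_to_js(source):
    """Parse one Python expression and return JavaScript source text."""
    return to_js(ast.parse(source, mode="eval"))


print(python_expr_to_js("x ** 2 + y // 3"))
# prints: (Math.pow(x, 2) + Math.floor(y / 3))
```

Even in this small case the two languages do not agree exactly (for example, Python integers have arbitrary precision while JavaScript numbers do not), which is one reason real translators often need a supporting runtime library or manual review.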
Another purpose of source-to-source compiling is translating legacy code to use the next version of the underlying programming language or an API that breaks backward compatibility. In this role the compiler performs automatic code refactoring, which is useful when the programs to refactor are outside the control of the original implementer (for example, converting programs from Python 2 to Python 3, or converting programs from an old API to the new API) or when the size of the program makes it impractical or time-consuming to refactor it by hand.
History
One of the earliest programs of this kind was Digital Research's XLT86 in 1981, a program written by Gary Kildall, which translated .ASM source code for the Intel 8080 processor into .A86 source code for the Intel 8086. Using global data flow analysis on 8080 register usage, the translator would also optimize the output for code size and take care of calling conventions, so that CP/M-80 and MP/M-80 programs could be ported to the CP/M-86 and MP/M-86 platforms automatically. XLT86 itself was written in PL/I-80 and was available for CP/M-80 platforms as well as for DEC VMS (for VAX 11/750 or 11/780).[2]
A similar but much less sophisticated program was TRANS.COM, written by Tim Paterson in 1981 as part of 86-DOS. It could translate some Z80 assembly source code into .ASM source code for the 8086, but it supported only a subset of opcodes, registers and modes, so its output often still required significant manual correction and rework. It also did not carry out any register or jump optimizations.[3]
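The following Python sketch illustrates what a subset-only, line-by-line assembly translator of this kind might look like. It is not a reconstruction of XLT86 or TRANS.COM: the function name `translate_line` is hypothetical, the input uses 8080 mnemonics, only five instructions are recognized, and the register mapping (A to AL, B/C to CH/CL, D/E to DH/DL, H/L to BH/BL) is the conventional 8080-to-8086 correspondence, assumed here because the sources above do not list the mapping either tool used.

```python
# Assumed 8080 -> 8086 register correspondence (illustrative only).
REG8 = {"A": "AL", "B": "CH", "C": "CL", "D": "DH", "E": "DL",
        "H": "BH", "L": "BL", "M": "BYTE PTR [BX]"}
REG16 = {"B": "CX", "D": "DX", "H": "BX", "SP": "SP"}


def translate_line(line):
    """Translate one 8080 instruction, or flag it for manual rework."""
    parts = line.split(None, 1)
    if not parts:
        return ""
    mnemonic = parts[0].upper()
    operands = [p.strip() for p in parts[1].split(",")] if len(parts) > 1 else []
    try:
        if mnemonic == "MOV":                 # register-to-register move
            dst, src = operands
            return f"MOV {REG8[dst.upper()]},{REG8[src.upper()]}"
        if mnemonic == "MVI":                 # load immediate value
            dst, imm = operands
            return f"MOV {REG8[dst.upper()]},{imm}"
        if mnemonic == "ADD":                 # add to the accumulator
            (src,) = operands
            return f"ADD AL,{REG8[src.upper()]}"
        if mnemonic == "INX":                 # increment a register pair
            (pair,) = operands
            return f"INC {REG16[pair.upper()]}"
        if mnemonic == "JMP":                 # unconditional jump
            (target,) = operands
            return f"JMP {target}"
    except (KeyError, ValueError):
        pass                                  # unknown register or operand count
    # Outside the supported subset: leave a marker for manual correction.
    return f"; UNTRANSLATED: {line.strip()}"


for source_line in ["MOV A,B", "MVI C,0FFH", "INX H", "DAA"]:
    print(translate_line(source_line))
```

A purely textual mapping like this one is not always behavior-preserving. For example, the 8080's INX leaves the flags untouched while the 8086's INC modifies several of them, so a translator without data flow analysis cannot tell whether the substitution is safe at a given point in the program.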
Programming language implementation
The first implementations of some programming languages started as transcompilers, and the default implementations of some of those languages are still transcompilers:
- C++ (known at the time as "C with classes") was originally transcompiled to C by the cfront transcompiler
- Eiffel, transcompiling to C
- Lisaac, transcompiling to C
- Vala, transcompiling to C (with additional libraries such as GObject)
- CoffeeScript, transcompiling to JavaScript
- haXe, transcompiling to JavaScript, PHP, C++, C#, and Java; it can also compile to bytecode such as ActionScript bytecode
- Dart, transcompiling to JavaScript
- Mirah, transcompiling to Java
- Efene, transcompiling to Erlang
- Xtend, transcompiling to Java[4]
- PHP, transcompiling to C++ using HipHop at Facebook
Porting a codebase
When developers want to switch to a different language while retaining most of an existing codebase, it may be better to use a transcompiler than to rewrite the whole software by hand. In this case, the code often needs manual correction because the automated translation does not work in all cases.
- The 2to3 script can turn Python 2 programs into Python 3 programs. Even though 2to3 does its best at automating the translation process, further manual corrections are often needed.
- Emscripten compiles LLVM bytecode to ECMAScript. This allows, for example, C/C++ codebases to run in a browser.
- Naca[5] transcodes COBOL code into Java code running on top of the NacaRT runtime library. It helped save 3 million euros per year in a project aiming at replacing IBM mainframes with commodity hardware and software (Linux and Apache Tomcat).[6] The technology and tools are being further developed by the company Eranea[7]: further COBOL coverage, other legacy languages, additional transactional monitors, etc.
- Google Web Toolkit transcodes a Java program that uses a specific API to JavaScript. The Java code is somewhat constrained compared to normal Java code.
- Js_of_ocaml[8] of Ocsigen compiles an OCaml program into JavaScript.
- JSIL compiles CLI bytecode into human-readable ECMAScript.
Examples
DMS Software Reengineering Toolkit
DMS Software Reengineering Toolkit is a source-to-source program transformation tool, parameterized by explicit source and target computer language definitions (which may be the same). It can be used for translating from one computer language to another, for compiling domain-specific languages to a general-purpose language, or for carrying out optimizations or massive modifications within a specific language. DMS has a library of language definitions for most widely used computer languages (including full C++) and a means for defining other languages which it does not presently know.
LLVM
Low Level Virtual Machine (LLVM) can translate from any language supported by gcc 4.2.1 (Ada, C, C++, Fortran, Java, Objective-C, or Objective-C++) or by clang to any of C, C++, or MSIL. The source is first compiled to LLVM bytecode with llvm-gcc, and the target language is then selected with the "-march" option of llc:
% llvm-g++ -emit-llvm x.cpp -o program.bc -c
% llc -march=c program.bc -o x.c
% cc x.c -lstdc++
% llvm-g++ x.cpp -o program.bc -c
% llc -march=msil program.bc -o program.msil
Refactoring tools
Refactoring tools automate the transformation of source code from one form into another:
- Python's 2to3 tool transforms non-forward-compatible Python 2 code into Python 3 code.
- Qt's qt3to4 tool converts non-forward-compatible usage of the Qt3 API into Qt4 API usage.
- Coccinelle uses semantic patches to describe refactorings to apply to C code. It has been applied successfully to refactor the drivers of the Linux kernel following kernel API changes.[9]
- RefactoringNG is a NetBeans module for refactoring Java code in which transformation rules are written against a program's abstract syntax tree.
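Rewriting a program's abstract syntax tree, rather than its text, can be sketched in a few lines of Python using the standard `ast` module. The example below renames calls from an old API function to a new one; the names `RenameCall`, `migrate`, `old_api` and `new_api` are hypothetical, and the sketch is not how any of the tools listed above is implemented. It assumes Python 3.9 or later, where `ast.unparse` is available.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to one plain function name into calls to another."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        # Visit children first so that nested calls are rewritten too.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            replacement = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def migrate(source, old_name, new_name):
    """Return `source` with every call to old_name replaced by new_name."""
    tree = ast.parse(source)
    tree = RenameCall(old_name, new_name).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(migrate("y = old_api(old_api(1), x)", "old_api", "new_api"))
# prints: y = new_api(new_api(1), x)
```

Because the rule matches call nodes rather than raw text, an unrelated variable or string that happens to contain `old_api` is left alone. The trade-off in this sketch is that `ast.unparse` regenerates the source, so comments and the original formatting are lost; production refactoring tools generally take care to preserve them.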
See also
- Binary recompiler
- Program transformation
- ROSE compiler framework - a source-to-source compiler framework
- Compiler-compiler
- Translation (computing)
- Language binding
- Language-independent specification
References
- "Types of compilers". compilers.net. 1997-2005. http://www.compilers.net/paedia/compiler/index.htm. Retrieved 28 October 2010.
- Digital Research (1981): XLT86 - 8080 to 8086 Assembly Language Translator - User's Guide. Digital Research Inc, Pacific Grove.
- Seattle Computer Products (1981): 86-DOS - Disk Operating System for the 8086. User's manual, version 0.3. Seattle Computer Products, Seattle.
- Eclipse Xtend
- http://code.google.com/p/naca/
- http://media-tech.blogspot.fr/2009/01/project-naca-migration-from-ibm.html
- http://www.eranea.com
- Js_of_ocaml Overview
- Valerie Henson (January 20, 2009). "Semantic patching with Coccinelle". lwn.net. http://lwn.net/Articles/315686/. Retrieved 28 October 2010.