How does a Java Virtual Machine written in Java work?

11

After seeing the Jikes RVM, I was curious to know how it works (in theory), but I only found material in English.

Is it correct to assume that today's JVM is written in C/C++, which in turn is written in Assembler?

How can a language "self-interpret"? Has this been done in other languages?

    
asked by anonymous 19.01.2017 / 18:06

2 answers

8

First, to make a compiler for language X that is itself written in language X, do this:

  • Using language Y, write a compiler C1 for language X that produces executable code for platform P. Compile it with K, a previously existing compiler for language Y.

  • Using language X, write a compiler C2 for language X that produces executable code for platform P. Compile it with C1. Note that this produces a compiler for language X written in language X itself.

  • Compile the source of C2 again, this time with C2 itself, producing C2b. Make sure that C2 is identical to C2b (or at least that there is no difference you care about). If it is not, adjust the code of C1 and C2 until it is.

  • Throw away C1.

  • You can repeat this process several times: from language X1 build language X2, from X2 build X3, and so on. That is why javac and the Eclipse compiler (ECJ), the only two mature and active Java compilers today, are themselves written in Java. This process is called bootstrapping.
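
The comparison between C2 and C2b in the steps above can be automated. What follows is only a minimal sketch, assuming both compiler builds were written to files; the class name `BootstrapCheck` and the file names in the usage comment are hypothetical, and real toolchains may need to ignore things like embedded timestamps before comparing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class BootstrapCheck {
    /** Returns true when the two compiler builds are byte-for-byte identical. */
    static boolean sameBuild(byte[] c2, byte[] c2b) {
        return Arrays.equals(c2, c2b);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java BootstrapCheck.java c2.bin c2b.bin
        if (args.length != 2) {
            System.err.println("usage: BootstrapCheck <c2-file> <c2b-file>");
            return;
        }
        byte[] c2 = Files.readAllBytes(Paths.get(args[0]));
        byte[] c2b = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(sameBuild(c2, c2b)
                ? "C2 reproduces itself, C1 can be thrown away"
                : "builds differ, keep adjusting C1 and C2");
    }
}
```

A byte-for-byte comparison is the strictest form of the check; the "no difference that you care about" case would need a looser, project-specific comparison.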

    However, the case here is an interpreter, not a compiler, but the reasoning is similar. JikesRVM needs a small, minimal boot loader written in C to start the bootstrap process. Everything that is not essential to getting things started is written in Java. The idea is that anything that can be done in Java should be done in Java, and C should be used only for what cannot be implemented in Java.

    However, most publicly available JVMs have large parts written in C and C++, primarily because of performance, memory consumption, and how critical the code is. JikesRVM, on the other hand, is not a commercial JVM and is used only in specific niches, since it is not intended to compete with other JVMs at running user programs.

    Furthermore, this concept is not new. LISP has done this since it was conceived. Many LISP interpreters are written in LISP itself, with only a small part, responsible for the most basic functionality, written in some other language.

        
    19.01.2017 / 20:32
    10

    This is called bootstrapping.

    Compilers

    Languages are just specifications. Although related, languages and compilers are different things.

    Compilers and libraries implement what the specification says. Obviously, the first implementation of a language must be written in another language. You can then use the language itself to create a new implementation written in it.

    Compilers are relatively basic algorithms, full of specific complexities, of course. Text goes in, it is processed (that is where the complexity lives), and out comes data, possibly binary, that a virtual or physical machine knows how to execute. It is only a transformation algorithm that follows specific rules. So compilers generate a program that can be run, and that program can be a compiler, a virtual machine, an operating system, anything.
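
To make the "text in, executable data out" idea concrete, here is a minimal sketch of a compiler for a toy language of integers, `+` and `*`. The class `TinyCompiler` and the `PUSH`/`ADD`/`MUL` instruction names are invented for this illustration; they target a hypothetical stack machine, not the real JVM.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyCompiler {
    private final String src;
    private int pos = 0;
    private final List<String> out = new ArrayList<>();

    private TinyCompiler(String src) {
        this.src = src.replace(" ", ""); // ignore spaces between tokens
    }

    /** Translates source text such as "1 + 2 * 3" into stack-machine instructions. */
    static List<String> compile(String source) {
        TinyCompiler c = new TinyCompiler(source);
        c.expr();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + c.pos);
        }
        return c.out;
    }

    // expr := term ('+' term)*
    private void expr() {
        term();
        while (peek() == '+') {
            pos++;
            term();
            out.add("ADD");
        }
    }

    // term := number ('*' number)*
    private void term() {
        number();
        while (peek() == '*') {
            pos++;
            number();
            out.add("MUL");
        }
    }

    // number := digit+
    private void number() {
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        out.add("PUSH " + src.substring(start, pos));
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        // prints [PUSH 1, PUSH 2, PUSH 3, MUL, ADD]
        System.out.println(compile("1 + 2 * 3"));
    }
}
```

Nothing here depends on what the output program does, which is the point of the paragraph above: the same kind of transformation could just as well emit a compiler.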

    I understand the curiosity, but I find it strange that this is seen as something very difficult to achieve. I think the only "secret" is knowing that the first compiler has to be written in another language.

    In fact, some languages are built incrementally. You write a compiler that handles the bare minimum and add functionality later. That way you have an initial compiler in the language itself almost from the start. Of course, the first iterations of the language will be slightly different from the final goal, and somewhat limited.

    Programming languages can produce anything, so it is no secret that a language can produce a compiler for itself, provided there is a first implementation.

    Of course, some languages are not the right ones to produce compilers.

    The best implementations of Java are actually written in C++. Java today does not seem very suitable for writing compilers, and it used to be worse. Java is fundamentally compiled, not interpreted, although an interpreter is possible.

    Interpreters

    An interpreter is nothing more than a compiler that, instead of generating an executable at the end, directly executes what it has analyzed. An interpreter is an executable program like any other. In this case, though, the compiler generates bytecode, which is then "interpreted" by the virtual machine.

    Of course, if the interpreter runs itself, it needs to be reentrant.
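
As an illustration of a virtual machine "interpreting" bytecode, here is a minimal sketch of a fetch-and-dispatch loop. The class `TinyVm` and its four opcodes are invented for this example and are far simpler than real JVM bytecode.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyVm {
    // Hypothetical opcodes, made up for this sketch (not real JVM bytecode).
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Executes the program and returns the value on top of the stack at HALT. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]); // the operand follows the opcode
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Bytecode for 1 + 2 * 3
        int[] program = {PUSH, 1, PUSH, 2, PUSH, 3, MUL, ADD, HALT};
        System.out.println(run(program)); // prints 7
    }
}
```

The loop never sees source text, only numeric instructions, which is why the front end that produces the bytecode can live in a completely separate program.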

    What I can guarantee is that there is a piece of code in another language, one that only bootstraps the virtual machine. The Wikipedia article mentions it:

      

    A small C loader is responsible for loading the boot image at runtime

    The interpreter's compiler is separate from the virtual machine, even if both live in the same executable. The virtual machine does not interpret Java, it interprets bytecode, and from what I understood, Jikes does not even interpret the bytecode (it compiles it to machine code instead).

    Obviously I do not have details of this project and I cannot state exactly how it works.

    More information

    C and C++ compilers have been written in C++ for quite some time. Some still have a good deal written in C. Someone could write one in Assembly (not Assembler), but for years no one has seriously done that.

    C# today has its compiler written in C#, and it works better than the original one written in C++. The runtime is basically written in C++.

    I have already talked about this in The First Programming Language.

        
    19.01.2017 / 19:24