My main goals of this “Journey through the VM” are that the reader learns about the VM and to spread the VM knowledge, mostly for people who doesn’t know anything about VM. For these same reasons months ago I’ve created a special mailing list for beginners. Please, ask everything you don’t understand, correct me when I say something wrong, give your opinion, etc. There are always people ready to help in the mailing lists. So, let’s start.
Intro to VM
As you may know, there are different Smalltalk dialects and each of them has its own VM. In this Journey through the VM, we will use the PharoVM. I hope there are newcomers reading this blog, so I will explain for them: Pharo, was a fork of another Smalltalk dialect called Squeak. However, both of them still uses the same VM. To avoid confusions, I will try to just say VM, but know that it is the Pharo/Squeak VM.
As far as I am aware of, most, if not all, Smalltalk VMs are implemented in C (or maybe parts in C++). Fortunately for us, happy smalltalkers, the Pharo/Squeak VM consist of two parts of code: a) VMMaker (written in Smalltalk) ; b) “platform code” (C/Smalltalk-C hand-written code)
VMMaker is a normal repository in squeaksource, but its name is very misleading for me. One would imagine that it is a tool for building the VM, but the truth is that it is much more than that. In this VMMaker repository, there are different interesting packages but for the moment, we will concentrate in the most important one: the package VMMaker itself:
It is where the Smalltalk part of the VM is contained. I consider this part of the VM as the “core”, and it is written in Smalltalk. In fact, it is not really in Smalltalk but instead in a subset (limited) of Smalltalk called SLANG, which is then translated to C. So, even if this is much happier thank coding in C, you have to be aware that witting in SLANG has some limitations.
VMMaker has two very important classes: Interpreter and ObjectMemory. As it says in Interpreter class comment, they both together represents a complete implementation of the Smalltalk-80 virtual machine, derived originally from the Blue Book specification but quite different in some areas.
As you may know, every time to save a method in Smalltalk, the Compiler takes such source code, it analyzes, validates, parses, and finally compiles it, generating a CompiledMethod instance. In this object, all the literals and bytecodes are encoded. So, one of the main tasks of the VM is to interpret (that’s why its name) and execute Smalltalk bytecodes that reads from the image. In addition, Interpreter implements some VM responsibilities like methods lookup, cache and execution, primitives, arithmetic operations, etc.
ObjectMemory describes the object memory of the VM. Its responsibility includes allocating/deallocating memory, Garbage Collector and memory compaction, it defines the structure and flags of the object header, object enumeration, etc.
Fortunately, the VM is decoupled from the issue of the supporting platform specific stuff, hence the “platform code”. It contains those parts that depends on the OS like sockets, display and files, where performance is needed, etc. As I told you, I understand the VMMaker part as the “core”. But there are a lot of other features that the VM should provide and they are not part of the “core” but instead they are written like plugins. There several plugins like FilePlugin, SocketPlugin, SoundPlugin, etc. We are not going further now with the plugins since there will be a post later for that (you will be even able to write you own plugin!!!). What is important to understand now is that the platform has code both things: for the VM “core” part, and for plugins.
I think (I may be wrong) that another good reason to have the “platform code” is to increase VM portabilty. Suppose you need to run the VM on a particular hardware/OS that supports C. In this case, you will probably need to change a little or even anything of the VM core (VMMaker). What you will need is to create a specific platform code for that hardware/OS. I think this is one of the reasons why you see Squeak VM running everywhere.
Most of the times that you want to hack, play or do research (which is the difference anyway between this verbs? heheh) with the VM, you usually modify the VMMaker part. It is not likely that you will need to modify something in the platform code.
How the VM is built ?
So…we have two parts of the VM: VMMaker (written in SLANG) and the platform code written in C (for the new MacOS platform there are parts written in Objective-C in order to talk with the cocoa library). There are two steps:
- Translate VMMaker (ObjectMemory, Interpreter, the plugins, etc) to C. This is done by the classes that are in the category ‘VMMaker-Translation to C’.
- Compile the C source code generated from the previous step, together with the platform code.
We will not go further on this topic now because this is explained in the next post or so.
The VM is written is Smalltalk? really? is that cool?
Ok, this can be controversial, so as always, this is just my point of view. I love Smalltalk. I love browsing for senders, implementors, references, doing refactors, browsing methods and classes, etc. If you are reading this blog, you probably understand my feelings :) There are people who love C. Ok, I don’t. I’ve even contributed (with code and fixes) to the OpenDBX library (written in C), so I can do it. But…… do you know how cool is to be able to read and modify the VM from the same image that you usually use ? SLANG is quite limited, and sometimes it looks like C, of course, but it is still much better than coding in C for me:
- You use the SAME Smalltalk image you use for other purposes
- You have all the tools provided by the Smalltalk IDE: senders, implementors, refactors, monticello, versions, etc. You browse Interpreter and ObjectMemory classes like if they were regular classes (indeed they are!).
- Most of the cases, you don’t need to deal with C (sometimes you do need).
- It can be versioned in Monticello just like any other project.
- This is cool: you take you image, you modify something in VMMaker, generate sources, go to a command line and using gcc (and the platform code) compiles the vm. Then, you take the compiled VM and run the same image you used for generating it hhehehe. Isn’t that cool?
- Finally, and this is the most important for several persons, the VM can be simulated. Since it is Smalltalk, you can run a InterpreterSimulator, inside a host image/VM. But we will talk about this later in the journey.
Cog VM and current status
Some time ago, Eliot Miranda (thanks to Teleplace company) started to work on a new VM for Squeak (and all its fork, like Pharo). This VM is called Cog VM and it has been already released. In fact, is the default VM included in the PharoOneClick 1.1.1 and beyond. Cog VM includes a lot of new features, but in a glance:
- Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
- Context-to-stack mapping.
- JIT (just in time compiler) that compiles methods to machine code.
- PIC (polymorphic inline caching).
If you are a little arround Smalltalk you may have heard about Cog VM and Stack VM. What is the big difference? Stack VM implements 1) and 2). And Cog VM is on top of the Stack VM and adds 3) and 4). Finally, there is CogMT VM which is on top of Cog VM and adds multi-threading support for external calls (like FFI for example). Now…to be able to put names to all these VMs and not get confused, we will call our old VM Interpreter VM.
Cog VMs have increased performance arround 4x-10x. In addition, it has refactored a couple of things. For example, in Interpreter VM, the Interpreter was a subclass of ObjectMemory (WTF?). But that was necessary in order to easily translate to C. In Cog, there are new classes like CoInterpreter and NewObjectMemory. But the good news is that we can have composition!! The CoInterpreter has an instance variable that is the object memory (in this case an instance of NewObjectMemory). This is awesome for us, and of course, it required changes in the SLANG to C translator. So…as you can see, Cog is really important for the Squeak and Pharo community.
One of the problems I found when trying to learn the VM is that there is little documentation and even more, it is not very known. This is why in every post I will try to put the links I know to the related topics. Keep and share them. They may be useful if you want to go further in the VM or in certain areas. Notice that it is probable that some information in there is outdated, but that’s better than nothing ;)
- Smalltalk-80: The Language and its Implementation. Also knowns as “Blue Book” or THE book. You can download it from Stephane Ducasse website or in http://www.world.st/books.
- Chapter in the Pharo collaborative book.
- Official website of the Squeak VM.
- A Tour of the Squeak Object Engine. This is a chapter in the book “Squeak, Open Personal Computing and Multimedia”.
- Screencasts related to the VM.
- Eliot Miranda Cog Blog.
- Back to the Future: The Story of Squeak, A Practical Smalltalk Written in Itself.
- VMMaker in Squeak swiki.
- The Squeak Virtual Machine in squeak swiki
So, that was all for today. I tried to give you a first overview of the VM. The first posts may be “boring” but they will get better :) If you have questions, remarks, or any kind of feedback, let me know.