Hi. I am sure the title of this post is horrible, but I didn’t find anything better. The idea is simple: in this part of the journey, we will talk about bytecodes, primitives, CompiledMethods, FFI, plugins, etc… But before going there, I would like to write a bit about what happens first on the image side. These may be topics everybody knows, so in that case, just skip the post and wait for the next one 😉 My intention is that anyone can follow my posts.
A really quick intro to the Smalltalk reflective model
The reflective model of Smalltalk is easy and elegant. As we can read in Pharo by Example, there are two important rules: 1) Everything is an object; 2) Every object is an instance of a class. Since classes are objects and every object is an instance of a class, it follows that classes must also be instances of classes. A class whose instances are classes is called a metaclass. Whenever you create a class, the system automatically creates a metaclass. The metaclass defines the structure and behavior of the class that is its instance. The following picture shows a minimized reflective model of Smalltalk. Notice that, for clarity, this diagram shows only a part of it.
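The two rules can be checked directly in a Workspace. This is only a small sketch: evaluate each line with "print it"; the comments show what a standard Squeak or Pharo image answers.

```smalltalk
"Rule 2: every object is an instance of a class."
3 class.                              "SmallInteger"

"Classes are objects too, so a class has a class: its metaclass."
3 class class.                        "SmallInteger class"

"Every metaclass is an instance of Metaclass."
3 class class class.                  "Metaclass"

"The chain closes on itself: the class of 'Metaclass class' is Metaclass again."
Metaclass class class == Metaclass.   "true"
```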
A class contains a name, a format, a method dictionary, its superclass, a list of instance variables, etc. The method dictionary is a map where the keys are the method names (called selectors in Smalltalk) and the values are the compiled methods, which are instances of CompiledMethod.
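You can ask a class for these pieces yourself. A minimal sketch, using Point and Object only because they exist in every image; #methodDict is the accessor used later in this post, and I am assuming your image still provides it under that name.

```smalltalk
"A class answers the state described above."
Point name.            "the class name, a Symbol"
Point superclass.      "Object"
Point instVarNames.    "the names of the instance variables x and y"

"The method dictionary maps selectors to CompiledMethod instances."
Object methodDict includesKey: #printString.                        "true"
(Object methodDict at: #printString) class.                         "CompiledMethod"

"The message >> is a shortcut for the same dictionary access."
(Object >> #printString) == (Object methodDict at: #printString).   "true"
```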
When an object receives a message, the Virtual Machine first has to perform what is commonly called the Method Lookup. This consists of searching for the message's selector along the superclass chain, starting at the receiver’s class. For each class in the chain, it checks whether the selector is included in the MethodDictionary. If it is not, it continues with the superclass, and so on until it either finds a method or reaches the end of the chain, in which case it sends the #doesNotUnderstand: message to the receiver. When a method is found, it is directly executed.
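The lookup described above can be imitated on the image side. This is only a sketch of the idea written as a block (the name lookup is mine), not how the VM implements it; the VM does this natively and adds caches on top. Behavior >> #lookupSelector: is the image-side equivalent, and the last lines compare both.

```smalltalk
| lookup |
lookup := [:selector :startClass |
	| current found |
	current := startClass.
	found := nil.
	"Walk up the superclass chain until a method is found or the chain ends."
	[found isNil and: [current notNil]] whileTrue: [
		(current includesSelector: selector)
			ifTrue: [found := current >> selector]
			ifFalse: [current := current superclass]].
	"Answers nil when no class in the chain implements the selector."
	found].

"Same answer as the lookup that ships with the image."
(lookup value: #printString value: SmallInteger)
	== (SmallInteger lookupSelector: #printString).              "true"

"A failed lookup ends in #doesNotUnderstand:, which signals MessageNotUnderstood."
[3 perform: #zork]
	on: MessageNotUnderstood
	do: [:error | error message selector].                       "#zork"
```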
To understand these topics, I really recommend the two wonderful chapters in the Pharo By Example book: Chapter 13 “Classes and Metaclasses” and Chapter 14 “Reflection”. They are both a “must read” if you are more or less new to these topics.
In the internal representation of the Virtual Machine, objects are a chunk of memory. They have an object header which (there will be a whole post about it) can be between one and three words, and following the object header, there are slots (normally of 32 or 64 bits) that are memory addresses which usually (we will see why I didn’t say always) represent the instance variables. The object header contains bits for the Garbage Collector usage, the hash, the format, a pointer to its class, etc.
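The slots can be observed from the image with the reflective accessors. A minimal sketch using a Point, whose two instance variables x and y occupy slots 1 and 2:

```smalltalk
| p |
p := 3 @ 4.

Point instSize.                              "2: a Point has two named slots"
p instVarAt: 1.                              "3, the slot holding x"
p instVarAt: 2.                              "4, the slot holding y"

"Access by name ends up reading the same slot."
(p instVarNamed: 'x') = (p instVarAt: 1).    "true"
```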
Classes and Metaclasses
How do you create a class in Smalltalk? In other languages, you normally create a new text file that you compile afterwards. But in Smalltalk, as we are used to, everything happens by a message send. So, to create a new class you ask the superclass, “Can you create this subclass with this name, these instance variables and this category please?”. So, when you open a browser and do a “Ctrl + s” on this code:
Object subclass: #MyClass
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'MyCategory'
The only thing you do is send the message #subclass:instanceVariableNames:classVariableNames:poolDictionaries:category: to Object. In fact, you can take that piece of code, evaluate it in a Workspace, and you will get the same result 🙂
You can look at the implementors and you will logically find one in Class. What should be the result of sending such a message? Two things: a new class and a new metaclass. Do the following test:
Metaclass instanceCount -> 3710
Class allSubclasses size -> 3710
Now, create a new class, and inspect again:
Metaclass instanceCount -> 3711
Class allSubclasses size -> 3711
The problem with metaclasses is that they are implicit, so they are difficult to understand. Imagine that you create a class User; then its class is “User class”. The unique instance of “User class” is “User”. And at the same time, “User class” is an instance of Metaclass. So… complicated, but if you want to understand them, take a look at the chapters I mentioned. How is it done? It is not really important for the purpose of this post, but it uses the ClassBuilder and also the Compiler (check senders of #compilerClass).
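The User / “User class” relationship can be checked with any existing class. In this sketch Point plays the role of User; I am assuming your image provides Metaclass >> #soleInstance, which Squeak and the Pharo versions I know do.

```smalltalk
Point class.                                        "Point class, the metaclass"
Point class class.                                  "Metaclass"

"The metaclass has exactly one instance: the class itself."
Point class soleInstance == Point.                  "true"

Point isMeta.                                       "false"
Point class isMeta.                                 "true"

"The metaclass hierarchy runs parallel to the class hierarchy."
Point class superclass == Point superclass class.   "true"
```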
Creating a method
We saw what happens when we create a class. And what happens when you save a method from the browser? In a nutshell, the Smalltalk Compiler does its magic: it receives as input a string that represents the source code, and as a result you get a CompiledMethod instance. A CompiledMethod contains all the instructions (bytecodes) and information (literals) that the VM needs to interpret and execute that method.
Let’s see it for ourselves. Take your image, create a dummy class and then put a breakpoint at the beginning of Behavior >> #compile:classified:notifying:trailer:ifFail:. Now, type the following method and accept it:
testCompiler
	Transcript show: 'all this code will be compiled'.
Once you accept the code, the debugger should appear. You can analyze the stack trace if you want. Notice the arguments that the method has: compile: code classified: category notifying: requestor trailer: bytes ifFail: failBlock. So, I told you that the basic idea was to send a piece of code as text and get the CompiledMethod instance. The parameter “code” should be the code of the method we typed, and yes, it is a ByteString. If you go step by step with the debugger and inspect the result, that is, “CompiledMethodWithNode generateMethodFromNode: methodNode trailer: bytes.”, you will see it answers a CompiledMethodWithNode instance which you can ask for its “method”, and that is the CompiledMethod instance. Of course, that method should be the same one you get afterwards when doing “MyClass methodDict at: #testCompiler”.
The rest of the parameters are the category in which the method should be, the requestor (someone to notify about this event), the trailer bytes (we will see this later on), and a block to execute if there is an error.
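You do not need the browser to trigger this; you can send the compile message yourself. A minimal sketch, assuming the dummy class MyClass from above exists in your image; the category name 'examples' is just a placeholder of mine. #compile:classified: is a shorter entry point that ends up in the same compilation machinery.

```smalltalk
| source method |
"The source code is plain text. Quotes inside the string are doubled."
source := 'testCompiler
	Transcript show: ''all this code will be compiled'''.

"Source text goes in, and a CompiledMethod gets installed in the class."
MyClass compile: source classified: 'examples'.

method := MyClass >> #testCompiler.
method class.                                       "CompiledMethod"
method selector.                                    "#testCompiler"
method == (MyClass methodDict at: #testCompiler).   "true"
```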
In Smalltalk, compiled methods are first-class objects (classes too!), in this case instances of the class CompiledMethod. However, the class CompiledMethod is quite special and a little different from the rest. But we will see this later on… What is important for the moment is to know that a CompiledMethod contains a list of bytecodes and a list of literals. Bytecodes are instructions. A method is decomposed into a set of bytecodes, which are grouped in five categories: pushes, stores, sends, returns, and jumps. Literals are all those objects and selectors that are needed by the bytecodes but are not instance variables of the receiver, hence they need to be stored somewhere.
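Both lists can be asked for directly. A minimal sketch, assuming MyClass >> #testCompiler from above is installed; the exact contents depend on the image and compiler version, so I do not show expected output here.

```smalltalk
| method |
method := MyClass >> #testCompiler.

"The literals: expect the selector #show: and the binding of Transcript among the entries."
method literals.
method numLiterals.

"The bytecodes, printed in a readable form, one instruction per line."
method symbolic.
```

Print the result of each expression and try to match every line of the symbolic output with one of the five bytecode categories.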
For example, with our previous example of #testCompiler, we will have a bytecode (among others) for sending the message #show: and we will have the ‘Transcript’ and the selector name ‘show:’ in the literals. As an exercise, inspect the CompiledMethod instance. You can just evaluate: “(MyClass >> #testCompiler) inspect”. But… “exploring” is usually better than “inspecting” for compiled methods… I will let you see the differences 🙂 Anyway, you will see something like this:
My knowledge of the Compiler is quite limited, but it is important to notice that the Compiler does many more things than what I have described. In a compiler, there are usually several steps like parsing the code, validating it, getting an intermediate representation, and finally creating the CompiledMethod instance. The compiler needs to know how to translate our Smalltalk code to bytecodes understood by the VM.
In Squeak and Pharo, the compiler is mostly implemented in the class Compiler. It seems it is quite difficult to understand, and it has some limitations and makes it difficult to get an intermediate representation of the code. Because of that and probably many other reasons, the community started to work on a new compiler called Opal (which at the beginning was called NewCompiler).
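To get a feeling for the parsing step and for what an intermediate representation looks like, you can ask a parser for the tree of our method. This sketch assumes an image where RBParser is loaded (it is the parser Opal builds on); in other images or versions the class may be missing or named differently, so take it as an illustration rather than as the compiler's exact pipeline.

```smalltalk
| ast |
"Parsing only: source text in, abstract syntax tree out. Nothing gets installed."
ast := RBParser parseMethod: 'testCompiler
	Transcript show: ''all this code will be compiled'''.

ast class.                    "RBMethodNode"
ast selector.                 "#testCompiler"
ast body statements size.     "1: the single message send to Transcript"
```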
- Blue book: When talking about the VM and Smalltalk in general, the bible is the book “Smalltalk-80: The Language and Its Implementation”. You can find it in pdf in Stéphane Ducasse's free books page, and directly in html (actually, only chapters 26-30) in Eliot Miranda's webpage. Those chapters are the “Implementation” part, so everything that is related to the VM is there.
- Regarding Compiler, CompiledMethod, etc, you can read in the blue book this and this. About bytecodes, an intro here and in detail here.
- Pharo By Example book: Chapter 13 “Classes and Metaclasses” and Chapter 14 “Reflection”.
- “A Tour of the Squeak Object Engine” gives an excellent overview of the VM, including a description about CompiledMethod, bytecodes and friends.
- Opal Compiler.