Tag Archives: VM

Headless support for Cog Cocoa VM

Hi guys.

As you may know, I finished my PhD in Computer Science in France and I am now back in my country, Argentina. I have started working as a freelancer/consultant/contractor/independent. If you are interested in discussing with me, please send me a private email.

For a long time, Pharaoers and Squakers have been asking for headless support in the Cocoa VMs just as we have with Carbon VMs. Carbon is becoming a legacy framework so people needed this.
 I wanted to take this opportunity to thanks Square [i] International for sponsoring me to implement such a support. Not only have they sponsored the development but they have also agreed to release it under MIT license for the community. This headless support will be included in the official Pharo VM and will be, therefore, accessible to everybody. You can read more details in the ANN email.

So…thanks Square [i] International for letting me work in something so much fun and needed.

Halting when VM sends a particular message or on assertion failures

This post is mostly a reminder for myself, because each time I need to do it, I forget how it was 🙂

There are usually 2 cases where I want that the VM halts (breakpoint):

1) When a particular message is being processed.

2) When there is an assertion failure. CogVM has some kind of assertions (conditions) that when evaluated to false, mean something probably went wrong. When this happens, the condition is printed in the console and the line number is shown. For example, if we get this in the console:


It means that the condition evaluated to false. And 41946 is the line number. Great. But how can I put a breakpoint here so that the VM halts?

So….what can we do with CogVM? Of course, we first need to build the VM in “debug mode”. Here you can see how to build the VM, and here how to do it in debug mode. Then we can do something like this (taken from an Eliot’s email)

McStalker.macbuild$ gdb Debug.app
GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...<wbr />Reading symbols for shared libraries ................ done
(gdb) break warning
Breakpoint 1 at 0x105e2b: file /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c, line 39.
(gdb) run -breaksel initialize ~/Squeak/Squeak4.2/trunk4.2.image
Starting program: /Users/eliot/Cog/oscogvm/macbuild/Debug.app/Contents/MacOS/Croquet -breaksel initialize ~/Squeak/Squeak4.2/trunk4.2.image
Reading symbols for shared libraries .+++++++++++++++..................................................................................... done
Reading symbols for shared libraries . done

Breakpoint 1, warning (s=0x16487c "send breakpoint (heartbeat suppressed)") at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:39
39              printf("\n%s\n", s);
(gdb) where 5
#0  warning (s=0x16487c "send breakpoint (heartbeat suppressed)") at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:39
#1  0x0010b490 in interpret () at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:4747
#2  0x0011d521 in enterSmalltalkExecutiveImplementation () at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:14103
#3  0x00124bc7 in initStackPagesAndInterpret () at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:17731
#4  0x00105ec9 in interpret () at /Users/eliot/Cog/oscogvm/macbuild/../src/vm/gcc3x-cointerp.c:1933
(More stack frames follow...)

So the magic line here is “(gdb) break warning” which puts a breakpoint in the warning() function. Automatigally, the assertion failures end up using this function, and therefore, it halts. With this line we achieve 2)

To achieve 1) the key line is “Starting program: /Users/eliot/Cog/oscogvm/macbuild/Debug.app/Contents/MacOS/Croquet -breaksel initialize ~/Squeak/Squeak4.2/trunk4.2.image”. Here with “-breaksel” we can pass a selector as parameter (#initialize in this case). So each time the message #initialize is send, the VM will halt also in the warning function, so you have there all the stack to analyze whatever you want.

I am not 100% sure but the following is what I understood about how it works from Eliot:

So, if I understood correctly, I can put a breakpoint in the function warning() with “break warning”. With the -breaksel  parameter you set an instVar with the selector name and size. Then after, anywhere I can send  #compilationBreak: selectorOop point: selectorLength   and that will magically check whether the selectorOop is the one I passes with -breaksel and if true, it will call warning, who has a breakpoint, hence, I can debug 🙂   AWESOME!!!!   Now with CMake I can even generate a xcode project and debug it 🙂  

That was all. Maybe this was helpful for someone else too.

My (past) presentations at PharoConf and (future) talk at ESUG 2012

Hi. As usual, I wanted to share with you the slides of my last talks in case you are interested.


Last month I went to the first PharoConf held in Lille, France, and I gave to talks. One was about using the Fuel serializer for several different hacky things 🙂  You can find the slides here but since most of the presentation was a demo, they are almost useless. The videos of the conference are being processed and will be updated soon. I will update this post once they are finished.

The other talk I gave was about building the Pharo Virtual Machine and you can find the slides here. If you are interesting in the topic, you can see all the blog posts I have written about it.

ESUG 2012

Once again, I will be attending and presenting at ESUG (in Ghent, Belgium). This year I will present something similar to the Fuel talk at PharoConf. As you can see in the ESUG schedule, the abstract of my talk says:

Fuel is an open-source general-purpose object serialization framework developed in Pharo. It is fast, extensible and has an object-oriented design. It can serialize not only plain objects, but also closures, contexts, methods, classes, traits, among others.
This presentation will be mostly a demo with only a few slides. I will show the power of Fuel by using it in several scenarios: rebuilding Pharo from a kernel image, exporting/importing Monticello packages, moving a debugger from one image to another one, persisting (and also import/export) Pier kernels, etc.

So…see you at Ghent?


Last days I needed to migrate some old code I used to have in the VM for tracing objects usage. Luc Fabresse also wanted to be able to set and get the value of a bit in the object header to do some experiments. So…we thought it was a good idea to make it abstract and public. So….the following is only one morning work we did together with Luc, so don’t expect that much. What we did is to do a very small change in the VM to use one free bit in the object header, and then we coded 3 primitives: one to get the value, one to set it and one to unmark all objects. The idea is that you can use this code and give semantics to the bit. This is just for experimenting and prototypes, not for production code since such bit in the object header may not be available.
To download:

Gofer it
url: 'http://ss3.gemstone.com/ss/ExperimentalBit';
package: 'ConfigurationOfExperimentalBit';
Now…. you can read ConfigurationOfExperimentalBit class comment:

ExperimentalBit is a small facade for setting and getting the value of a bit in the Object Header. It requires a special VM which supports the primitives to set and get the value of such bit. You can get an already compiled MacOSX VM from: https://gforge.inria.fr/frs/download.php/30042/CogMTVM-ExperimentalBit.zip. For more details read class comment of ExperimentalBitHandler.

If you already have a compiled VM with the required primitives, then you can just load the image side part evaluating:

((Smalltalk at: #ConfigurationOfExperimentalBit) project version: ‘1.0’) load.

If you want to build a VM with the primitives we need, you need to download:

((Smalltalk at: #ConfigurationOfExperimentalBit) project version: ‘1.0’) load: ‘VMMakerGroup’.

And then follow the steps to build the VM:



And here ExperimentalBitHandler class comment:


ExperimentalBitHandler is a small facade for setting and getting the value of a bit in the Object Header. It requires a special VM which supports the primitives to set and get the value of such bit. You can get an already compiled MacOSX VM from: https://gforge.inria.fr/frs/download.php/30042/CogMTVM-ExperimentalBit.zip.

To know which version of the VM you have to use to compile, check the dependencies in ConfigurationOfExperimentalBit and also the ‘description’ of it. For example, if version 1.0 it depends on ‘CogVM’ version ‘3.7’. In the description of version 1.0 you can also read that the used Git version of the platform code was 4a65655f0e419248d09a2502ea13b6e787992691 from the blessed repo.

Basically, there are 3 operations: set the bit to a specific, get the value of the bit and turn off the bit of all objects. Examples:

‘aString’ experimentalBit: true.
‘astring ‘ experimentalBit.
Date today experimentalBit: false.
Date today experimentalBit.
ExperimentalBitHandler turnOffExperimentalBitOfAllObjects.

For more details see ExperimentalBitTest.


Happy New Year to all Smalltalk hackers!!!

Memory Addresses and Immediate Objects

Hi. After a couple of months talking about other stuff like Fuel, and presentations in conferences such as ESUG and Smalltalks, I would like now to continue with the “Journey through the Virtual Machine” for beginners. So far I have written the first and second part. Consider this post the first one of the third part.

Direct pointers vs object tables

Let’s say we have this code:

| aPoint |
| aPoint := Point x: 10 y: 20.5.

In this case, aPoint has an instance variable that refers to an integer (10) and a float (20.5). How are these references implemented in the VM?

Most virtual machines have an important part whose responsibility is managing the memory, allocating objects, releasing, etc. In Squeak/Pharo VM, such part is called Object Memory. In addition, the Object Memory defines the internal representation of objects, its references, its location, its object header, etc.  Regarding the references implementation, there are two possibilities which are the most common: object tables and direct pointers.

With the first, there is a large table with two entries. When the object aPoint refers to the float 20.5, it means that the instance variable “y” of aPoint has an index in the table where the memory address of the float 20.5 is located. With direct pointers, when aPoint refers to 20.5, it means that the instance variable “y” of aPoint has directly the memory address of 20.5.

There are pros and cons for each strategy but such discussion is out of range for this post. One of the nice things with object tables is that the primitive #become: is really fast since it is just updating one reference. With direct references, the #become: it needs to scan all the memory do detect all the objects that are pointing to a particular one. On the other hand, with object tables, we have to pay the cost of accessing an extra indirection and (I guess) this may impacts on the overall performance of the system. With direct pointers, we do not have that problem. Finally, object table uses more memory since the table itself needs memory. Few months ago there was a nice discussion in the mailing list about the prons and cons.

First Smalltalk VMs used to have an object table, but now most current VMs (included the Squeak/Pharo VM) use direct pointers. The only current VM I am aware of that uses object tables is GemStone. But… they actually have one (virtual) Object Table (OT) per committed transaction!!  How they can do those optimizations and not blowup in terabytes of memory used by OTs? Well, that’s one of GemStone keys 😉  If you are interested in this topic, you can read this thread.

Memory addresses

In the previous paragraphs you learn that each memory address in the Squeak/Pharo VM represents a direct pointer to another object. Well, that’s almost correct. We are missing what it is usually known as “immediate objects”.  Immediate objects are those that are directly encoded in the memory address and do not require an object header nor slots so they consume less memory. In the CogVM there is only one type of immediate object, and it is SmallInteger. What does it mean?

In our example, the instance variable “x” of aPoint does not have a pointer to an instance of SmallInteger with the content 10. Instead, the memory address of “x” has directly encoded the value 10. So there is no instance of SmallInteger. But now, how the VM can known whether an instance variable is a pointer to another object or a SmallInteger? We need to tag a memory address to say “this is a object pointer” or “this is a SmallInteger”. To do that, the VM uses the last bit of the word (32 bits). If such bit is 1, then it is a signed 31-bits SmallInteger. If it is 0, it is a regular object pointer (oop).

Since I told you SmallInteger were encoded in 31 bits and they were signed, it follows that we have 30 bits for the number (one bit is for the sign). Hence, SmallInteger maxVal should be (2 raisedTo: 30) -1, that is, 1073741823. Analogy, SmallInteger minVal answers -1073741824. Number are encoded using the two’s complement. If you want to know more about this, read the excellent chapter that Stéphane Ducasse wrote about it.

Now, regarding object pointers, they always point to the memory address where the object header is. In our example, the instance variable “y” of aPoint, has the memory address of 20.5‘s object header.

As you can imagine, the VM needs to check all the time whether a OOP is really an OOP or an integer:

ObjectMemory >> isIntegerObject: objectPointer

^ (objectPointer bitAnd: 1) > 0

If you have an image with Cog loaded (as I explained in all my posts about building the VM), you can check for its senders…and you will find quite a lot 😉

Previously, I explain you why SmallInteger instances do not have object headers and those instances do not really exist as “objects”. That’s exactly why “SmallInteger instanceCount” answers zero. Each SmallInteger is encoded in different instance variables of different objects.

Another funny fact is why identity is always true with SmallIntegers. Say you have  ‘1’ asNumber ==  (4-3), that answers true. Because at the end, the VM calls a regular C’s equality (=), which of course, for 2 equal numbers, it is always true. But of course, if those numbers are actually OOP (a number), if they are equal, then it means they both point to the same object:

StackInterpreter >> bytecodePrimEquivalent

| rcvr arg |
rcvr := self internalStackValue: 1.
arg := self internalStackValue: 0.
self booleanCheat: rcvr = arg.

There are more things where you can notice that SmallInteger is special. In fact, you can browse the class and see some methods it overwrites, like #nextInstance (throwing an error), #shallowCopy, #sizeInMemory, etc. And of course, there are more problems like trying to do a become. For example, (42 become: Date new) throws an error saying it cannot become SmallIntegers.

More immediate objects?

As said, in a word of 32 bits, we only use 1 bit for tagging immediate objects (SmallInteger in the case of the squeak VM). We could use more than 1 bit…but then it means we have fewer bits for the OOP, therefore, the maximum possible memory to address is smaller, because the amount of bits of the OOP limits us in how much memory we can address as maximum.

But….what happens in a 64-bits VM?  I think 63 bits can be more than enough  for memory addresses. So what about using fewer bits for OOP and more for immediate objects?  Say we can use 58 for OOP and 6 for tagging immediate objects. In that example, we have (2 raisedTo: 6) – 1 , that is,  63 different possibilities!!!  So we can not only encode SmallIntegers but also small floats, true, false, nil, characters, etc… Is that all?  No! there are even more ideas. We can not only encode instances of certain class, but also give semantics to the possibility of tagging memory addresses. For example..we could use one of the combinations of tag bits to say that memory address is in fact a proxy. It doesn’t need to be an instance of Proxy, but we just give the semantics that when a memory address finishes with that tag bit, it means that the 58 bits for the OOP is not an OOP but a proxy contents. Such content can be a number representing an offset in a table, an address in secondary memory, etc… The VM could then do something different if the object is a proxy!

Well…all that I mention is not new at all. In fact, Gemstone does something very similar. They use 61 bits for address + 3 for tags. Here is a nice set of videos about Gemstone’s internals.  And in this video you can see what we are speaking here.

Documentation and future posts

I always try to put some links together related to each post I talk about:

In the next post, I will give details about the current Object Header.

Named Primitives

In the previous post we saw different things: what is a primitive and some examples, their impact on CompiledMethod instances, pragmas, etc. Continuing with this “Journey through the Virtual Machine”, today I will talk about Named Primitives.

Important data from previous post

What is important for this post is a summary of what a primitive is. As we saw, there are methods that can be implemented in such a way that they call a Virtual Machine primitive. To declare the information related to which primitive to use, we use Pragmas. Example of the method #class:

Object >> class
"Primitive. Answer the object which is the receiver's class. Essential. See
Object documentation whatIsAPrimitive."

<primitive: 111>
self primitiveFailed

In this case, the primitive is the number 111. The primitive is implemented in the CORE of the Virtual Machine. This core is written in Slang, a subset of Smalltalk.  To see how to map primitive numbers to their implementation we can see the method StackInterpreter >> #initializePrimitiveTable. In this example, for example, we can see it is mapped to the method #primitiveClass. But don’t confuse, this is NOT a regular method. This is part of the VM (the package VMMaker) and that method is automatically translated to C while building the VM.

For more details, please read the previous posts of this blog.

Named Primitives vs. Numbered Primitives

Again, in the previous post, we saw a “weird” method like:

FileDirectory >> primDeleteFileNamed: aFileName
"Delete the file of the given name. Return self if the primitive succeeds, nil otherwise."

    <primitive: 'primitiveFileDelete' module: 'FilePlugin'>
    ^ nil

Which are the differences between this primitive and the previous one (#class)? Well…let’s see:

With “numbered primitives” like #class, those primitives are implemented in the VM core, that is, the code of the primitives is inside Interpreter classes. There is a table kept in the VM that maps numbers to methods which are then translated to C functions. The only thing is needed to know from image side to call a primitive is the primitive number. In addition, these primitives cannot be loaded dynamically and hence, it is not easy to extend the VM with new primitives. If that is desired one need to build a new VM wich such primitive and distribute that VM.

Named primitives are different. They can be written with Slang as well, but they are not part of what I call the “VM core”. The methods that implement those primitives are not part of the Interpreter classes. Instead, they are written in different classes: plugins. What is needed to know from image side to call a named primitive is the name and its module. What is a module? Let’s say that it is the plugin name. Contrary to numbered primitives, named ones can be loaded dynamically and hence, it is easy to extend the VM with new primitives. One can generate the binaries of the plugin and distribute it with the regular VM. Named primitives can reside in an external library (.so on Unix, DLL on Windows, etc).

Named Primitives / Plugins / Pluggable Primitives

So…do they all mean the same?  Yes, at least for me, they all represent something similar. For me, named and pluggable primitives are the same concept. And I see a plugin like a set of named/pluggable primitives.

When someone says “this is done with a plugin” or “did you load the plugin”, they refer to that. Even if in a future post we will see how to implement our custom plugin, I will give a small introduction.

Plugins are translated to a different C file, not to the same C file of the VM (result of Interpreter classes translation). In fact, plugins are translated and placed in the directory /src/plugin. Each plugin is implemented in the VM as a subclass of InterpreterPlugin. Just for fun, inspect “InterpreterPlugin allSubclasses”. Usually, a plugin needs functionality provided by the VM core. For this purpose, the class InterpreterPlugin has an instance variable InterpreterProxy, which acts as its name says, as a proxy to the Interpreter (the vm). InterpreterProxy provides only the methods that the VM wants to provide to primitives. Some examples are #fetchInteger:ofObject:, #pop:thenPush:, #superclassOf:, etc….So, plugins can only use those provided methods of the interpreter.

We saw that from the image side, named primitives are implemented using the following pragma: “<primitive: ‘primitiveXXX’ module: ‘YYYPlugin’>”. For example, “<primitive: ‘primitiveFileDelete’ module: ‘FilePlugin’>”. The first parameter is the primitive name, which has to map to the method that implementes such primitive (notice the difference with the table for numbered primitives). So in this case, there must be a method (implemented in Slang) called #primitiveFileDelete. The second parameter is the plugin name. A plugin is rified as a subclass of InterpreterPlugin and the plugin name can be defined by implementing the method #moduleName. If a plugin does not do that then the class name is used by default, as it happens with FilePlugin. So….FilePlugin is a subclass of InterpreterPlugin and implements the method #primitiveFileDelete, which looks like:


| namePointer nameIndex nameSize  okToDelete |

<export: true>

namePointer := interpreterProxy stackValue: 0.
(interpreterProxy isBytes: namePointer)
ifFalse: [^ interpreterProxy primitiveFail].
nameIndex := interpreterProxy firstIndexableField: namePointer.
nameSize := interpreterProxy byteSizeOf: namePointer.
"If the security plugin can be loaded, use it to check for permission.
If not, assume it's ok"
sCDFfn ~= 0
ifTrue: [okToDelete := self cCode: ' ((sqInt (*)(char *, sqInt))sCDFfn)(nameIndex, nameSize)'.
ifFalse: [^ interpreterProxy primitiveFail]].
sqFileDeleteName: nameIndex
Size: nameSize.
interpreterProxy failed
ifFalse: [interpreterProxy pop: 1]

How plugins are compiled with the VM, as well as telling the VM which plugins to compile, is explained in a previous posts such as this one and this one.

Plugins: internal or external?

Plugins can be compiled in two ways: internal or external. Notice that it is just the way they are compiled, but the way they are written is the same: using SLANG. Each plugin is a class subclass of InterpreterPlugin or SmartSyntaxInterpreterPlugin. A plugin can then be compiled in the mentioned ways.

Internal plugins are linked together with the core of the classical VM, that is, the binaries of the plugins are put together with the binary of the VM. So for the final user, there is just one binary representing the VM. External plugins are distributed as separate shared library (a .dll in windows, a .so in Unix, etc). The functions (remember that slang is then translated to C so what we coded as methods will become C functions hahaha) of the shared libraries representing the plugins are accessed using system calls.

Which one to use?  Well, that depends on what the developer of the plugin wants. In my case I usually try to build them externally since you don’t need to do anything at all to the VM. It is easier to distribute: just compile the plugin and use it with a regular VM. And from security point of view they are even simpler to eliminate or disable, just removing the binary file.

But not everything is pink in this world. Unfortunately, there are some plugins that cannot be compiled in both ways, but with one in particular. Most existing plugins are optional. Nevertheless, there are some plugins that are mandatory for the core of the VM, that is, the VM cannot run without those plugins. There are lots of available plugins. Which ones are needed? Some plugins only work in certain Operating System. Some only work not even in certain OS but also in a particular version. Plugins may need different compiler flags in different OS. Etc…

To solve the problem of knowing all that, CMakeVMMaker provides an easy way to compile the plugins of a VM. I assume you have been following this “journey” so you read how to compile the VM from scratch in https://marianopeck.wordpress.com/2011/04/10/building-the-vm-from-scratch-using-git-and-cmakevmmaker/ and https://marianopeck.wordpress.com/2011/04/16/building-the-vm-second-part/. So if you installed ConfigurationOfCog, you installed CMakeVMMaker as well. Check for the methods #defaultInternalPlugins and #defaultExternalPlugins. Each CMakeVMMaker configuration class implements those methods correctly. Each of them knows which plugins should be compiled and whether internally or externally. So, the user, someone who wants to build the VM, doesn’t need to worry about that. In addition, CMakeVMMaker let us customize which plugins to use with the method #internalPlugins: and #externalPlugins.

I know, I know.  You want to write and compile your own plugin? Ok, there will be a future post about that. But if you want to try it, check subclasses of InterpreterPlugin or SmartSyntaxInterpreterPlugin  (I recommend the last one since makes a lot of stuff simpler) and then build the VM with something like:

| config |
config := CogUnixConfig new.
config externalPlugins: (config externalPlugins copyWith: #MyHackyFirstPlugin).
config generateWithSources.

Named Primitives and their relation to CompiledMethod

In the previous post we saw that methods that contained a numbered primitive have something special in the CompiledMethod instance: the penultimate literal does not have the Symbol containing the selector but instead an instance of AdditionalMethodState which has a pragma with the primitive information. In the case of numbered primitives we have that, but in addition, there is one more special object in the first literal of the CompiledMethod. That object is an Array that with 4 elements. The first is the plugin name, which is answered by #moduleName (what you put in the module:). The second one is the selector. The third is the session ID which is obsolete, not used anymore, and hence it is usually zero. The last one, is the function index (Integer) in a table that resides in the VM: externalPrimitiveTable. As far as I understood, such table and this index is used as a cache. What is funny is that the VM writes that index in the CompiledMethod instance. For more details, read the method #primitiveExternalCall.


As always, if there are more links or documentation about them please let me know and I will add it.

Introduction to Smalltalk bytecodes

Hi all. It this post I will give you a quick overview and introduction to bytecodes. I won’t talk that much because this topic is well explained in the Blue book, in the code, etc. In the previous posts we saw that a CompiledMethod is all about bytecodes and literals (ok, and a header and a trailer). To really follow the post, I recommend you to have an image with VMMaker. If you don’t know how to do it, please see the title “Prepare the image” of a previous post.

Bytecodes introduction

Let’s start from the beginning: What is a bytecode?  A bytecode is a compact and platform neutral representation of machine code instructions, interpreted by VM. When you code and then save a method in Smalltalk, the Compiler generates an instance of CompiledMethod. Your method’s code is decomposed by the Compiler into a set of basic instructions so that the VM can interpret them.

We have also seen that a CompiledMethod is just an array of bytes. And “BYTEcode” has the prefix “byte”. So, as you can imagine, every bytecode is represented by a byte. One bytecode, one byte (sure?? mmmm). One byte, 8 bits, 2^8 -1 = 255 possible different bytecodes.

Imagine that we code the following basic method:

MyClass >> foo
self name.

To see the bytecodes there are two possibilities: print the answer of  the message #symbolic to the CompiledMethod instance. For example, (MyClass >> #foo) symbolic or use the SystemBrowser, button “View” -> “byte codes”. The bytecodes from the previous method are:

17 <70> self
18 <D0> send: name
19 <87> pop
20 <78> returnSelf

So…how do we interpret such symbolic representation?

Understanding bytecodes printing

Let’s start from left to right. First “column” is a number. In this case, from 17 to 20. What do those number mean?  Explore or inspect the CompiledMethod:

So, those number represent just the position in the whole array. We said a CM (CompiledMethod) was an array of bytes. So, those number represent the position. The CM has two regions: the literal frame (the first bytes of the CM where the literals are stored) and the bytecodes. These numbers are called “Program Counter” (PC) when they are in the bytecode part. For example, if we send the message #endPC to this CM instance, we will get 20 which is the last byte of the CM that represents bytecodes. The next one, 21, is already representing the trailer. The same way, #initialPC answers 17. And how those two methods are implemented?  the #initialPC uses the header’s encoded information such as the number of literals. And #endPC delegates to its trailer since he knows the size of the trailer.

The second column is an hexadecimal surrounded by <> which represents the bytecode number. For example, <70>,, etc. This hexadecimal represent the unique number of bytecode. ’70’ is the bytecode push receiver, ‘D0’ is a send bytecode, 85 pop stack, and push receiver. Since these numbers are encoded in 1 byte it follows that there are 255 possible differnt type of bytecodes.

The third column is just a text describing the type of bytecode.

If we analyze now the bytecodes generated for our simple method that does a “self name” we have that:  the first bytecode (number 17) it just pushes the receiver (self) into the stack. We need that in the stack because in the next bytecode we send a message to it. The second bytecode, 18, sends the message #name to what it is in the stack. When this bytecode finishes, it pushes in the stack the result of the send. But our method doesn’t do anything with it and instead it just answers self (because there is not explicit return). So, we need to do a pop first, bytecode number 19, to remove the result from stack and let the receiver in the top of the stack. And now, we can finally do the return with bytecode number 20.

Mapping bytecodes from image side to VM side

So far we saw how the bytecodes in the CM look like, but we didn’t see how they are map to the VM. So, take your image with VMMaker loaded, and inspect the method #initializeBytecodeTable. You will see that such method is something like this:

The table follows much more but I cut it for the post. So as you see it is just a table that maps numbers to methods 😉  We saw that the symbolic representation has 3 columns, and the second one which was surrounded by <> and an hexadecimal value represents the number of the bytecode. That number is exactly this one used in this table, with the difference that it is in decimal. So, for example the bytecode “<78> returnSelf”, if we translate 78 from hexa to decimal (just print 16r78)  we get 120, which maps to the method #returnReceiver. So, you can now just browse the method and look what it does 🙂  Remember that this is part of VMMaker and this code is written is SLANG. For more details read my old posts.

StackInterpreter >> returnReceiver
localReturnValue := self receiver.
self commonReturn

You have now learned how to see the bytecodes of a method and how to see its implementation in the VM. Cool!!  You deserve a break (or a beer) 🙂

Did you notice that some bytecodes are mapped directly to one only number (like #returnReceiver) but some other like #pushLiteralVariableBytecode are mapped to a range of numbers?  We will see later why.

More complicated bytecodes

Now, let’s see a more advance method, for example, this one:

fooComplicated: aBool and: aNumber
| something aName |
aName := self name.
Transcript show: aName.
ifTrue: [ ^ aName ].
^ nil

Which generates the following bytecodes:

25 <70> self
26  send: name
27 <6B> popIntoTemp: 3
28 <42> pushLit: Transcript
29 <13> pushTemp: 3
30  send: show:
31 <87> pop
32 <10> pushTemp: 0
33 <99> jumpFalse: 36
34 <13> pushTemp: 3
35 <7C> returnTop
36 <7B> return: nil

There are a couple of new bytecodes in this method. Bytecode 27, does a pop of the return of “self name” and push it in the temp number 3. Notice that temps are both, parameters and temp variables. In this case, temp 0 is ‘aBool’, temp 1 ‘aNumber’, temp 2 ‘something’ and temp 3 is ‘aName’. Bytecode 28 needs to push the literal “Transcript” into the stack since in the next bytecode a message is sent to it. Bytecode 29 pushes ‘aName’ to the stack since it will be the parameter for the send.  Bytecode 30 does the send and 31 does a pop because we don’t do anything with the return of the message.

With bytecode 32 we put ‘aBool’ into the stack and then….then…shouldn’t we have something like 33 send: ifTrue:ifFalse:  ???  Yes, we should. But the compiler does an optimization and replaces the message send by a jump bytecode. In this case, a jump bytecode saying that when it is false jump to bytecode number 36 which does the return nil. Otherwise (if true), continue with bytecode 34 which pushes ‘aName’ into the stack and finally bytecode 35 does the return of the top of the stack (where we can ‘aName’).

How do we represent parameters in bytecodes?

We shouldn’t forget that btyecodes are just a number between 0 and 255. The bytecode <78> returnSelf  is number 120, which was we can see in #initializeBytecodeTable it is mapped by  (120 returnReceiver). Does this method requires any kind of parameter? No. It just returns the receiver. Now,  let’s analyze the bytecode <6B> popIntoTemp: 3 from the previous example. 16r6B -> 107.  Ok, cool. So, number 107 does a pop and puts that into the temp number 3. But….all what we have in the CompiledMethod is the bytecode, the byte that contains the number 107. Nothing more. Imagine the method in the VM that is mapped to this bytecode….how can it knows the temp number?   The same with bytecode “<42> pushLit: Transcript”. It is just a number,  66. Where is the “Transcript” stored?

So…the generic question is, if we only have a bytecode number, how do we solve the bytecodes that require some parameters ? Ok, this will sound a little weird. Or smart?  The truth is that this missing information is sometimes (we will see later why sometimes) encoded using offsets in the range of bytecodes. Let’s see the example of the  <6B> popIntoTemp: 3, which is bytecode number 107. In #initializeBytecodeTable, we can see: “(104 111 storeAndPopTemporaryVariableBytecode)“. So we have a range of bytecode between 104 and 111. In this case we want to do a pop and put the result in the temp number 3. If we can assume that 104 is for temp 0, 105, temp1, 106 temp2 and 107 temp3 🙂 we now understand why out bytecode number is 107. That bytecode number encodes that the number of the temp is the 3. The method storeAndPopTemporaryVariableBytecode  will be able to get a diff between the current number (107) and the start of this range (104) and finally know that the temp number is the 3th.

The same happens with the other example <42> pushLit: Transcript, number 66. In #initializeBytecodeTable we can see “( 64  95 pushLiteralVariableBytecode)“. 64 is for literal at 1, 65 at 2, and 66 at 3.   Now, do (MyClass >> #fooComplicated:and:) literalAt: 3 -> #Transcript->Transcript  🙂

Now, if we analyze “(104 111 storeAndPopTemporaryVariableBytecode)” we can understand that the maximum amount of temporal variable is 7 (111-104). However, in this post, we saw the class comment of CompiledMethod says “(index 18)    6 bits:    number of temporary variables (#numTemps)”. That means that maximum number of temps is 2^6-1=63 . So…..something is wrong. Let’s find out what.

Extended bytecodes

(104 111 storeAndPopTemporaryVariableBytecode)”  supports 7 temps, but CompiledMethod class comment says 63 are supported. So, let’s create a method with more than 7 temps and let’s see its bytecodes. If we continue with our example we now modify the method to this:

fooComplicated: aBool and: aNumber
| something aName a b c d e f g h i j k |
d := self name.
Transcript show: aName.
ifTrue: [ ^ aName ].
^ nil

In this case, “d := self name” we are assigning ‘name’ to ‘d’ which is the temp number 7 (remember they start in 0 and the parameters are count together with the temp variables). Hence, the bytecodes are:

 25 <70> self
 26  send: name
 27 <6F> popIntoTemp: 7
 28 <42> pushLit: Transcript
 29 <13> pushTemp: 3
 30  send: show:
 31 <87> pop
 32 <10> pushTemp: 0
 33 <99> jumpFalse: 36
 34 <13> pushTemp: 3
 35 <7C> returnTop
 36 <7B> return: nil

Now, if we just change “d := self name.” to “e := self name.” we would be using the parameter number 8. What would happen? Ok, if you change it you will see that the bytecode changes from “27 <6F> popIntoTemp: 7” to “27 <82 48> popIntoTemp: 8”. Chan! Chan! Chan! What is that???? It seems the bytecode is using in fact 2 bytecodes? (82 and 48).

These kind of bytecodes are called “extended bytecodes”. If we check in #initializeBytecodeTable bytecode 16r82 = 130 is #extendedStoreAndPopBytecode. So at least we know which method is it. Now, what does the 2 bytecode mean (48 in our example) ?  Somehow such second byte should tell us the number of the temp (8 in our example). If we do 16r48 = 72 and check the bytecode 72 we get it is #pushLiteralVariableBytecode, which doesn’t seem to be correct. So, this second byte does not represent a bytecode. Instead, it represents just a byte that encodes information. That information is usually a type and an index, both encoded in one single byte.

In this particular example of #extendedStoreAndPopBytecode uses 2 bits for a type and 6 bits for an index:

| descriptor variableType variableIndex association |
<inline: true>
descriptor := self fetchByte.
self fetchNextBytecode.
variableType := descriptor >> 6 bitAnd: 3.
variableIndex := descriptor bitAnd: 63.
variableType = 0 ifTrue:
[^objectMemory storePointer: variableIndex ofObject: self receiver withValue: self internalStackTop].
variableType = 1 ifTrue:
[^self temporary: variableIndex in: localFP put: self internalStackTop].
variableType = 3 ifTrue:
[association := self literal: variableIndex.
^objectMemory storePointer: ValueIndex ofObject: association withValue: self internalStackTop].
self error: 'illegal store'.

We can see that in our example, 16r48 = 72.  (72 >> 6) bitAnd: 3  -> 1. So, type is 1. The 3 is because it uses 2 bits for the type (2^2-1=3). And 72 bitAnd: 63 -> 8  (which correctly is the number of temp we need). 63 is because 2^6-1=63.  As you can notice, each bytecode is responsible of decoding the information of the second byte. The compiler of course, needs to correctly generate the bytecodes. #extendedStoreAndPopBytecode was an example so that you can understand and learn, but there are much more extended bytecodes. There are even “single extended bytecodes” and “double extended bytecodes”.

Why do we need extended bytecodes?

Well, I am not an expert at all in this subject but I can guess it is because of the size of CompiledMethod. In the previous example of the extended bytecode, it uses two bytes instead of one, as we can see in the explorer:

Notice that bytecode number 27 occupies two (28 is not shown). At the beginning we saw that when we have a “range” in the bytecodes it means that the difference encodes a number, usually an index. But if we need a range of bytecodes for the maximum supported, we would need much more than 255. Hence, more bytes per bytecode.  Since most methods in Smalltalk are short and encode few number of instance variables, parameters, temporal variables, etc, it was decided to use 255 and just use more bytes per bytecode for those cases that was needed. For example:

(CompiledMethod allInstances select: [:each | each numTemps > 7]) size  -> 1337
CompiledMethod instanceCount -> 75786
((1337 * 100) / 75786) asFloat -> 1.7641780803842397

So…only a 1.76% of the CompiledMethod of my image have more than 7 temporal variables. And remember that this was just an example, but there are extended bytecodes for more things. Maybe (I have no idea) with today computers this is not worth it and maybe having 3 or 4 bytes for every bytecode is enough. But since it is like this and working correctly, why to change it?

Groups of bytecodes

Since the specification of the blue book of Smalltalk-80, bytecodes are known to be grouped in different groups. And since the core of Squeak/Pharo VM is implemented in a subset of Smalltalk called SLANG and since we have classes with methods that represents Interpreters….how would these groups be represented? Of course, as method categories!!! So, you can map each of the following group, with a method category of an Interpreter class:

  • Stack manipulation bytecodes: all things related to push and pop.
  • Message sending bytecodes: bytecodes that are used when sending messages.
  • Return bytecodes: are used for different kinds of return.
  • Jump bytecodes are related to conditionals

Look the attached screenshot:

Books and links

In this post it is easy: just read the blue book 🙂   As always, you can download it in pdf from http://stephane.ducasse.free.fr/FreeBooks.html or directly browse the web version provided by Eliot Miranda. For a bytecode introduction read the end of chapter 26  and for more details the whole chapter 28. Notice that the specification has changed a bit from the 80’s and there are now there are more bytecodes but the general idea is still valid.

Building the VM – Second Part

Hi folks. I guess that some readers do not like all the building part and want to go directly to see the VM internals. But it is really important that you understand how to change the VM, compile it or even debug it. Otherwise, you’ll be very limited.

This post is mostly about a couple of things that I wanted to mention in the previous post, but I couldn’t because it was already too long. If you read such post, you may think that compiling the VM from scratch is a lot of work and steps. But the post was long because of my explanations and because of my efforts in making it reproducible. This is why I would like to do a summary of how to compile the VM.

Summary of VM build

Assuming that you have already installed Git + CMake + GCC, then the following are the needed steps to compile the Cog VM:

mkdir newCog
cd newCog
git clone --depth 1 git://gitorious.org/cogvm/blessed.git
cd blessed/image
wget --no-check-certificate http://www.pharo-project.org/pharo-download/unstable-core
"Or we manually download with an Internet Browser the latest PharoCore
image from that URL and we put it in blessed/image

Then we open the image with a Cog VM (which we can get from here or here) and we evaluate:

Deprecation raiseWarning: false.
 Gofer new
 squeaksource: 'MetacelloRepository';
 package: 'ConfigurationOfCog';
(Smalltalk at: #ConfigurationOfCog) project latestVersion load.
"Notice that even loading CMakeVMaker is not necessary anymore
since it is included just as another dependency in ConfigurationOfCog"
MTCocoaIOSCogJitConfig generateWithSources.
"Replace this CMMakeVMMaker configuration class for the one that suites your OS
like CogUnixConfig and CogMsWindowsConfig"

Now, come back to the terminal and do:

cd newCog/blessed/build
cmake .
# Or  cmake . -G"MSYS Makefiles"  if you are in Windows

And that’s all, in “blessed/results” (in Windows it should be under “blessed/build/results”) you should have the CogVM binary. I know that you probably are a lazy guy, but if you really want to take advantage and learn in my posts, I strongly recommend you to follow those steps. All along this sequence of posts, we will debug and modify the VM (change GC, method lookup, create our own primitives and plugins, etc). Once you have Git and CMake, I promise the process takes less than 5 minutes.

Available CogVMs

Remember that all these posts is what I called “Journey through the VM”, so we will probably go and come back between different posts 🙂  In the first post,under the title “CogVM and current status” I explained the different flavors of CogVMs and the main features of them:

  1. Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
  2. Context-to-stack mapping.
  3. JIT (just in time compiler) that translates Smalltalk compiled methods to machine code.
  4. PIC (polymorphic inline caching).
  5. Multi-threading.

What is the big difference between StackVM and CogVM? Well, Stack VM implements 1) and 2). And Cog VM is on top of the Stack VM and adds 3) and 4). Finally, there is CogMTVM which is on top of Cog VM and adds multi-threading support for external calls (like FFI for example).

In addition, Cog brings also some refactors. For example, in Interpreter VM, the Interpreter was a subclass of ObjectMemory. That was necessary in order to easily translate to C. In Cog, there are new classes like CoInterpreter and NewObjectMemory. But the good news is that we can have composition!! The CoInterpreter (which is a new class from Cog) has an instance variable that is the object memory (in this case an instance of NewObjectMemory). This was awesome and required changes in the SLANG to C translator.

As said, in the VMMaker part of the VM, what we called the “core”, there are mainly two important classes: Interpreter and ObjectMemory. Read the first post for details of their responsibilities. In Cog, there are a couple of differences:

  1. As said, the Cog Interpreter class do not subclass from ObjectMemory, but instead it is an instance variable.
  2. In Cog there isn’t only one Interpreter class like in the old VM. In fact each Cog VMs I told you (StackVM, CogVM, CogVMMT) has its own Interpreter class (StackInterpreter, CoInterpreter and CoInterpreterMT). Come on!! Don’t be lazy, take you image and browse them 🙂
  3. In Cog, there are not only those Interpreter classes that I have already told you, but also several more that are just for a design point of view, i.e, they are not Interpreter classes that should be used for compiling the VM. They are for example, to reuse code or to better simulate them. Examples, CoInterpreterPrimitives, StackInterpreterPrimitives, InterpreterPrimitives, etc. And then, of course, we have the Interpreter simulators, but that’s another story for another post.

So…if you are paying attention to this blog you may be asking yourself which Interpreter class you should use? My advice, and this is only my advice, is that you should normally use the “last one”. In this case, the CogVMMT. The few reasons I find not to use the last one are:

  1. If you are running on a hardware where Cog JIT is not supported. For example, for the iPhone the StackVM is usually used.
  2. When you are doing hacky things with the VM and you want to be sure there is no problems with JIT, PIC, etc. This is my case…
  3. Maybe for learning purposes the CogVM or CogVMMT is far much complicated than the StackVM or InterprertVM.
  4. The “last one” may not be the most stable one. So if you are in a production application you may want to deploy with a CogVM rather than a CogVM that has been released just now.

But apart from that, you will probably use the “last one” available. Just to finish with this little section, I let you a screenshot of a part of the Cog VMs hierarchy.

CMakeVMaker available configurations

In the previous post we saw what CMMakeVMMaker configurations do: 1) generate VM sources from VMMaker and 2) generate CMake files. 1) depends on which Cog (StackVM, CogVM and CogVM) we want to build, which plugins, etc. And 2) depends not only in which CogVM but also in the OS (the CMake files are not the same for each Operating System) and other things, like whether we are compiling for debug mode or not, whether we are using Carbon on Cococa library in Mac, etc. So…imagine the combination of: which CogVM, which OS, and whether debug mode or not. It gives us a lot of possibilities 🙂

The design decision to solve this in the CMakeVMake project was to create specific “configuration” classes. To summarize, there are at least one class for VM/OS. So you have, for example, CogUnixConfig (which is a CogVM, for Unix and “release”), CogDebugUnixConfig, MTCogUnixConfig, StackInterpreterUnixConfig, StackInterpreterDebugUnixConfig. And then for the rest of the OS is the same: CogMsWindowsConfig, StackInterpreterMsWindowsConfig, MTCogMsWindowsConfig, etc….So, your homework: browse the categories ‘CMakeVMMaker-Windows’, ‘CMakeVMMaker-Unix’ and  ‘CMakeVMMaker-IOS’. Look at the available classes. To learn, check implementors of #compilerFllags, #defaultInternalPlugins, #interpreterClass, etc…To test, take the debug variant, follow the same procedure as always, and you compile a debug VM with all the debugging symbols and no optimization 🙂

Which one you should use? I have already answered, but imagine you want the “last one”, then they are MTCocoaIOSCogJitConfig, MTCogUnixConfig and MTCogMsWindowsConfig.It doesn’t matter which configuration you choose, all you need to normally do is send the #generateWithSoources.

This design decision has a couple of advantages from my point of view:

  1. It is extremelly easy to customize. And in fact, there are already examples: CogUnixNoGLConfig (which doesn’t links against OpenGL so it works perfect unless you use Balloon3D or Croquet plugins), CogFreeBSDConfig (specially for BSD since it has a couple of differences in the compiler flags), etc.
  2. YOU can subclass and change what you want: default internal or external plugins, compiler flags, etc.
  3. It is easy for a continuous integration server like Hudson to build different targets.

Customizing CMakeVMMaker configurations

I told you that you can subclass from a specific class and overwrite the compiler flags, the default plugins and if they should be internal or external, etc. However, CMMakeVMaker can be parametrized in several ways while using them. In the building instructions at the beginning of this blog, I told you to move your Pharo image to blessed/image. And as I explained in the previous post that was in order to let CMakeVMaker take the defaults directories and make it work out of the box. But in fact, it is not necessary at all to move your image. You can download the “platforms code” in some place and the image elsewhere. Notice that these changes (the ability to customize each direcotry) has been commited in new versions of the CMakeVMMaker package. So, if you want to really try the followin code, make sure to have CMakeVMMaker-MarianoMartinezPeck.94. You can get it using Monticello Browser or Gofer.

So, you can do something like this:

"The image where this code is being run can be in ANY place"
MTCocoaIOSCogJitConfig new
srcDir: '/Users/mariano/Pharo/generateCode/src';
platformsDir: '/Users/mariano/Pharo/vm/git/cogVM2/blessed/platforms';
buildDir: '/Users/mariano/Pharo/vms/build';
"The resources directory is only needed for Mac"
resourcesDir: '/Users/mariano/Pharo/vm/git/cogVM2/blessed/macbuild/resources';
outputDir: '/Users/mariano/binaries/results';

The “platformsDir” must  map with “platforms” directory that we downloaded with Git, it cannot be choosed randomly. The same with the “resourcesDir” (which in fact is only for Mac). The rest of the directories (src, build and output) are not created by VMMaker nor Git. They are just directories that I have created by my own and I want to use them instead of the default.

And I’ve created this shortcut also:

"The image where this code is being run can be in ANY place"
MTCocoaIOSCogJitDebugConfig new
defaultDirectoriesFromGitDir: '/Users/mariano/Pharo/vm/git/cogVM1/blessed';

That way, I don’t need to move my image to blessed/image. BTW, don’t try this with Windows confs because there still a problem. Anyway, despite from that we can also customize things using #internalPlugins:,  #externalPlugins, etc.

Synchronization between platform code (Git) and VMMaker

In this post, I told you the problems I have seen so far with “process” of the Interpreter VM + SVN for platform code. And I also told you how this new process (CMake + Git ) helps a bit in some of those problems. From my point of view there are a couple of things that have improved the process:

  1. Platform code and VMMaker are be in sync: when people (Igor, Esteban, etc) commit a new version to Git, they make sure that the VMMaker part is working.
  2. Documentation of that synchronization: in the previous post, I told you to load version ‘1.5’ of ConfigurationOfCog. Suppose I didn’t tell you that, how do you know for a certain Git version, which version of ConfigurationOfCog you should use?  Check in blessed/codegen-scripts/LoadVMMaker.st  and you have exactly the piece of code you should execute to get the working VMMaker with that specific version of GIT. So…this means that when someone commits the Git repository and such changes require a new VMMaker version, then such developer needs to create a new version of ConfoigurationOfCog, and modify LoadVMMaker.st.  Now that you know this, the steps I told you at the beginning of this posts can be automatic, can’t they?  someone say uncle Hudson?  Yes, of course!!
  3. Git is easier in the fact that people can fork, hack, experiment, tests, and then push changes into the blessed.

Hudson for building VMs

Pharo has a continuous integration server with Hudson: http://ci.pharo-project.org/. And as you can see here, there are a lot of targets for CogVMs. Basically, for every single commit in Git, Hudson builds all those images. How? Following nearly the same steps I told you at the beginning of this post. It creates StackVMs, CogVMs and CogVMs for every OS. In fact, there are no Windows builds yet because this week they are getting the Windows slave. But the confs and the procure is working…So it is just a matter of getting the Windows box.

Conclusion: you don’t need to wait one year an a half to get a VM with a bug fix, nor you don’t need to compile it by yourself. With Hudson, they are built for every commit.

Hudson traceability

We saw how we can trace from platform code to VMMaker. Now, how to know how was every Hudson VM build ? Easy:

  1. Go to http://ci.pharo-project.org
  2. Choose a target in the “Cog” tab. For example, I choose “Mac Cog Cocoa”
  3. Follow the link, for example Cog Unix,  and there you can see two artifacts:
  • a built VM
  • a source code tarball, which is used to build that VM (in this example, CocoaIOSCogJitConfig-sources.tar.gz)

If you Download the source code archive and unpack it into your local directory what would you expect?? Of course, a copy of the git directory plus the Pharo image generated to build such VM. Such image is in build/ subdirectory and it is called generator.image and was the used to generate source code (located in src/ subdirectory) and CMake configuration files (located in build/ subdirectory). Isn’t this cool ?

CMake generators

Did I already tell you that I am also a CMake newbie? Ok…just in case 😉  Anyway, imagine CMake like a tool where we can set things, parameters, variables, directories, etc, in some files (which in our case they are auto-generated by CMakeVMMaker) and then from those “general” files we can generate specific and different makefiles. So, from the same CMake files we can generate different kind of makefiles,  i.e, we can generate makefiles the way some IDE except them to be. CMake call this ability “generators”. And the way to create makefiles with a specific generator is like this:

cmake -G "Generator Name"

Does that sound familiar?? Of course! We have already used them for MSYS in Windows. The cool thing is that there are generators for several IDEs. And this is just GREAT. For example, I can create makefiles and a project for XCode (the C IDE for MacOS). Just doing:

cmake -G Xcode

creates a XCode project for CogVM which is  in /blessed/build/CogMTVM.xcodeproj. You don’t have an idea how cool is this. This mean you can open XCode and everything is set and working out of the box for the CogVM. You can put breakpoints, inspect C code, compile, debug, everything….Before, this was much more complicated because the .xcodeproj file was versioned in the SVN and this file usually keeps some file locations or things like that and in my experience, it was always a pain to make it work.

When you use a particular generator for an IDE (like Xcode, Eclipse, KDevelop, Vsual Studio, etc, you usually don’t do the “make” by hand. So, after invoking cmake, you won’t need to do a make. Instead, you have to compile from the IDE itself (which should have the correct makefiles).

How do you know which are the available generators?  just type:

cmake --help

and at the end you’ll find a section that says “The following generators are available on this platform:”  and each of them has a name and a description. What you need to pass to the -G parameter is the name. Notice that as the help says, it automatically shows the generators available in YOUR platform (OS).  Some examples:

cmake -G KDevelop3
cmake -G "Eclipse CDT4 - Unix Makefiles"
cmake -G "Visual Studio 10"
cmake -G "Borland Makefiles"

When the name includes more than one word you must use the double quotes.

So…the 2 main advantages I see from CMake to our process is: cross compiling, and be able to automatically create makefiles for IDEs. Sorry I couldn’t try with other IDE than Xcode. If you try it and it works, let me know 🙂

In the next post we will so how to debug the VM and some related tricks. After that post, we will probably start to see the VM internals since you will have already all the needed tools.

Building the VM from scratch using Git and CMakeVMMaker

So…this is the post all hackers were waiting for! Take a beer, open your Pharo image and prepare your terminal 🙂   In this post, we will see how to build a VM from scratch. I will use Mac OS along this post, but I will also explain how to build in Linux and Windows (yes, I had to do that for this post!). I will follow the instructions together with you so that they are correct and working heehhehe. In addition, I am new with both Git and CMake…so if I say something wrong, please correct me.

The VM is a huge beast and no matter all the effort people do to ease its build, it can fail. But don’t get disappointed, if there are problems, please ask!

Remember (I told you this in the previous post) that there are different ways of building the from scratch. In this case we will use GIT (instead of the SVN repository) and CMakeVMMaker (instead of VMMakerTool) and we will build a Cog VM. These tools are part of what I called “new infrastructure” in the previous post.

Installing necessary tools on each OS

For all the OS we need a couple of things: Git client, CMake, and gcc/make. I will tell you what I think is needed in each OS. I know it is painful, but at least, you have to do it only once…Don’t be lazy. If this get’s longer, you have the excuse for the second beer 😉

Mac OS

  1. “Just” (4gb) installing the whole XCode is enough for compiling. XCode package comes with everything: gcc, make, cmake, etc. You can download it from Apple’s website. For Lion or XCode 4.2 users, read the last section of the blog (“Problems I have found so far”) .
  2. The Git client for Mac OS. Try to put Git in PATH.
  3. To check if everything is installed, open a terminal and try to execute the different commands: git, make and cmake.


  1. You need to install the development tools such as gcc, make, etc. In Ubuntu, you need to install the “buildessential” package (sudo apt-get install build-essential).
  2. Install cmake -> if it is not already included something like “sudo apt-get install cmake” should work in all Ubuntu and forks.
  3. Install Git client (in my Ubuntu this was with “sudo apt-get install git-core”). You may want to put Git in PATH. Usually, when installing with a package system it automatically adds binaries to the PATH.
  4. To check if everything is installed, open a terminal and try to execute the different commands: git, make and cmake.


  1. Download and install MinGW and MSYS, with C/C++ compiler support:  http://www.mingw.org/wiki/msys. To install mingw+msys using single, painless step, one should download latest installer from here: http://sourceforge.net/projects/mingw/files/Automated%20MinGW%20Installer/mingw-get-inst/
    Today, the mingw-get-inst-20110530 is latest one.
  2. Download and install Git: http://code.google.com/p/msysgit/ During the installation, I choose the option “run git from from the windows command prompt”. Optional: add git to the PATH variable so that you can see git from msys. To do this, add path to git for msys:
    Control panel -> System -> System Properties / Advanced  [ Environment Variables ].   There should be already: ‘C:\Program Files\Git\cmd’. Add ‘C:\Program Files\Git\bin’  Notice that the path may not be exactly ‘C:\Program Files\Git\’ but similar…
  3. Install CMake: http://www.cmake.org/cmake/resources/software.html (during installation, in install options , make sure that you choose to add CMake to PATH).
  4. To check if everything is installed, open MSYS program (which should look like a UNIX terminal) and try to execute the different commands: git, make and cmake.

From now, when we refer to “terminal” we refer to whatever it is: iTerm and friends in Mac OS, Unix/Linux terminals, and MSYS in Windows.

Downloading platform code from GIT

If and only if you plan to modify and publish your cahnges, you will need to create an account in https://gitorious.org/ in case you don’t have one already. If you just want to download a project, then you don’t need an account. Gitorious is not more than a nice, free and public GIT hosting. Creating an account there is a little tricky because you need to SSH key. But try it and if you have problems check in the web because there is a lot of documentation and blog posts about that. Once you have your account working, continue with this…

The goal of this post is not to talk about GIT so we will do it fast 🙂 If you are reading this post, you may probably be a Smalltalk hacker. A real Smalltalk hacker doesn’t use a UI program, but instead command line jeheheh. Seriously, for this post it is easier to just tell you the git commands instead of using a Git front end. So..all git commands should be run from a command line. What are you waiting for? Open a terminal go to your prefer directory and create your workspace for today:

mkdir cogVM
cd cogVM

The Cog VM repository is https://gitorious.org/cogvm. We need to download the “platform code” from there. To do this, we have two options. The first one is to just clone the CogVM project to your local directory. This option is the most common and it is what you will do if you just want to load CogVM (and not modify it or commit changes). It is like doing a “svn co” for the SVN guys.

git clone git://gitorious.org/cogvm/blessed.git

Normally, you can also pass the argument “–depth 1” and do “git clone –depth 1 git://gitorious.org/cogvm/blessed.git”. This is just for avoiding to download all the history and it just downloads the HEAD (at least that’s what I think it does). In this post we are not going to use “–depth 1” and I will explain later why not. The second option is to clone Cog VM to your own Git repository. This has to be done from the Git website (maybe it can be done from command line but I don’t know it): login in Gitorius, search the project CogVM in the search input, select the CogVM project and you will find a “clone repository” button. Click on it and wait. Once it finishes, you will have your own fork of CogVM, and the Git repository is something like this: git://gitorious.org/~marianopeck/cogvm/marianopecks-blessed.git. Now, the last step is to clone from your own fork to your local directory (as we did in the first option). This should be something like:

git clone git://gitorious.org/~marianopeck/cogvm/marianopecks-blessed.git

For this post, I recommend to take the first option if you may be a beginner. I told you that I wanted my posts to be reproducible. With the previous commands, you will clone the latest version in the repository. Since I don’t know when you are going to do it (if there is someone), I would like that you load the specific version I know it works. What I am suggesting is doing a kind of “svn co http://xxxx -r 2202”. I checked how to do this in git, and it seems not to provide a clone of a specific version. Instead, you just clone (from the latest one) and then you checkout or revert to a previous one. Execute:

cd blessed
git checkout f3fe94c828f66cd0e7c37cfa3434e384ff65915e

Notice that you can do this because we have downloaded the full history of the repository. If we would have added the “–depth 1” parameter during the first clone, then we should be having an error like “fatal: reference is not a tree: f3fe94c828f66cd0e7c37cfa3434e384ff65915e”.

f3fe94c828f66cd0e7c37cfa3434e384ff65915e is the commit hash of the version I want. You can do “git log” to see the latest commits or “git rev-parse HEAD” to see the last one.

Ok, if you could successfully load everything from Gitorious you should have something like this:

mariano @ Aragorn : ~/Pharo/vm/git/cogVM/blessed
$ls -la
total 72
drwxr-xr-x  16 mariano  staff    544 Apr  7 00:14 .
drwxr-xr-x   3 mariano  staff    102 Apr  7 00:07 ..
drwxr-xr-x  13 mariano  staff    442 Apr  7 00:14 .git
-rw-r--r--   1 mariano  staff   6651 Apr  7 00:13 .gitignore
-rw-r--r--   1 mariano  staff     41 Apr  7 00:13 CHANGES
-rw-r--r--   1 mariano  staff   1112 Apr  7 00:13 LICENSE
-rw-r--r--   1 mariano  staff  13597 Apr  7 00:13 README
-rw-r--r--   1 mariano  staff     17 Apr  7 00:13 VERSION
drwxr-xr-x   8 mariano  staff    272 Apr  7 00:13 codegen-scripts
drwxr-xr-x  13 mariano  staff    442 Apr  7 00:13 cygwinbuild
drwxr-xr-x   3 mariano  staff    102 Apr  7 00:13 image
drwxr-xr-x  24 mariano  staff    816 Apr  7 00:13 macbuild
drwxr-xr-x   7 mariano  staff    238 Apr  7 00:14 platforms
drwxr-xr-x   4 mariano  staff    136 Apr  7 00:14 processors
drwxr-xr-x   8 mariano  staff    272 Apr  7 00:14 scripts
drwxr-xr-x   6 mariano  staff    204 Apr  7 00:14 unixbuild

I have highlighted two lines that represent two important directories: “/platforms” and “/image”. For the moment, lets explain what “/platforms” is…come on, you should guess it! Yes, there, in that folder, it is the famous “platform code”. You can enter to such directory and see the C code by yourself.

Downloading Cog and dependencies

So far we have loaded from Git the “platform code”. We are missing the other part, VMMaker. In the previous post I told you it may not be necessary to always download VMMAker and generate sources, because such sources may also be commited in the repository. Having the auto-generated source code in the repository is a trade-off, it has advantages and disadvantages. In this “new infrastrucutre” under Git, it was decided to remove it. So if you want to compile the VM you have to load VMMaker and translate it to C. You can read the explanations here and here.

So…we need to download VMMaker (Cog branch) and translate it to C. But of course, we first need a Smalltalk image. I told you in the previous post that I want all my posts to be reproducible. So, take this PharoCore 1.3 image. Notice that zip comes not only with the .image and .sources files, but also the .sources. If you don’t know what .sources file is, you should read Pharo By Example chapter 1, section 1.1 “Getting Started” 🙂  You need the .sources because it is necessary in order to generate the sources of the VMMaker. The image of such zip can be opened with both, Interpreter VM and Cog VM. However, if you open it with Cog, and you save it, then you won’t be able to run it with the Interpreter VM (this was fixed in the latest code of Interpreter VM but there isn’t yet an official VM release for all OS that contains this fix). Thus, I recommend you to run the image with a CogVM. If you don’t have one already, you can download this one.

Now, let’s load VMMaker branch for Cog and all its dependencies. You have already learned that Cog has dependencies in other packages and to solve that, among other problems, we use a Metacello configuration for it. The following code may take time because since we are evaluating this in a PharoCore image where Metacello is not present, Metacello needs to be installed first. In addition, VMMaker is a big package…Fortunatly we are running with a CogVM 🙂    So, take the image and evaluate:

Deprecation raiseWarning: false.
Gofer new
squeaksource: 'MetacelloRepository';
package: 'ConfigurationOfCog';
((Smalltalk at: #ConfigurationOfCog) project version: '1.5') load.

Gofer new
squeaksource: 'VMMaker';
package: 'CMakeVMMaker';
version: 'CMakeVMMaker-MarianoMartinezPeck.83';

IMPORTANT: How do you know you have to load version ‘1.5’ ? Yes, because I am telling you hahha. But how do I know that 1.5 is the one that works with the current version of Git?  This is answered in the next post under the title “Synchronization between platform code (Git) and VMMaker”.

One of the cool things I like from Metacello configurations is the possibility to query them. For example, do you want to all the packages that are installed while doing the previous code? Just inspect or print:

(ConfigurationOfCog project version: '1.5') packages.
ConfigurationOfCog project versions.

Now, if you are curious about how defining versions and baselines is achieved in Metacello, take a look to the methods ConfigurationOfCog >>#version15:  and ConfigurationOfCog >>#baseline13:.

Generating VM sources and CMake files

What have we done so far? We have just download all the VMMaker branch for Cog with all its required dependencies. Now, its turn to translate from SLANG to C and to generate the CMake outputs so that we can compile after. This C code generated from VMMaker is known in the Squeak/Pharo VM world as “the sources”. But don’t confuse, the platform code is also source code…but anyway, when usually someone says the “sources” it means the autogenerated C code from VMMaker. This is why this C code is placed (by default) in /src.

To do both things (generate VM sources and CMake outputs), we use one of the available CMakeVMMaker configurations. Metacello configurations are represented by classes (instead of XML like maven or similar package management systems). How do you think CMakeVMMaker configurations are represented?? Of course!! with classes too. So, we need to find the accurate class for us. I won’t go further now because that’s topic of another post, but for the moment, lets use these classes CogUnixConfig, CogMsWindowsConfig and CocoaIOSCogJitConfig depending in which OS you are.

A little remark for Mac users: you can notice that there are two categories with CMake Mac configurations, ‘CMakeVMMaker-MacOS’ and ‘CMakeVMMaker-IOS’. The first one, is for Mac OS versions that use Carbon library. ‘CMakeVMMaker-IOS’ contains CMake configurations that use Cocoa instead, which is the new library for Mac OS. Carbon is legazy, and it may be removed soon in next MacOS versions. So, the new ones, and the ones you should use, are those configurations under ‘CMakeVMMaker-IOS’.

These configurations are flexible enough to set specific directories for sources, platforms, results, etc. In addition, if you follow certain conventions (defaults), the build is more automatic. For the purpose of this post, we will follow the conventions and use the expected default directories. The only real convention we should follow is that the .image should be in a subdirectory of the directory where you downloaded the GIT code. In my case (see the bash example at the beginning of the post), it is ~/Pharo/vm/git/cogVM/blessed.  So, I moved the .image to ~/Pharo/vm/git/cogVM/blessed/image. You can create your own directory  ~/Pharo/vm/git/cogVM/generator  and place it there. The only requirement is that the ‘platforms’ directory is found in ‘../platforms’.  So…did you move your image?   Perfect, let’s continue.

No…wait. Why you need ‘platforms’ directory if we are not really compiling right now?  Ask yourself…do you think VMMaker translation to C needs the platform code?  Nooo!  So…we only need the platform directory for the second part, for  CMake. Now yes, we continue…take the Pharo image (which should have been moved) and evaluate:

"CocoaIOSCogJitConfig is an example. If you are not in Mac, replace it with CogMsWindowsConfig or CogUnixConfig"
CocoaIOSCogJitConfig  new
"Using VMMaker we translate Cog to C"
"We generate all the CMake necessary directories and files"

Ok…As my comments say, #generateSources uses VMMaker class to translate from SLANG to C. Instead of using a UI (VMMakerTool) we directly translate by code…but..do you remember that for compiling the VM we needed to say which plugins to compile and whether to compile them like internal or external? Ok…At that moment I told you that most of the developers shouldn’t be aware of that. In this case, CMakeVMMaker does the job for us. We will come later to this topic, but if you want to know which plugins are compiled, check implementors of #internalPlugins and #externalPlugins. Once again, CMakeVMMaker has defaults for these things, but you can customize and change them.

Where is the C generated code ? By default (yes, it can be changed) is placed in ‘../src’. In my example, it should be in ~/Pharo/vm/git/cogVM/blessed/src and should look like this:

mariano @ Aragorn : ~/Pharo/vm/git/cogVM/blessed/src
ls -la
total 16
drwxr-xr-x   6 mariano  staff   204 Apr  9 14:55 .
drwxr-xr-x  18 mariano  staff   612 Apr  9 14:55 ..
-rw-r--r--@  1 mariano  staff   776 Apr  9 14:55 examplePlugins.ext
-rw-r--r--@  1 mariano  staff    83 Apr  9 14:55 examplePlugins.int
drwxr-xr-x  43 mariano  staff  1462 Apr  9 14:55 plugins
drwxr-xr-x  11 mariano  staff   374 Apr  9 14:55 vm

In a future post, we will go deeper in how is the C translated cog…but if you want to take a look, go ahead!! Inspect the file /src/vm/cointerp.c  for example 🙂   So…do you already love SLANG? hehehe

With the method #generate we create all the directories and files needed by CMake so that we can after use CMake to generate different makefiles. You will notice that this method creates a directory /build. In my case, it is ~/Pharo/vm/git/cogVM/blessed/build. If you check inside that directory, there are a couple of important files generated for CMake (so that we can use it after), such as CMakeLists.txt, directories.cmake, etc. If you are curious, take a look to them.

If you are interested, I strongly recommend you to take a look to both methods: #generateSources and #generate. Now that I have explained the two big steps, I can tell you that there is a shortcut:

CocoaIOSCogJitConfig generateWithSources

Using CMake and compiling the VM

We are almost done…we already have all the necessary C code, and all the CMake files and directories. The next step is to just use CMake. I am newbie in both Git and CMake. But as far as I could see, CMake is a wonderful tool for being able to generate different makefiles from the same “definition”. So, we have already tell CMake which was our code, the directories, compiler flags, ec, etc. Then CMake can take such information and generate different makefiles: UNIX makefiles, MSYS (for Windows), XCode, Visual Studio, etc…In this post we will see just how to use regular Unix makefiles.

Now…come back to your terminal. We need to first go the /build directory and then execute CMake. In MacOS and Linux, evaluate:

cd build
cmake .

Now, in Windows we are compiling in MSYS, so we need to create special makefiles for it. The way to do this with CMake is using the parameter -G”Generator Name” where “Generator Name” is “MSYS Makefiles” in this case. So, in MSYS (in Windows) we evaluate:

cd build
 cmake . -G"MSYS Makefiles"

Once that is done, we have created all the necessary makefiles. Now, the last pending thing is just to “make”. It is the same whether you’re on windows or not. The makefiles have been generated, son only a make is needed:


Hopefully, you didn’t receive any compilation error and you can find your VM binary in /results (in Windows it is under /build/results) . Again, in my case it is ~/Pharo/vm/git/cogVM/blessed/results.

Problems I have found so far


It seems that the default CogUnixConfig needs OpenGL dev files (headers) and libs.This is because some plugins like Croquet or Balloon3D require such lib. And those plugins are being included by default in CogUnixConfig. So in my case I’ve got the error “The file was not found sqUnixOpenGL.h” which I fixed by installing the dev package:

sudo apt-get install mesa-common-dev

Then, I have a problem at linking time, “/usr/bin/ld: cannot find -lGL”, which I solved by doing:

delete /usr/lib/libGL.so
cd /usr/lib/
sudo ln -s libGL.so.1.2 libGL.so

Notice that other people have experienced the same problem but with the libSM.so (lSM) and libICE.so (lICE). Such problem could be resolved the same way. You may  want to use the command “locate” first to see where the library is and then do the “ln”.

Another solution (but I couldn’t test it by myself) to avoid using “ln”, could be to simply install the package libgl1-mesa-dev (sudo apt-get install libgl1-mesa-dev).

After having done all that, I then realised there is a special  CogUnixNoGLConfig  which seems to avoid linking to OpenGL. The VM will work until you use something like Croquet or the Ballon3D. For a more detailed explanation, read this thread.

If you have the error “alsa/asoundlib.h: No such file or directory” then you should install libasound2-dev, for example, by doing “sudo apt-get install libasound2-dev”. Be aware that depending on your Linux distro and what packages you have installed on in, you may require to install a couple of packages or non. If you have a problem with a C header file not found (a .h file) you will probably need to install the “dev” package of the project. And if what it is not found is a library (a .so for example), then it is likely you will need to install the package that contains such libs. How do you know which package contains a specific header file or lib ? I have no idea. I always go to Google and I find the answer.


As you can read in this thread I could generate the VM in Windows by myself but I had a problem at compile time: “cc1.exe: error: invalid option `no-fused-madd'”. To solve that problem I edited the method CPlatformConfig >> configureFloatMathPlugin:.  I’ve changed the line “maker addDefinitions: ‘-O0 -mno-fused-madd’.”  to “maker addDefinitions: ‘-O0’.”. The effects of this change, may be that Croquet doesn’t work properly. For further details, read another thread.

Ok, this is the end of the post. I still remember (ok, only two years ago) the first day I could compile the VM. I felt so hacky and happy at the same time. I remember I chatted with my SqueakDBX companions about that ehehhehe. So…if I could help you to successfully build your own first VM, you own me a beer (yes, I will be at ESUG hahaha). If you spent a couple of hours and you couldn’t…..mmmm… ask the mailing list 🙂    Seriously, as I told you, compiling the VM from scratch is complicated, even with all the effort made to ease that. If you had problems, please ask in the mailing lists. They will probably be able to help you, and your questions will make this process easier in a near future.

In the next post we will see some advanced topics of compiling the VM, and after that we will start to take a look to the VM inside. Once again, thanks Esteban, Igor, and everbody who answered my questions in the mailing list.


As far as I understand, if you update to the latest XCode (4.2) from an older version (in Snow Leopard) it still  includes the GCC but it sets LLVM as the default compiler. In a fresh installation of Lion/XCode, not only LLVM is the default compiler but also GCC is not installed. To check which compiler area you using, execute in a console “gcc –version”. If it says something like “i686-apple-darwin11-gcc-4.2.1” then it is correct, you are using GCC. If it says something like “686-apple-darwin11-llvm-gcc-4.2” then it means you are using LLVM by default.

If GCC is not installed, you can install it via Mac Ports by doing “sudo port install apple-gcc42”.  You can follow http://stackoverflow.com/a/8593831/424245 to get it to appear in  Xcode, the last two steps will probably look like:
a) sudo ln -s /opt/local/bin/gcc-apple-4.2 /Developer/usr/bin/gcc-4.2
b) sudo ln -s /opt/local/bin/g++-apple-4.2 /Developer/usr/bin/g++-4.2

The CogVM needs GCC and cannot be compiled right now with LLVM so you have to use GCC. There are a couple of possibilities:
1) Change the default compiler for the whole system. To do this you have to edit the symbolic link of gcc to point to the real GCC and not LLVM. I do not recommend that much this option since you may be affecting the whole system.
2) When you do the cmake, reacher than simply do “cmake .” do: “cmake -D CMAKE_C_COMPILER=gcc-4.2 -D CMAKE_CXX_COMPILER=g++-4.2 . ” If that doesn’t work, try “cmake -D CMAKE_C_COMPILER=gcc-apple-4.2 -D CMAKE_CXX_COMPILER=g++-apple-4.2 .”. I would recommend this option for those who are building for the first time.
3) Instead of doing 2) by hand, you can use a patch in CMakeVMMaker that does that for you. In such a case you should use the config CogMTCocoaIOSGCC42Config. Notice, however, that such patch was added in the latests versions of CMakeVMMaker. In this post, I put specific versions of each part of the VM building system. Therefore, if you want to use the latest version of CMakeVMMaker you should also use the latest code from git and from ConfigurationOfCog.

For more details see this thread and this one in the VM mailing list:

First stop: VM’s SCM and related stuff

You want to compile your own VM, don’t you? Compiling the VM just for compiling it and following some instructions is not really helpful, otherwise why don’t you directly download the VM binary ?  My idea with this sequence of posts is that you understand and learn about the VM.

So…in order to compile the VM, you will have to deal with the problem of the VM’s Software Configuration Management. The first time I tried to compile the Pharo/Squeak VM was like 2 years ago. After that, I tried few times more, and most of the times I have some troubles. In addition, in the last months not only there have been a lot of changes related to code versioning and management, but also Cog VM come into play. So….a lot of people is confused where each part of the VM is committed, or what is needed to compile each VM. I will try to clarify all that so that in the next post we can finally compile the VM by ourself.

Since the Interpreter VM and Cog VM are a little different regarding the code management, I will split them.

Interpreter VM

Downloading code

So, if you remember from the previous post, we have 2 parts: VMMaker with the core of the VM, and the platform code. For the VMMaker it is easy: it is the VMMaker package in squeaksource. The platform code is the official SVN. This sound pretty straightforward, doesn’t it ?  but this is not true sometimes. There are several problems (some may probably be because of my ignorance) that I have found with this approach:

  1. The package VMMaker is not self contained, i.e, it has dependencies on other packages (some packages in the same repository and some in others). So…first problem, you need to know which other packages you need. For example, to build the VM you may need also the packages: ‘FFI-Pools’, ‘SharedPool-Speech’, ‘MemoryAccess’, ‘SlangBrowser’, ‘Balloon3D-Plugins’, ‘Plugin-XXX’, etc.
  2. Similar to the previous item, there is not only the problem of knowing which packages are needed, but instead which version. So…how do you know that for ‘VMMaker-dtl.221’ you need ‘FFI-Pools-eem.2’, ‘MemoryAccess-dtl.3’, ‘Balloon3D-Plugins-ar.6’, etc ?  Using just the last version of every package does work all the times.
  3. Sync between VMMaker and platform code. How do you know for each VMMaker version which SVN version you need of the platform code? or vice-versa how do you know which VMMaker version you need for a specific SVN version? once again, relying in the last version is not a reliable solution.
  4. Similar to 3) there is yet another problem: the platform code, as you can imagine, is split in one folder for every platform (see SVN): there is one for UNIX, one for Windows, for MacOS, and for iOS (but forget this one for the moment). Each platform has a “leader” or “maintainer”, which is the person in charge of implementing/modifying the code. The problem raises when there are changes in VMMaker for example, that require changes in all platform code, and this is not changed in all of them. So for example, in UNIX the changes are committed, but not in Mac OS. So…each platform code is not always in sync with the rest. Note that I am not complaining: this is all open-source and we all do our best. I am just telling you the problems I have seen so far.
  5. The previous problem happens not only for the commits in the repository, but also for the VM releases. Most of the times, they are not in sync. Maybe there is a particular platform that releases 5 times in a year, and maybe there is another one every 1 year and a half 😦
  6. The version of every VM are not in sync. So for Mac for example you have Squeak 4.2.5beta1U, Squeak, Squeak 5.8b4, etc. For UNIX, Squeak-, Squeak-vm-3.7-7, Squeak, etc.  In Windows, SqueakVM-Win32-4.1.1, SqueakVM-Win32-3.11.5, SqueakVM-Win32-3.10.9, etc. So as you can see, they are all completely different, and for me this is complicated since you cannot just refer to a unique VM version.
  7. The SVN repository is restricted, so you cannot commit unless you have authorized access. This could be a good and bad point at the same time.

I want to make it clear: I am not complaining against this, I am just telling the problems I have found, and how certain infrastructure that has been done in the last months helped with some of these issues.

So….you know that VMMaker is just another Monticello package, and you also know that you have to manage versions, dependencies, why not groups, etc…Does that ring a bell with anyone?  YEEES! Of course, Metacello 🙂  So, one thing we did in Pharo (although I guess it is/was also used in Squeak), is to create a Metacello Configuration for VMMaker: ConfigurationOfVMMaker. For those that doesn’t know what Metacello is, it is a Package Management System on top of Monticello, where the ConfigurationOfVMMaker is a class where you can define versions, dependencies, etc, about your project. If you are a Smalltalker and you don’t know anything about Metacello I recommend you to take a look.

Anyhow, with ConfigurationOfVMMaker we solved the first two problems. With Metacello baselines we define all the structural information of the Interpreter VM: which packages are needed (the dependencies), possible groups (not everybody wants to load the same packages), repositories, etc. And with Metacello versions, we can define a whole set of working versions. For example, for ConfigurationOfVMMaker version 1.5 we load ‘VMMaker-dtl.221’, ‘MemoryAccess-dtl.3’, ‘FFI-Pools-eem.2’, etc. This is a set of frozen versions that we known to work properly all together. Notice that creating versions for ConfigurationOfVMMaker should be done by the “VM developers”. In fact, it was done by people like Torsten,  Laurent and me. But the important thing is that the user doesn’t need to do that. The only thing the user needs to do in order to load VMMaker with all its dependencies, and all loading a working version of every package, is to load the Metacello version. Do you want to try by yourself?  Just take this Pharo image, and evaluate:

Deprecation raiseWarning: false.
 Gofer new
 ((Smalltalk at: #ConfigurationOfVMMaker) project version: '1.5') load.

Sorry for the ugly colors…wordpress.com doesn’t have Smalltalk 😦

Why I told you to download that particular Pharo image? and why I am explicitly loading the version 1.5 instead of using the last one?  Because I want that my posts are reproducible. If you evaluate this instead:

 (Smalltalk at: #ConfigurationOfVMMaker) project lastVersion load.

I cannot guarantee that everything will be working properly. The same with the Pharo image. If you took any Pharo image 1.0, or 1.1 or 1.2, or Squeak 4.2, I am not sure that VMMaker will load correctly. The same if you load another version than 1.5. So…in this case, I am sure (because I test it before posting) that with that Pharo image and that version of ConfigurationOfVMMaker, VMMaker is working properly.

Coming back….the last point 3) is not yet solved, since you cannot know that for a certain SVN version you need XXX version of ConfigurationOfVMMaker, or vice-versa. But we will come to this later on…The rest of the problems are not solved either.

Generating the VM

You need both things to compile the VM: the C platform code that is directly committed in SVN and the generated C code from the VMMaker. Do you always need to translate VMMaker to C ? Not necessary, because the generated code is also committed in the SVN, usually under the “/src” folder, for example here. It is there so that if someone wants to compile, just download both parts and with GCC it compiles the VM. No need to take a Smalltalk image, load VMMaker, and generate sources. So… when is it really needed to generate sources from VMMaker?

  1. When the /src in the SVN is outdated in relation to the platform code.
  2. When you did changes in VMMaker. You can do changes in VMMaker just for fun, for your own project, for testing, etc.
  3. For learning purpose 🙂

So…how do you compile the VM?  yes, of course, using a C compiler…but that’s not enough information! For example, usually you need to place the /src folder (where the output of the generated VMMaker sources go) in a certain place so that it is found by the makefiles. Even more, the problem is that each platform has its own instructions of how to compile. You can find the instructions for UNIX here, for Windows here, and for Mac OS (after searching this info for a long time) it seems (if it is not this please let me know) to be here.

Not only each platform has its own instructions to build the VM, but also some lack support for IDE. For example, it is not easy to b able to compile the VM out of the box with Microsoft Visual Studio or with Appel’s XCode. For example, for XCode, you need a .xcodeproj file for every project. The problem was that most of the times (at least when I tried) this file contained file locations of the commiter (which of course is different from mine). So, at the end, I usually need to do some modifications to the project in order to being able to compile and run the VM from XCode. I am telling you all this so that you can understand the progress we (the community) did in the last months…

Internal and external plugins

Before going further, let me do a little remark: did you remember I told you about the VM plugins?  Like FilePlugin, SocketPlugin, etc. Well, plugins can be compiled in two ways: internal or external. Internal plugins are linked together with the core classical VM, i.e, they are inside the binary file of the VM. External plugins are distributed as separate shared library and the cool thing is that you don’t need to do anything at all to the VM. At runtime the normal/standard VM can just load an external plugin and use it. Whether you should compile a plugin as internal or external is out of scope of this post. What is important here is that:

  • the normal guy that just wants to compiled the VM shouldn’t need to know how each plugin must be compiled.
  • there are some plugins that only work when they are compiled in one of the two ways.

Generating the VMMaker sources

Imagine that for any reason (maybe one of the above mentioned) you need/want to translate VMMaker package to C. How do you do that? The default approach with the Interpreter VM is by using a tool called VMMakerTool, which at the same time it is the name of the class 😉   So…VMMakerTool is a class which is in the VMMaker repository and it is a UI that let you generate the sources. Here you can see a screenshot:

To reproduce the screenshot, just evaluate:

VMMakerTool openInWorld

The tool is pretty cool since it lets you to do a lot of things: choose which plugins to include and choose whether you want them internal or external, you can set the source output directory, the platform code directory, the CPU architecture (32 or 64 bits), etc. This tool is awesome, but from my point of view, it is too much for a non-VM-hacker guy. Why? Because of what I have already told you: the normal user shouldn’t need to know which plugins to include nor if they should be internal or external. At the same time, following some conventions, the directory for platform code and sources could be automatically set.

Fortunately, VMMakerTool is just the UI and it relies in the “model”, which is the VMMaker class (yes, VMMaker is the name of the squeaksurce repository, the name of one of the packages and also one of the classes heheheh).With the class VMMaker we can do the same of VMMakerTool but from code. Example:

| sourcePath |

"The path where I load from SVN"
sourcePath := '/Users/mariano/Pharo/VM/svnSqueakTree/trunk'.

"Generate new sources"
VMMaker default
 platformRootDirectoryName: sourcePath, '/platforms';
 sourceDirectoryName: sourcePath, '/platforms/iOS/vm/src';
 internal: #(

 "lots of plugins more.....I let few just for the example"

 external: #();

So…suppose that someone provides you the list of plugins for every platform, knowing which of them should be internal and which external, and following some conventions everything can be automatic?  Ok….we are going there, don’t worry 😉

Cog VM

The infrastructur for the Cog  VM is a little messy for me so I would try to do my best to explain it. Cog started as a fork of the Interpreter VM. So…imagine that you want to create a fork for VMMaker (in squeaksource) and another fork in the SVN for the platform code. Monticello doesn’t provide real and easy branch support, so Cog needed to do something weird (at least for me). Suppose that a regular version of the VMMaker package is ‘VMMaker-dtl.161’. In this case ‘dtl’ is the initials of the committer, Dave Lewis. So…how does the Cog branch in VMMaker looks like???  they are just normal versions, but whose committer is ‘oscog’ (I guess this is because of Open-Source Cog). Example: ‘VMMaker-oscog.54’. That means that in order to load Cog, you need to open the VMMaker package, and search for a version that matches ‘VMMaker-oscog’. There is where Eliot commits the official Cog versions.

Exercise: Take a Monticello Browser, add the VMMaker repository and browse the version ‘VMMaker-dtl.223’. Then, browse ‘VMMaker-oscog.54’ and notice the difference between them. For example, in ‘VMMaker-oscog.54’ there are several categories that are not even present in ‘VMMaker-dtl.223’, like ‘VMMaker-JIT’, ‘VMMaker-Multithreading’, etc. Even more, the same categories contain different classes.

Now, regarding the branch in the platform code, this is much easier since it is a regular SVN branch which can be found here.

Fortunately, people have also developed a ConfiguraionOfCog which follows the same idea of ConfigurationOfVMMaker.

One difference I found with the regular VM is that Cog is supposed to be translated to C using VMMaker class directly (not VMMakerTool). You can see how to do it in this workspace.

So, in summary, they way to compile Cog VM is more or less the same as the Interpreter VM: you take a Smalltalk image, you load Cog (you can use ConfigurationOfCog), you generate sources, you checkout SVN branch, and finally compile (the instructions of how to build each VM is in the same SVN). Generating the sources may not be necessary if the /src is in sync with /platforms.

Finally, notice also that Eliot usually uploads regular VM builds (Cog VM binaries for all OS) to this url.

New infrastructure

The same way we should thanks Teleplace for Eliot Miranda’s work, we should also thanks INRIA for paying a Pharo engineer: Igor Stasenko. The good news is that since he started a couple of months ago he was not working for Pharo but instead for a new VM infrastructure . What is all this about? I’ll give you only a quick introduction because in the next post we are going to compile the VM using such infrastructure. Disclaimer: this new infrastructure is only for Cog VM and all its variants, but not Interpreter VM. I guess that’s because of the resources/time available.

So…in a nutshell, there are 3 big changes:

  1. Use GIT instead of SVN. There is a new repository for the platform code which is versioned by GIT instead of SVN. There is a new account for CogVM in gitorious. It seems that nowadays if you are not in Git you are not cool, you do not exist. Ok, we are cool now 🙂  No one needs to ask for a blessing, everybody can clone, hack and push/share changes. People can pick the changes without having to have the permissions to publish.
  2. Use CMakeVMMaker instead of VMMakerTool. CMakeVMMaker is a little tool that automates the build. It has two important things: 1) translate VMMaker to C, using the VMMaker class and 2) generate CMake files so that to ease the build. To do this  it automatically assumes (although it can be customized) which plugins are needed and how they are compiled (if internal or external), the needed compiler flags, the directories needed, etc. CMake is an excellent tool for doing cross-platform compiling and for automatic stuff….CMakeVMMaker generates all the necessary files for CMake. For those who doesn’t know what CMake is, imagine one abstract step before makefiles. CMake is a cross-platform, open-source build system where you can define all necessary stuff like directories, compiler flags, etc, in text files. Once you have that, using CMake you can generate different outputs: normal makefiles where you can just use the command “make”, Appel’s XCode project or even Microsoft Visual Studio projects 🙂
  3. Continuous Integration for VMs!!  Can you imagine that for every GIT commit, Mr. Hudson takes the latest PharoCore image, loads the Cog VM, generates sources, and compiles the VM for Windows, Linux and Mac OS ? Come on! isn’t this really cool?  Ok, you don’t believe me? Go to the Pharo CI for CogVM.

In the next post we will see how to use this new infrastructure and how is solves some of the mentioned problems along this post. I want to thanks Esteban, Igor, Dave and all who answer my questions in the mailing lists 🙂