Monthly Archives: April 2011

Smalltalk reflective model

Hi. I am sure the title of this post is horrible, but I didn’t find anything better. The idea is simple: in this part of the journey, we will talk about bytecodes, primitives, CompiledMethods, FFI, plugins, etc… But before going there, I would like to write some bits about what happens first in the image side. These may be topics everybody know, so in that case, just skip the post and wait for the next one ūüėȬ† My intention is that anyway can follow my posts.

A really quick intro to Smalltalk reflective model

The reflective model of Smalltalk is easy and elegant. As we can read in Pharo by Example, there are two important rules:  1) Everything is an object; 2) Every object is instance of a class. Since classes are objects and every object is an instance of a class, it follows that classes must also be instances of classes. A class whose instances are classes is called a metaclass. Whenever you create a class, the system automatically creates a metaclass. The metaclass defines the structure and behavior of the class that is its instance. The following picture shows a minimized reflective model of Smalltalk. Notice that for clarification purposes this diagram shows only a part of it.

A class contains a name, a format, a method dictionary, its superclass, a list of instance variables, etc. The method dictionary is a map where keys are the methods names (called selectors in Smalltalk) and the values are the compiled methods which are instances of CompiledMethod.

When an object receives a message, the Virtual Machine has to do first what it is commonly called as the Method Lookup. This consist of searching the message through the hierarchy chain of the receiver’s class. For each class in the chain, it checks whether the selector is included or not in the MethodDictionary.¬† If it is not, it continues searching forward in the chain until it finds a method or sends the #doesNotUnderstand: message in case it was not found in the whole hierarchy. When a method is found, it is directly executed.

To understand these topics, I really recommend the two wonderful chapters in Pharo By Example book: Chapter 13 “Classes and Metaclasses” and Chapter 14 “Reflection”. They are both a “must read” if you are more or less new with these topics.

In the internal representation of the Virtual Machine, objects are a chuck of memory. They have an object header which (there will be a whole post about it) can be between one and three words, and following the object header, there are slots (normally of 32 or 64 bytes) that are memory addresses which usually (we will see why I didn’t say always) represent the instance variables. The object header contains bits for the Garbage Collector usage, the hash, the format, a pointer to its class, etc.

Classes and Metaclasses

How do you create a class in Smalltalk? In other languages, you normally create a new text file that after you compile. But in Smalltalk, as we are used to, everything happens by a message send. So, to create a new class you tell to the superclass, “Can you create this subclass with this name, these instance variables and this category please?”. So, when you take a browser and you do a “Ctrl + s” of this code:

Object subclass: #MyClass
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'MyCategory'

The only thing you do, is to send the message #subclass:instanceVariableNames:classVariableNames:poolDictionaries:category: to Object. If fact, you can take that piece of code, evaluate it in a Workspace, and you will get the same results ūüôā

You can see implementors and you will logically find one in Class. Which should be the result of such message sent?  two things: a new class and a new metaclass. Do the following test:

Metaclass instanceCount ->  3710
Class allSubclasses size -> 3710

Now, create a new class, and inspect again:

Metaclass instanceCount ->  3711
Class allSubclasses size -> 3711

The problem with Metaclasses is that they are implicit, so they are very difficult to understand. Imagine that you create a class User, then its class is “User class”. The unique instance of “User class” is “User”. And at the same time, “User class” is an instance of Metaclass. So….complicated, but if you want to understand them, take a look to the chapters I told you.¬† How it is done?¬† it is not really important for the purpose of this post, but it uses the ClassBuilder and also the Compiler (check senders of #compilerClass).

Creating a method

We saw what happens when we create a class. And when you save a method from the browser? what happens ? In a nutshell what happens is that the Smalltalk Compiler does its magic, that is, it receives as an input a string that represents the source code, and as a result you get a CompiledMethod instance. A CompiledMethod contains all the instructions (bytecodes) and information (literals) that the VM needs to interpret and execute such method.

Let’s see it by ourself. Take your image, create a dumy class and then put a breakpoint at the beginning of Behavior >> #compile:classified:notifying:trailer:ifFail:. Now, type the following method and accept it:

testCompiler
Transcript show: 'all this code will be compiled'.

Once you accept such code, the debugger should appear. You can analyze the stacktrace if you want. Notice the arguments that the methods has: compile: code classified: category notifying: requestor trailer: bytes ifFail: failBlock.¬† So, I told you that the basic idea was to send a piece of code as text and get the CompiledMethod instance. The parameter “code” should be the code of the method we type, and yes, it is a ByteString. If you go step by step with the debugger, and inspect the result, that is, “CompiledMethodWithNode generateMethodFromNode: methodNode trailer: bytes.”¬† you will see it answers the CompiledMethodWithNode instance to which you can ask “method” and it is the CompiledMethod instnace.¬† Of course, that method should be the same you get after when doing “MyClass methodDict at: #testCompiler”.

The rest of the parameters are the category in which the method should be, the requestor (someone to notify about this event), the trailer bytes (we will see this later on), and a block to execute if there is an error.

CompiledMethods

In Smalltalk compiled methods are first-class objects (classes too!), in this case instances of CompiledMethod class. However, the class CompiledMethod is quite special and a little differet from the rest. But we will see this later on….What it is important for the moment, is to know that a CompiledMethod contains a list of bytecodes and a list of¬† literals. Bytecodes are instructions. A method is decomposed in a set of bytecodes, which are grouped in five categories: pushes, stores, sends, returns, and jumps. Literals are all those objects and selectors that are needed by the bytecodes but they are not instance variables of the receiver, hence they need to be stored somewhere.

For example, with our previous example of #testCompiler, we will have a bytecode (among others) for sending the message #show:¬† and we will have the ‘Transcript’ and the selector name ‘show:’ in the literals. As an exersise, inspect the CompiledMethod instance. You can just evaluate: “(MyClass >> #testCompiler) inspect”. But….”exploring” is usually better that “inspect” for compiled methods…I let you see the differences ūüôā¬† Anyway, you will see something like this:

In the next post we will play a bit deeper with CompiledMethods so don’t worry.

The Compiler

My knowledge of the Compiler is quite limited, but is is important to notice that the Compiler does much more things than the one I have said. In a compiler, there are usually several steps like parse the code, validate it, get an intermediate representation, and finally create the CompiledMethod instance. The compiler needs to know how to translate our Smalltalk code to bytecodes understood by the VM.

In Squeak and Pharo, the compiler is mostly implemented in the class Compiler. It seems it is quite difficult to understand and it has some limitations and difficulties to get intermediate representation of the code. Because of that and probably much other reasons, the community started to work in a new compiler called Opal (which at the beginning was called NewCompiler).

Links


The first part of the journey is over

Hi. This is a short post to tell you that we have finished the first part of the “Journey through the Virtual Machine“. I would like to do a little summary of what we have alreasy seen and tell you about the next part.

Summary of the first part of the trip

I’ve started the Journey with an introduction to the Pharo/Squeak Virtual Machine. I told you the VM was split in two parts, the platform code (written in C) and the VMMaker which was the “Core” of the VM as was written in SLANG, a subset of Smalltalk. I told you the minimal steps needed to build a VM and I also gave you a quick introduction to CogVM.

After, we continue to understand all the SCM and related topics about the VM, i.e., where it is committed each part of the VM. Interpreter VM is committed in the trunk of VMMaker (in squeaksource) and its platform code in the official SVN repository. CogVM has its VMMaker part as a fork in squeaksource, and its platform part is in a branch of the SVN but also in Gitorious repository. I mentioned all the problems I have seen so far while trying to build VMs and I also commented a new way of building them which is using Metacello configurations for Cog, the Git repository and CMakeVMMaker package which can automatically generate CMake files. In addition, I talked about the Pharo Hudson server which automatically builds the CogVMs.

Once that you knew which repositories to use and where everything was version, I explained you step by step how to build a CogVM using Metacello configurations, the Git repository and CMakeVMMaker. The post covered how to build the VM in each OS: Mac OS, Linux and Windows. There was then a second part, more advanced, where I showed the different CMakeVMMaker configurations and the ability to use CMake generators to create makefiles for different IDEs. In the same post, I explained a little how the code was synchronized between the platform code and VMMaker package, how you could customize the CMakeVMMaker configurations, and how Hudson builds the VMs for us.

In the last post, I explain you how to build a VM for debug, how to debug it using gdb from command line and finally how do generate makefiles for XCode and debug it from there. In addition, I did a screencast as part of the ‚ÄĚ PharoCasts with Experts‚Äú. It was a 1:30 screencast about compiling and debugging the VM :)

What I hope you have learnt

One of my biggest doubts when I started this blog is whether to start by the VM internals or by the build process. And I decided the last one. Why?¬† Because I think that if you really want to learn about the VM, you need to follow this cycle: read -> understand -> change -> build -> run -> crash -> debug.¬† And that’s how you learn. I am sure that my posts are not 100% reproducible. And I am sure that if you try it you may have several problems, but I hope you got a general idea of what is needed to build and debug a Squeak/Pharo VM. I hope that you have included such tools in your toolbox.

Now in this “Journey” we will start to see several things about the internals of the VM, say method lookup, GC, primitives, bytecodes, etc. And what is cool is that you have learned all the needed tools to continue learning about the VM. You can modify and play with the VM since you can then build it and debug it.

Second part of the trip

It is not completely decided what I will talk about nor its order, but the following is a possible list:

  • Analyze and change something in the method lookup / normal send.
  • Introduction to CompiledMethods, literals and bytecodes.
  • What are bytecodes and primitives and how they are map between image and VM.
  • Create our own primitive.
  • Objects as methods.
  • Object header and the object format.
  • Use FFI.
  • Explain what is, modify and extend the specialObjectArray.
  • What are compact classes.
  • Method that are not executed.
  • SLANG to C translation.
  • etc..


Feedback

As I already told you several times, I really welcome any kind of feedback. This is my first experience with a blog so I hope you have enjoyed so far. If there is something I can improve, please let me know. You can send me a private mail, a post comment, an email in the mailing list, whatever…

Best regards,

Mariano


How to debug the VM?

Hi. Whether you are experimenting and hacking in the VM (where it is likely that some things will go wrong) or you are running an application in production, it is always useful to know how to debug the VM.¬† In this post, we will see how to compile the VM with all the debug information, how to run the VM from GDB (the GNU Project Debugger) and how to debug it using an IDE’s debugger.

Before going further, I would like to tell you something I did last weekend (apart from sleeping quite a lot).¬† Laurent Lauffont, creator of the PharoCasts, started a new sequence of screencasts called ” PharoCasts with Experts“. The idea is to do a more or less 1 hr “interview” about certain topic. Laurent calls by Skype to the “expert” and they can talk, ask questions, etc. Using TeamViewer, the “expert” shares his desktop to Laurent, who is in addition recording the screencast. The screencast is finally edited and uploaded to http://www.pharocasts.com . Ok, I am not a VM expert at all, but last week we did a 1:30 screencast about compiling and debugging the VM ūüôā¬†¬† This screencast contemplates all what we have learned in the previous posts and what you will learn today.

So…let’s start the show.

Prepared image for you

In previous posts, I told you that Pharo guys recommend us to use the latest PharoCore for building the CogVMs. There are several reasons for this but the basic idea is that they want the VM can be build in latest versions, and if cannot, then fix it as soon as possible. But we have a much better testers than you, and it is called Hudson. So…I don’t want to force you to use the latest unstable PharoCore (which in fact, it is unstable sometimes). Even more, if you are starting in the VM world, I don’t want to add even more problems. So…from now onward, I have prepared and image so that you can use. At the time of this writing, there is no PharoDev 1.3 yet, but all the VM tools doesn’t load directly in a Pharo 1.2. There are 3 little changes that are needed. So…you can grab a PharoDev 1.2.1 and file in these changes, or directly load this image that I have prepared for you. This image has only those 3 changes. It does not contain any Cog or CMakeVMMaker package, that’s your homework ūüėȬ†¬† The good news is that since we use a Dev image now we have all the development tools like refactorings, code completion, code highlighting, etc.

When do we need to debug the VM?

I think it is a good idea to start thinking when we need to debug the VM. In my little experience, I have found the following reasons:

  1. When there is a crash and you cannot find why it was.
  2. When you are developing something on the VM.
  3. For learning purposes.
  4. When you are just a hacky geek ūüôā

I think most people will need the first one.

Debugging and optimizing a C program

If you are reading this post, you are probably a Smalltalker. And if you are a Smalltalker, I know how much you love the debugger. We all do. But there are guys that are not as lucky as we are ūüėČ Now, being serious, how do we debug a program written in C ? I mean, the VM, at the end, is a program compiled in C and executed like any other program. I am not a C expert so I will do a quick introduction.

When you normally compile a C program the binary you get does not contain the C source code used to generate such binaries. In addition, the names of the variables or the functions may be lost. Hence, it is really hard to be able to debug a program. This is why C compilers (like GCC) usually provide a way to compile a C program with “Debugging Symbols“, which add the necessary information for debugging to the binaries. Normally, as we will see today, these symbolic symbols are included directly in the binary. However, they can be sometimes separated in a different file. The GCC flag for the debugging symbols is -g¬† where -g3 is the maximum and -g0 the minimal (none). There are much more flags related to debugging but that is not the main topic of this post.

In addition to the debugging, a C compiler usually provides what they call “Compiler Optimizations”. This means that the compiler try to optimize the resulted binary in different ways: speed, memory, size, etc. When compiling for debugging it is common to disable optimizations, why? because, as its documentation says some variables you declared may not exist at all; flow of control may briefly move where you did not expect it; some statements may not be executed because they compute constant results or their values were already at hand; some statements may execute in different places because they were moved out of loops. The GCC flag for the optimization is -O¬† where -O3 is the maximum and -O0 no optimization (default).

But we are Smalltalkers, so much better if someone can take care about all these low-level details. In this case, CMakeVMMaker is that guy.

Building a Debug VM

As I told you in the previous post, there are several CMakeVMMAker configurations classes, and some of them are for compiling a debug VM. Examples: MTCocoaIOSCogJitDebugConfig, StackInterpreterDebugUnixConfig, CogDebugUnixConfig, etc. What is the difference between those configurations and the ones we have used so far ? The only difference is the compiler flags. These “Debug” VMs use special flags for debugging. Normally, these flags include things like -g3 (maximum debugging information), -O0 (no optimization), etc. For more details, check implementors of #compilerFlagsDebug. But…since there were a couple of fixes in the latest commit, I recommend you to load the version “CMakeVMMaker-MarianoMartinezPeck.94” of the CMakeVMMaker package.

So…how do we build the debug VM?¬† Exactly the same way as the regular (“deploy”)¬† VMs (I have explained that here and here), with only one difference. Guess which one? Yes, of course, use the debug configuration instead of the normal one. That means that in Mac for example, instead of doing “MTCocoaIOSCogJitConfig¬† generateWithSources”¬† we do “MTCocoaIOSCogJitDebugConfig¬† generateWithSources”. And that’s all. Once you finish all the build steps, you have a debug VM.

What are you waiting? Fire up your image and create a debug VM ūüėȬ† Now…how do you know if you really succeeded or not to create a debug VM?¬† Easy: it will be much slower and the VM binary will be bigger (because of the symbolic symbols addition). In a regular CogMTVM, if you evaluate “1 tinyBenchmarks” you may get something arround ¬†‘713091922 bytecodes/sec; 103597027 sends/sec’ whereas with a debug VM something arround ‘223580786 bytecodes/sec; 104885548 sends/sec’. So..as you can notice, a debug VM is at least 3 times slower.

Debugging the VM with GDB

Disclaimer: I am not a GDB expert, since most of the times I debug from the IDE (XCode). I will give you a quick introduction. So…we have compiled our with all the debug flags which means that the VM binaries now have extra information so that we can debug. GDB is a command line debugger from GNU Project and it is the “standard” when debugging C programs. In addition, it works in most OS. Notice that for Windows there are not VMMakeVMMaker debug configurations. I think this is just because nobody tried it. As far as I know gdb works with MSYS, so it may be a matter of just implementing #compilerFlagsDebug for Windows confs and try to make it work. If you give it a try and it works, please let me know! So far, I did a little test with some compiler flags like -g3 and -O0 and it compiles and runs. But (as we will see later) when I do CTRL+C instead of get the gdb prompt, I kill GDB heheheh. Googling, seems to be a known problem.

So….open a console/terminal. The following are the minimal steps to run the VM under GDB in Mac OS:

cd blessed/results/CogMTVM.app/Contents/MacOS/
#If you compiled a StackVM or CogVM instead of a CogMTVM, then the .app and the executable will have another name
gdb CogMTVM

In Linux/Windows since you don’t have the .app directory of the Mac OS, it should be something like this;

cd blessed/results
#If you compiled a StackVM or CogVM instead of a CogMTVM, then the executable will those names instead
gdb CogMTVM

With the previous step, you should have arrived to a gdb console that looks like this:

(gdb)

So now we need to start the VM. In Mac, the VM automatically raises the File Selection popup that lets you choose which image to run. So the following line is enough:

(gdb) run

Choose the image and continue. In the Linux  you have to specify to the VM the .image to run by parameter. So, you have to do something like this:

(gdb) run  /home/mariano/Pharo/images/Pharo1.3.image

The idea is that you send by parameter the .image you want to run. So…at this point you have already launched your image, its time to play a bit. Open a Transcript. Open a workspace and evaluate:

9999 timesRepeat: [Transcript show: 'The universal answer is 42.']

Now…come back to the console, and do a CTRL+C. This will get a SIGINT Interruption and in the GDB console you should get something like:

Program received signal SIGINT, Interrupt.
0x9964a046 in __semwait_signal ()
(gdb)

What happened?¬† I am not an expert, but with such interruption we can “pause” the VM execution and the gdb takes the control. If you see your image at this moment, you will se that you cannot do anything in it. It is like “frozen”. If you now do:

(gdb) continue

You can also type “c” instead of “continue”. The image should continue running, and you Transcript should continue printing the universal answer ūüėȬ†¬† So…what is cool about being able to interrupt the VM?¬† We can, for example:

  • Inspect the value of some variables and stack. Type “help data” and “help stack” in the gdb prompt.
  • Put breakpoints in the code and then after go step by step.
  • Invoke functions from our VM (this is really cool!!)

Let’s see a couple of examples….hit CTRL+C again to get the control of the gdb prompt and do:

(gdb) bt

Which should look similar this (why similar? because it depends where you stop it):

(gdb) bt
#0  0x0002dbd0 in longAtPointerput (ptr=0x22efa6e4 "\017??\024?\001", val=349999375) at sqMemoryAccess.h:82
#1  0x0002daf2 in longAtput (oop=586131172, val=349999375) at sqMemoryAccess.h:99
#2  0x0006a2ae in sweepPhase () at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:44356
#3  0x0003d8b4 in incrementalGC () at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:17846
#4  0x00069f61 in sufficientSpaceAfterGC (minFree=0) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:44214
#5  0x00032e56 in checkForEventsMayContextSwitch (mayContextSwitch=1) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:11398
#6  0x0003ccad in handleStackOverflowOrEventAllowContextSwitch (mayContextSwitch=1) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:17346
#7  0x000326c4 in ceStackOverflow (contextSwitchIfNotNil=525336580) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:11136
#8  0x1f40032e in ?? ()
#9  0x0006ab4c in threadSchedulingLoop (vmThread=0x828a00) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:44714
#10 0x0003dd96 in initialEnterSmalltalkExecutive () at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:17970
#11 0x0003ebdc in initStackPagesAndInterpret () at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:18435
#12 0x00022211 in interpret () at /Users/mariano/Pharo/vm/git/cogVM2/blessed/src/vm/gcc3x-cointerpmt.c:2037
#13 0x00072142 in -[sqSqueakMainApplication runSqueak] (self=0x211090, _cmd=0x121c80) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/platforms/iOS/vm/Common/Classes/sqSqueakMainApplication.m:172
#14 0x96339cbc in __NSFirePerformWithOrder ()
#15 0x91b9ee02 in __CFRunLoopDoObservers ()
#16 0x91b5ad8d in __CFRunLoopRun ()
#17 0x91b5a464 in CFRunLoopRunSpecific ()
#18 0x91b5a291 in CFRunLoopRunInMode ()
#19 0x920bde04 in RunCurrentEventLoopInMode ()
#20 0x920bdaf5 in ReceiveNextEventCommon ()
#21 0x920bda3e in BlockUntilNextEventMatchingListInMode ()
#22 0x9307378d in _DPSNextEvent ()
#23 0x93072fce in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#24 0x93035247 in -[NSApplication run] ()
#25 0x9302d2d9 in NSApplicationMain ()
#26 0x0006fca5 in main (argc=2, argv=0xbffff864, envp=0xbffff870) at /Users/mariano/Pharo/vm/git/cogVM2/blessed/platforms/iOS/vm/Common/main.m:52
(gdb)

Guess what it was doing ? Sure…an incremental GC ūüėČ

(gdb) call printAllStacks()

Which should look similar to the following (notice that it is not the same place as the previous example and this is because the stacktrace in the middle of the incremental Garbage Collector was too long to put in the post, so I did a “bt” in another moment):

(gdb) call printAllStacks()
Process 0x2197b5f8 priority 10
0xbff650b0 I ProcessorScheduler class>idleProcess 526237088: a(n) ProcessorScheduler class
0xbff650d0 I [] in ProcessorScheduler class>startUp 526237088: a(n) ProcessorScheduler class
0xbff650f0 I [] in BlockClosure>newProcess 563590428: a(n) BlockClosure

Process 0x219edf1c priority 50
0xbff5f0b0 I WeakArray class>finalizationProcess 526237628: a(n) WeakArray class
0xbff5f0d0 I [] in WeakArray class>restartFinalizationProcess 526237628: a(n) WeakArray class
0xbff5f0f0 I [] in BlockClosure>newProcess 564059712: a(n) BlockClosure

Process 0x2067e180 priority 80
0xbff600d0 M Delay class>handleTimerEvent 526243268: a(n) Delay class
0xbff600f0 I Delay class>runTimerEventLoop 526243268: a(n) Delay class
543678476 s [] in Delay class>startTimerEventLoop
543678752 s [] in BlockClosure>newProcess

Process 0x2197b430 priority 60
0xbff610b0 I SmalltalkImage>lowSpaceWatcher 527603592: a(n) SmalltalkImage
0xbff610d0 I [] in SmalltalkImage>installLowSpaceWatcher 527603592: a(n) SmalltalkImage
0xbff610f0 I [] in BlockClosure>newProcess 563589972: a(n) BlockClosure

Process 0x219eba78 priority 60
0xbff62030 M [] in Delay>wait 564050624: a(n) Delay
0xbff62050 M BlockClosure>ifCurtailed: 565737144: a(n) BlockClosure
0xbff6206c M Delay>wait 564050624: a(n) Delay
0xbff62084 M InputEventPollingFetcher>waitForInput 526769836: a(n) InputEventPollingFetcher
0xbff620b0 I InputEventPollingFetcher(InputEventFetcher)>eventLoop 526769836: a(n) InputEventPollingFetcher
0xbff620d0 I [] in InputEventPollingFetcher(InputEventFetcher)>installEventLoop 526769836: a(n) InputEventPollingFetcher
0xbff620f0 I [] in BlockClosure>newProcess 564050332: a(n) BlockClosure

Process 0x212b6170 priority 40
0xbff5e03c M [] in Delay>wait 565740688: a(n) Delay
0xbff5e05c M BlockClosure>ifCurtailed: 565741260: a(n) BlockClosure
0xbff5e078 M Delay>wait 565740688: a(n) Delay
0xbff5e098 M WorldState>interCyclePause: 529148776: a(n) WorldState
0xbff5e0b4 M WorldState>doOneCycleFor: 529148776: a(n) WorldState
0xbff5e0d0 M PasteUpMorph>doOneCycle 527791904: a(n) PasteUpMorph
0xbff5e0f0 I [] in Project class>spawnNewProcess 528638248: a(n) Project class
556491024 s [] in BlockClosure>newProcess
(gdb)

Notice that in this case we are calling from GDB a exported function of the VM. Where does it come from?  For the moment, take a look to #printAllStacks. Now, it is time to kill the executable and finally to quit gdb:

(gdb) kill
(gdb) Quit

Notice that after doing the “kill” you can call “run” again an start everything again.¬† Hopefully, you got an idea of how to debug the VM. However, using gdb from command line is not the only option.

Debugging the VM with XCode

In the previous post we saw that CMake provides “generators”, i.e, a way to generate specific makefiles for different IDEs. Hence, we can generate makefiles for our IDE and debug the VM from there. Remember that executing “cmake help” in a console shows the help and at the end, there is a list of the available generators. In this particular example, I will use XCode in Mac OS. To do so, we need to follow the same steps than compiling a debug VM (that is, the same as building a release VM but using a debug CMakeVMMaker conf) but using the XCode generator in particular. That means that instead of doing “cmake .” we do “cmake -GXcode”. Notice that if we previously build another VM you will need to remove the CMake cache: /blessed/build/CMakeCache.txt. So..

cd blessed/build
cmake -G Xcode

Notice that it is not needed to do a “make” since we will do that from inside XCode. “cmake -G Xcode” generates a /blessed/build/CogMTVM.xcodeproj (the name will change if you generate a CogVM or a StackVM) that we directly open with XCode. Now…normally we should be able to select the target Debug (on the top left) and then from the menu “Build -> Build and Debug – Breakpoints On”. That should compile the VM and automatically run it with GDB. However, I found two problems:

  • External plugins are not correctly placed. I tried to fix it but I couldn’t. They are placed in blessed/results/CogMTVM.app/Contents/Resources/Debug¬† but they should be together in blessed/results/Debug/CogMTVM.app/Contents/Resources. The problem is that when XCode compiles, it adds a “Debug” in the directory when compiling with the “debug target” and the same for “release”. I have spent two days completely trying to solve it from the CMakeVMMaker and I couldn’t. If you know how to do it, please let me know. Anyway, the workaround is to copy all those .dylib from the first directory to the second one.
  • No matter that you change the settings in the project (like compiler flags, gcc version, etc) it will always the values from the CMake generated makefiles. This is a pity because I would like to change settings from XCode…

Now you are able to “Build -> Build and Debug – Breakpoints On”. If everything goes well, you should be able to open an image and the VM will be running with gdb. You will notice there is a little gdb button that opens a gdb console where you can do exactly the same as¬† if it were by command line. In addition, there is another button for the debugger. I attach a screenshot.

So…there you are, you are running the VM in debug mode from gdb and inside XCode. Interesting things from the gdb console is that you have buttons for “Pause” (what we did with gdb but from command line with the CTRL+C) and for “Continue” (what we type from gdb command line). And you have even a “Restart”.

Now….the nice thing about being able to debug the VM with XCode is the ability to easily put breakpoints in the code. What where? in which file? Without giving much details (because that’s the topic of another post), we will say that the “VM Core” (basically Interpreter classes + ObjectMemory classes) is translated into a big .c file called gcc3x-cointerpmt.c (in case of cogMT). A regular CogVM will be called “gcc3x-cointerp.c” the same as for StackVM. You can check this files by yourself. You should know where they are. Remember?¬† yes, they are (if you did the default configuration) in /blessed/src/vm¬† (for Cocoa configurations it may be in /blessed/stacksrc/vm). In such folder you will see that there is another file cointerpmt.c (or cointerp.c). Which is the difference between the one that has the “gcc3x” and the one that has not? for me moment let’s just say the one with the “gcc3x” has some optimizations (Gnuifier) that were automatically done in the C code and with the intention of compiling such sources with a gcc3x compiler.

We already know which file to debug “most of the times”. Maybe you need to debug another C file like plugins or the machine code generator, but most of the times, you will debug the “VM core” which is the file gcc3x-cointerpmt.c or its variant. Now what you have to do is to open it and put a breakppoint. On the top part of XCode you have a list of the files included in the project. Search for that one and select it. Be careful that XCode is really slow with this big file. Let’s put a breakpoint in the method lookup when the #doesNotUnderstand is thrown. This is done in the function “static sqInt lookupMethodInClass(sqInt class)” and here is a screenshot:

Once you set breakpoints you can “Build and Debug”. Now, instead of testing by executing something in a workspace, we will create a simple method anywhere (why not to use the workspace is not at my knowledge right now, sorry). So…I create the method foo in TestCase doing like this:

TestCase >> foo
self methodNonExistent.

And then I executed (this can be done in a workspace):

TestCase new foo

Once this is done, the VM should have been paused and you should have available the gdb prompt. You can now open the debugger or GDB console and do whatever you want. With the Debugger you can do “Step Over”, “Step Into”, “Step Out”, etc‚ĶAnd with the GDB console you can do exactly the same like if you were executing gdb from a terminal. Here is the screenshot:

There are much things I would like to talk about regarding debugging a VM, but the post is already too long and I think this is enough to start. I recommend you to watch the screencast.  Further in this blog sequence, we will do a second part. Now, it is time to understand some internal parts of the Squeak/Pharo VM.


Building the VM – Second Part

Hi folks. I guess that some readers do not like all the building part and want to go directly to see the VM internals. But it is really important that you understand how to change the VM, compile it or even debug it. Otherwise, you’ll be very limited.

This post is mostly about a couple of things that I wanted to mention in the previous post, but I couldn’t because it was already too long. If you read such post, you may think that compiling the VM from scratch is a lot of work and steps. But the post was long because of my explanations and because of my efforts in making it reproducible. This is why I would like to do a summary of how to compile the VM.

Summary of VM build

Assuming that you have already installed Git + CMake + GCC, then the following are the needed steps to compile the Cog VM:

mkdir newCog
cd newCog
git clone --depth 1 git://gitorious.org/cogvm/blessed.git
cd blessed/image
wget --no-check-certificate http://www.pharo-project.org/pharo-download/unstable-core
"Or we manually download with an Internet Browser the latest PharoCore
image from that URL and we put it in blessed/image

Then we open the image with a Cog VM (which we can get from here or here) and we evaluate:

Deprecation raiseWarning: false.
 Gofer new
 squeaksource: 'MetacelloRepository';
 package: 'ConfigurationOfCog';
 load.
(Smalltalk at: #ConfigurationOfCog) project latestVersion load.
"Notice that even loading CMakeVMaker is not necessary anymore
since it is included just as another dependency in ConfigurationOfCog"
MTCocoaIOSCogJitConfig generateWithSources.
"Replace this CMMakeVMMaker configuration class for the one that suites your OS
like CogUnixConfig and CogMsWindowsConfig"

Now, come back to the terminal and do:

cd newCog/blessed/build
cmake .
# Or  cmake . -G"MSYS Makefiles"  if you are in Windows
make

And that’s all, in “blessed/results” (in Windows it should be under “blessed/build/results”) you should have the CogVM binary. I know that you probably are a lazy guy, but if you really want to take advantage and learn in my posts, I strongly recommend you to follow those steps. All along this sequence of posts, we will debug and modify the VM (change GC, method lookup, create our own primitives and plugins, etc). Once you have Git and CMake, I promise the process takes less than 5 minutes.

Available CogVMs

Remember that all these posts is what I called “Journey through the VM”, so we will probably go and come back between different posts ūüôā¬† In the first post,under the title “CogVM and current status” I explained the different flavors of CogVMs and the main features of them:

  1. Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
  2. Context-to-stack mapping.
  3. JIT (just in time compiler) that translates Smalltalk compiled methods to machine code.
  4. PIC (polymorphic inline caching).
  5. Multi-threading.

What is the big difference between StackVM and CogVM? Well, Stack VM implements 1) and 2). And Cog VM is on top of the Stack VM and adds 3) and 4). Finally, there is CogMTVM which is on top of Cog VM and adds multi-threading support for external calls (like FFI for example).

In addition, Cog brings also some refactors. For example, in Interpreter VM, the Interpreter was a subclass of ObjectMemory. That was necessary in order to easily translate to C. In Cog, there are new classes like CoInterpreter and NewObjectMemory. But the good news is that we can have composition!! The CoInterpreter (which is a new class from Cog) has an instance variable that is the object memory (in this case an instance of NewObjectMemory). This was awesome and required changes in the SLANG to C translator.

As said, in the VMMaker part of the VM, what we called the “core”, there are mainly two important classes: Interpreter and ObjectMemory. Read the first post for details of their responsibilities. In Cog, there are a couple of differences:

  1. As said, the Cog Interpreter class do not subclass from ObjectMemory, but instead it is an instance variable.
  2. In Cog there isn’t only one Interpreter class like in the old VM. In fact each Cog VMs I told you (StackVM, CogVM, CogVMMT) has its own Interpreter class (StackInterpreter, CoInterpreter and CoInterpreterMT). Come on!! Don’t be lazy, take you image and browse them ūüôā
  3. In Cog, there are not only those Interpreter classes that I have already told you, but also several more that are just for a design point of view, i.e, they are not Interpreter classes that should be used for compiling the VM. They are for example, to reuse code or to better simulate them. Examples, CoInterpreterPrimitives, StackInterpreterPrimitives, InterpreterPrimitives, etc. And then, of course, we have the Interpreter simulators, but that’s another story for another post.

So…if you are paying attention to this blog you may be asking yourself which Interpreter class you should use? My advice, and this is only my advice, is that you should normally use the “last one”. In this case, the CogVMMT. The few reasons I find not to use the last one are:

  1. If you are running on a hardware where Cog JIT is not supported. For example, for the iPhone the StackVM is usually used.
  2. When you are doing hacky things with the VM and you want to be sure there is no problems with JIT, PIC, etc. This is my case…
  3. Maybe for learning purposes the CogVM or CogVMMT is far much complicated than the StackVM or InterprertVM.
  4. The “last one” may not be the most stable one. So if you are in a production application you may want to deploy with a CogVM rather than a CogVM that has been released just now.

But apart from that, you will probably use the “last one” available. Just to finish with this little section, I let you a screenshot of a part of the Cog VMs hierarchy.

CMakeVMaker available configurations

In the previous post we saw what CMMakeVMMaker configurations do: 1) generate VM sources from VMMaker and 2) generate CMake files. 1) depends on which Cog (StackVM, CogVM and CogVM) we want to build, which plugins, etc. And 2) depends not only in which CogVM but also in the OS (the CMake files are not the same for each Operating System) and other things, like whether we are compiling for debug mode or not, whether we are using Carbon on Cococa library in Mac, etc. So…imagine the combination of: which CogVM, which OS, and whether debug mode or not. It gives us a lot of possibilities ūüôā

The design decision to solve this in the CMakeVMake project was to create specific “configuration” classes. To summarize, there are at least one class for VM/OS. So you have, for example, CogUnixConfig (which is a CogVM, for Unix and “release”), CogDebugUnixConfig, MTCogUnixConfig, StackInterpreterUnixConfig, StackInterpreterDebugUnixConfig. And then for the rest of the OS is the same: CogMsWindowsConfig, StackInterpreterMsWindowsConfig, MTCogMsWindowsConfig, etc….So, your homework: browse the categories ‘CMakeVMMaker-Windows’, ‘CMakeVMMaker-Unix’ and ¬†‘CMakeVMMaker-IOS’. Look at the available classes. To learn, check implementors of #compilerFllags, #defaultInternalPlugins, #interpreterClass, etc…To test, take the debug variant, follow the same procedure as always, and you compile a debug VM with all the debugging symbols and no optimization ūüôā

Which one you should use? I have already answered, but imagine you want the “last one”, then they are MTCocoaIOSCogJitConfig, MTCogUnixConfig and MTCogMsWindowsConfig.It doesn’t matter which configuration you choose, all you need to normally do is send the #generateWithSoources.

This design decision has a couple of advantages from my point of view:

  1. It is extremelly easy to customize. And in fact, there are already examples: CogUnixNoGLConfig (which doesn’t links against OpenGL so it works perfect unless you use Balloon3D or Croquet plugins), CogFreeBSDConfig (specially for BSD since it has a couple of differences in the compiler flags), etc.
  2. YOU can subclass and change what you want: default internal or external plugins, compiler flags, etc.
  3. It is easy for a continuous integration server like Hudson to build different targets.

Customizing CMakeVMMaker configurations

I told you that you can subclass from a specific class and overwrite the compiler flags, the default plugins and if they should be internal or external, etc. However, CMMakeVMaker can be parametrized in several ways while using them. In the building instructions at the beginning of this blog, I told you to move your Pharo image to blessed/image. And as I explained in the previous post that was in order to let CMakeVMaker take the defaults directories and make it work out of the box. But in fact, it is not necessary at all to move your image. You can download the “platforms code” in some place and the image elsewhere. Notice that these changes (the ability to customize each direcotry) has been commited in new versions of the CMakeVMMaker package. So, if you want to really try the followin code, make sure to have CMakeVMMaker-MarianoMartinezPeck.94. You can get it using Monticello Browser or Gofer.

So, you can do something like this:

"The image where this code is being run can be in ANY place"
MTCocoaIOSCogJitConfig new
srcDir: '/Users/mariano/Pharo/generateCode/src';
platformsDir: '/Users/mariano/Pharo/vm/git/cogVM2/blessed/platforms';
buildDir: '/Users/mariano/Pharo/vms/build';
"The resources directory is only needed for Mac"
resourcesDir: '/Users/mariano/Pharo/vm/git/cogVM2/blessed/macbuild/resources';
outputDir: '/Users/mariano/binaries/results';
generateSources;
generate.

The “platformsDir” must¬† map with “platforms” directory that we downloaded with Git, it cannot be choosed randomly. The same with the “resourcesDir” (which in fact is only for Mac). The rest of the directories (src, build and output) are not created by VMMaker nor Git. They are just directories that I have created by my own and I want to use them instead of the default.

And I’ve created this shortcut also:

"The image where this code is being run can be in ANY place"
MTCocoaIOSCogJitDebugConfig new
defaultDirectoriesFromGitDir: '/Users/mariano/Pharo/vm/git/cogVM1/blessed';
generateSources;
generate.

That way, I don’t need to move my image to blessed/image. BTW, don’t try this with Windows confs because there still a problem. Anyway, despite from that we can also customize things using #internalPlugins:,¬† #externalPlugins, etc.

Synchronization between platform code (Git) and VMMaker

In this post, I told you the problems I have seen so far with “process” of the Interpreter VM + SVN for platform code. And I also told you how this new process (CMake + Git ) helps a bit in some of those problems. From my point of view there are a couple of things that have improved the process:

  1. Platform code and VMMaker are be in sync: when people (Igor, Esteban, etc) commit a new version to Git, they make sure that the VMMaker part is working.
  2. Documentation of that synchronization: in the previous post, I told you to load version ‘1.5’ of ConfigurationOfCog. Suppose I didn’t tell you that, how do you know for a certain Git version, which version of ConfigurationOfCog you should use?¬† Check in blessed/codegen-scripts/LoadVMMaker.st¬† and you have exactly the piece of code you should execute to get the working VMMaker with that specific version of GIT. So…this means that when someone commits the Git repository and such changes require a new VMMaker version, then such developer needs to create a new version of ConfoigurationOfCog, and modify LoadVMMaker.st.¬† Now that you know this, the steps I told you at the beginning of this posts can be automatic, can’t they?¬† someone say uncle Hudson?¬† Yes, of course!!
  3. Git is easier in the fact that people can fork, hack, experiment, tests, and then push changes into the blessed.

Hudson for building VMs

Pharo has a continuous integration server with Hudson: http://ci.pharo-project.org/. And as you can see here, there are a lot of targets for CogVMs. Basically, for every single commit in Git, Hudson builds all those images. How? Following nearly the same steps I told you at the beginning of this post. It creates StackVMs, CogVMs and CogVMs for every OS. In fact, there are no Windows builds yet because this week they are getting the Windows slave. But the confs and the procure is working…So it is just a matter of getting the Windows box.

Conclusion: you don’t need to wait one year an a half to get a VM with a bug fix, nor you don’t need to compile it by yourself. With Hudson, they are built for every commit.

Hudson traceability

We saw how we can trace from platform code to VMMaker. Now, how to know how was every Hudson VM build ? Easy:

  1. Go to http://ci.pharo-project.org
  2. Choose a target in the “Cog” tab. For example, I choose “Mac Cog Cocoa”
  3. Follow the link, for example Cog Unix,  and there you can see two artifacts:
  • a built VM
  • a source code tarball, which is used to build that VM (in this example, CocoaIOSCogJitConfig-sources.tar.gz)

If you Download the source code archive and unpack it into your local directory what would you expect?? Of course, a copy of the git directory plus the Pharo image generated to build such VM. Such image is in build/ subdirectory and it is called generator.image and was the used to generate source code (located in src/ subdirectory) and CMake configuration files (located in build/ subdirectory). Isn’t this cool ?

CMake generators

Did I already tell you that I am also a CMake newbie? Ok‚Ķjust in case ūüėȬ† Anyway, imagine CMake like a tool where we can set things, parameters, variables, directories, etc, in some files (which in our case they are auto-generated by CMakeVMMaker) and then from those “general” files we can generate specific and different makefiles. So, from the same CMake files we can generate different kind of makefiles,¬† i.e, we can generate makefiles the way some IDE except them to be. CMake call this ability “generators”. And the way to create makefiles with a specific generator is like this:

cmake -G "Generator Name"

Does that sound familiar?? Of course! We have already used them for MSYS in Windows. The cool thing is that there are generators for several IDEs. And this is just GREAT. For example, I can create makefiles and a project for XCode (the C IDE for MacOS). Just doing:


cmake -G Xcode

creates a XCode project for CogVM which is¬† in /blessed/build/CogMTVM.xcodeproj. You don’t have an idea how cool is this. This mean you can open XCode and everything is set and working out of the box for the CogVM. You can put breakpoints, inspect C code, compile, debug, everything‚Ķ.Before, this was much more complicated because the .xcodeproj file was versioned in the SVN and this file usually keeps some file locations or things like that and in my experience, it was always a pain to make it work.

When you use a particular generator for an IDE (like Xcode, Eclipse, KDevelop, Vsual Studio, etc, you usually don’t do the “make” by hand. So, after invoking cmake, you won’t need to do a make. Instead, you have to compile from the IDE itself (which should have the correct makefiles).

How do you know which are the available generators?  just type:

cmake --help

and at the end you’ll find a section that says “The following generators are available on this platform:”¬† and each of them has a name and a description. What you need to pass to the -G parameter is the name. Notice that as the help says, it automatically shows the generators available in YOUR platform (OS).¬† Some examples:

cmake -G KDevelop3
cmake -G "Eclipse CDT4 - Unix Makefiles"
cmake -G "Visual Studio 10"
cmake -G "Borland Makefiles"

When the name includes more than one word you must use the double quotes.

So‚Ķthe 2 main advantages I see from CMake to our process is: cross compiling, and be able to automatically create makefiles for IDEs. Sorry I couldn’t try with other IDE than Xcode. If you try it and it works, let me know ūüôā

In the next post we will so how to debug the VM and some related tricks. After that post, we will probably start to see the VM internals since you will have already all the needed tools.


Building the VM from scratch using Git and CMakeVMMaker

So…this is the post all hackers were waiting for! Take a beer, open your Pharo image and prepare your terminal ūüôā¬†¬† In this post, we will see how to build a VM from scratch. I will use Mac OS along this post, but I will also explain how to build in Linux and Windows (yes, I had to do that for this post!). I will follow the instructions together with you so that they are correct and working heehhehe. In addition, I am new with both Git and CMake…so if I say something wrong, please correct me.

The VM is a huge beast and no matter all the effort people do to ease its build, it can fail. But don’t get disappointed, if there are problems, please ask!

Remember (I told you this in the previous post) that there are different ways of building the from scratch. In this case we will use GIT (instead of the SVN repository) and CMakeVMMaker (instead of VMMakerTool) and we will build a Cog VM. These tools are part of what I called “new infrastructure” in the previous post.

Installing necessary tools on each OS

For all the OS we need a couple of things: Git client, CMake, and gcc/make. I will tell you what I think is needed in each OS. I know it is painful, but at least, you have to do it only once…Don’t be lazy. If this get’s longer, you have the excuse for the second beer ūüėČ

Mac OS

  1. “Just” (4gb) installing the whole XCode is enough for compiling. XCode package comes with everything: gcc, make, cmake, etc. You can download it from Apple’s website. For Lion or XCode 4.2 users, read the last section of the blog (“Problems I have found so far”) .
  2. The Git client for Mac OS. Try to put Git in PATH.
  3. To check if everything is installed, open a terminal and try to execute the different commands: git, make and cmake.

Linux

  1. You need to install the development tools such as gcc, make, etc. In Ubuntu, you need to install the “buildessential” package (sudo apt-get install build-essential).
  2. Install cmake -> if it is not already included something like “sudo apt-get install cmake” should work in all Ubuntu and forks.
  3. Install Git client (in my Ubuntu this was with “sudo apt-get install git-core”). You may want to put Git in PATH. Usually, when installing with a package system it automatically adds binaries to the PATH.
  4. To check if everything is installed, open a terminal and try to execute the different commands: git, make and cmake.

Windows

  1. Download and install MinGW and MSYS, with C/C++ compiler support:  http://www.mingw.org/wiki/msys. To install mingw+msys using single, painless step, one should download latest installer from here: http://sourceforge.net/projects/mingw/files/Automated%20MinGW%20Installer/mingw-get-inst/
    Today, the mingw-get-inst-20110530 is latest one.
  2. Download and install Git: http://code.google.com/p/msysgit/ During the installation, I choose the option “run git from from the windows command prompt”. Optional: add git to the PATH variable so that you can see git from msys. To do this, add path to git for msys:
    Control panel -> System -> System Properties / Advanced¬† [ Environment Variables ].¬†¬† There should be already: ‘C:\Program Files\Git\cmd’. Add ‘C:\Program Files\Git\bin’¬† Notice that the path may not be exactly ‘C:\Program Files\Git\’ but similar…
  3. Install CMake: http://www.cmake.org/cmake/resources/software.html (during installation, in install options , make sure that you choose to add CMake to PATH).
  4. To check if everything is installed, open MSYS program (which should look like a UNIX terminal) and try to execute the different commands: git, make and cmake.

From now, when we refer to “terminal” we refer to whatever it is: iTerm and friends in Mac OS, Unix/Linux terminals, and MSYS in Windows.

Downloading platform code from GIT

If and only if you plan to modify and publish your cahnges, you will need to create an account in https://gitorious.org/ in case you don’t have one already. If you just want to download a project, then you don’t need an account. Gitorious is not more than a nice, free and public GIT hosting. Creating an account there is a little tricky because you need to SSH key. But try it and if you have problems check in the web because there is a lot of documentation and blog posts about that. Once you have your account working, continue with this…

The goal of this post is not to talk about GIT so we will do it fast ūüôā If you are reading this post, you may probably be a Smalltalk hacker. A real Smalltalk hacker doesn’t use a UI program, but instead command line jeheheh. Seriously, for this post it is easier to just tell you the git commands instead of using a Git front end. So..all git commands should be run from a command line. What are you waiting for? Open a terminal go to your prefer directory and create your workspace for today:

mkdir cogVM
cd cogVM

The Cog VM repository is https://gitorious.org/cogvm. We need to download the “platform code” from there. To do this, we have two options. The first one is to just clone the CogVM project to your local directory. This option is the most common and it is what you will do if you just want to load CogVM (and not modify it or commit changes). It is like doing a “svn co” for the SVN guys.

git clone git://gitorious.org/cogvm/blessed.git

Normally, you can also pass the argument “–depth 1” and do “git clone –depth 1 git://gitorious.org/cogvm/blessed.git”. This is just for avoiding to download all the history and it just downloads the HEAD (at least that’s what I think it does). In this post we are not going to use “–depth 1” and I will explain¬†later why not. The second option is to clone Cog VM to your own Git repository. This has to be done from the Git website (maybe it can be done from command line but I don’t know it): login in Gitorius, search the project CogVM in the search input, select the CogVM project and you will find a “clone repository” button. Click on it and wait. Once it finishes, you will have your own fork of CogVM, and the Git repository is something like this: git://gitorious.org/~marianopeck/cogvm/marianopecks-blessed.git. Now, the last step is to clone from your own fork to your local directory (as we did in the first option). This should be something like:

git clone git://gitorious.org/~marianopeck/cogvm/marianopecks-blessed.git

For this post, I recommend to take the first option if you may be a beginner. I told you that I wanted my posts to be reproducible. With the previous commands, you will clone the latest version in the repository. Since I don’t know when you are going to do it (if there is someone), I would like that you load the specific version I know it works. What I am suggesting is doing a kind of “svn co http://xxxx -r 2202”. I checked how to do this in git, and it seems not to provide a clone of a specific version. Instead, you just clone (from the latest one) and then you checkout or revert to a previous one. Execute:

cd blessed
git checkout f3fe94c828f66cd0e7c37cfa3434e384ff65915e

Notice that you can do this because we have downloaded the full history of the repository. If we would have added the “–depth 1” parameter during the first clone, then we should be having an error like “fatal: reference is not a tree: f3fe94c828f66cd0e7c37cfa3434e384ff65915e”.

f3fe94c828f66cd0e7c37cfa3434e384ff65915e is the commit hash of the version I want. You can do “git log” to see the latest commits or “git rev-parse HEAD” to see the last one.

Ok, if you could successfully load everything from Gitorious you should have something like this:

mariano @ Aragorn : ~/Pharo/vm/git/cogVM/blessed
$ls -la
total 72
drwxr-xr-x  16 mariano  staff    544 Apr  7 00:14 .
drwxr-xr-x   3 mariano  staff    102 Apr  7 00:07 ..
drwxr-xr-x  13 mariano  staff    442 Apr  7 00:14 .git
-rw-r--r--   1 mariano  staff   6651 Apr  7 00:13 .gitignore
-rw-r--r--   1 mariano  staff     41 Apr  7 00:13 CHANGES
-rw-r--r--   1 mariano  staff   1112 Apr  7 00:13 LICENSE
-rw-r--r--   1 mariano  staff  13597 Apr  7 00:13 README
-rw-r--r--   1 mariano  staff     17 Apr  7 00:13 VERSION
drwxr-xr-x   8 mariano  staff    272 Apr  7 00:13 codegen-scripts
drwxr-xr-x  13 mariano  staff    442 Apr  7 00:13 cygwinbuild
drwxr-xr-x   3 mariano  staff    102 Apr  7 00:13 image
drwxr-xr-x  24 mariano  staff    816 Apr  7 00:13 macbuild
drwxr-xr-x   7 mariano  staff    238 Apr  7 00:14 platforms
drwxr-xr-x   4 mariano  staff    136 Apr  7 00:14 processors
drwxr-xr-x   8 mariano  staff    272 Apr  7 00:14 scripts
drwxr-xr-x   6 mariano  staff    204 Apr  7 00:14 unixbuild

I have highlighted two lines that represent two important directories: “/platforms” and “/image”. For the moment, lets explain what “/platforms” is…come on, you should guess it! Yes, there, in that folder, it is the famous “platform code”. You can enter to such directory and see the C code by yourself.

Downloading Cog and dependencies

So far we have loaded from Git the “platform code”. We are missing the other part, VMMaker. In the previous post I told you it may not be necessary to always download VMMAker and generate sources, because such sources may also be commited in the repository. Having the auto-generated source code in the repository is a trade-off, it has advantages and disadvantages. In this “new infrastrucutre” under Git, it was decided to remove it. So if you want to compile the VM you have to load VMMaker and translate it to C. You can read the explanations here and here.

So…we need to download VMMaker (Cog branch) and translate it to C. But of course, we first need a Smalltalk image. I told you in the previous post that I want all my posts to be reproducible. So, take this PharoCore 1.3 image. Notice that zip comes not only with the .image and .sources files, but also the .sources. If you don’t know what .sources file is, you should read Pharo By Example chapter 1, section 1.1 “Getting Started” ūüôā¬† You need the .sources because it is necessary in order to generate the sources of the VMMaker. The image of such zip can be opened with both, Interpreter VM and Cog VM. However, if you open it with Cog, and you save it, then you won’t be able to run it with the Interpreter VM (this was fixed in the latest code of Interpreter VM but there isn’t yet an official VM release for all OS that contains this fix). Thus, I recommend you to run the image with a CogVM. If you don’t have one already, you can download this one.

Now, let’s load VMMaker branch for Cog and all its dependencies. You have already learned that Cog has dependencies in other packages and to solve that, among other problems, we use a Metacello configuration for it. The following code may take time because since we are evaluating this in a PharoCore image where Metacello is not present, Metacello needs to be installed first. In addition, VMMaker is a big package…Fortunatly we are running with a CogVM ūüôā ¬†¬† So, take the image and evaluate:

Deprecation raiseWarning: false.
Gofer new
squeaksource: 'MetacelloRepository';
package: 'ConfigurationOfCog';
load.
((Smalltalk at: #ConfigurationOfCog) project version: '1.5') load.

Gofer new
squeaksource: 'VMMaker';
package: 'CMakeVMMaker';
version: 'CMakeVMMaker-MarianoMartinezPeck.83';
load.

IMPORTANT: How do you know you have to load version ‘1.5’ ? Yes, because I am telling you hahha. But how do I know that 1.5 is the one that works with the current version of Git?¬† This is answered in the next post under the title “Synchronization between platform code (Git) and VMMaker”.

One of the cool things I like from Metacello configurations is the possibility to query them. For example, do you want to all the packages that are installed while doing the previous code? Just inspect or print:


(ConfigurationOfCog project version: '1.5') packages.
ConfigurationOfCog project versions.

Now, if you are curious about how defining versions and baselines is achieved in Metacello, take a look to the methods ConfigurationOfCog >>#version15:  and ConfigurationOfCog >>#baseline13:.

Generating VM sources and CMake files

What have we done so far? We have just download all the VMMaker branch for Cog with all its required dependencies. Now, its turn to translate from SLANG to C and to generate the CMake outputs so that we can compile after. This C code generated from VMMaker is known in the Squeak/Pharo VM world as “the sources”. But don’t confuse, the platform code is also source code…but anyway, when usually someone says the “sources” it means the autogenerated C code from VMMaker. This is why this C code is placed (by default) in /src.

To do both things (generate VM sources and CMake outputs), we use one of the available CMakeVMMaker configurations. Metacello configurations are represented by classes (instead of XML like maven or similar package management systems). How do you think CMakeVMMaker configurations are represented?? Of course!! with classes too. So, we need to find the accurate class for us. I won’t go further now because that’s topic of another post, but for the moment, lets use these classes CogUnixConfig, CogMsWindowsConfig and CocoaIOSCogJitConfig depending in which OS you are.

A little remark for Mac users: you can notice that there are two categories with CMake Mac configurations, ‘CMakeVMMaker-MacOS’ and ‘CMakeVMMaker-IOS’. The first one, is for Mac OS versions that use Carbon library. ‘CMakeVMMaker-IOS’ contains CMake configurations that use Cocoa instead, which is the new library for Mac OS. Carbon is legazy, and it may be removed soon in next MacOS versions. So, the new ones, and the ones you should use, are those configurations under ‘CMakeVMMaker-IOS’.

These configurations are flexible enough to set specific directories for sources, platforms, results, etc. In addition, if you follow certain conventions (defaults), the build is more automatic. For the purpose of this post, we will follow the conventions and use the expected default directories. The only real convention we should follow is that the .image should be in a subdirectory of the directory where you downloaded the GIT code. In my case (see the bash example at the beginning of the post), it is ~/Pharo/vm/git/cogVM/blessed.¬† So, I moved the .image to ~/Pharo/vm/git/cogVM/blessed/image. You can create your own directory¬† ~/Pharo/vm/git/cogVM/generator¬† and place it there. The only requirement is that the ‘platforms’ directory is found in ‘../platforms’.¬† So…did you move your image?¬†¬† Perfect, let’s continue.

No…wait. Why you need ‘platforms’ directory if we are not really compiling right now?¬† Ask yourself…do you think VMMaker translation to C needs the platform code?¬† Nooo!¬† So…we only need the platform directory for the second part, for¬† CMake. Now yes, we continue…take the Pharo image (which should have been moved) and evaluate:

"CocoaIOSCogJitConfig is an example. If you are not in Mac, replace it with CogMsWindowsConfig or CogUnixConfig"
CocoaIOSCogJitConfig  new
"Using VMMaker we translate Cog to C"
generateSources;
"We generate all the CMake necessary directories and files"
generate.

Ok…As my comments say, #generateSources uses VMMaker class to translate from SLANG to C. Instead of using a UI (VMMakerTool) we directly translate by code…but..do you remember that for compiling the VM we needed to say which plugins to compile and whether to compile them like internal or external? Ok…At that moment I told you that most of the developers shouldn’t be aware of that. In this case, CMakeVMMaker does the job for us. We will come later to this topic, but if you want to know which plugins are compiled, check implementors of #internalPlugins and #externalPlugins. Once again, CMakeVMMaker has defaults for these things, but you can customize and change them.

Where is the C generated code ? By default (yes, it can be changed) is placed in ‘../src’. In my example, it should be in ~/Pharo/vm/git/cogVM/blessed/src and should look like this:

mariano @ Aragorn : ~/Pharo/vm/git/cogVM/blessed/src
ls -la
total 16
drwxr-xr-x   6 mariano  staff   204 Apr  9 14:55 .
drwxr-xr-x  18 mariano  staff   612 Apr  9 14:55 ..
-rw-r--r--@  1 mariano  staff   776 Apr  9 14:55 examplePlugins.ext
-rw-r--r--@  1 mariano  staff    83 Apr  9 14:55 examplePlugins.int
drwxr-xr-x  43 mariano  staff  1462 Apr  9 14:55 plugins
drwxr-xr-x  11 mariano  staff   374 Apr  9 14:55 vm

In a future post, we will go deeper in how is the C translated cog…but if you want to take a look, go ahead!! Inspect the file /src/vm/cointerp.c¬† for example ūüôā¬†¬† So…do you already love SLANG? hehehe

With the method #generate we create all the directories and files needed by CMake so that we can after use CMake to generate different makefiles. You will notice that this method creates a directory /build. In my case, it is ~/Pharo/vm/git/cogVM/blessed/build. If you check inside that directory, there are a couple of important files generated for CMake (so that we can use it after), such as CMakeLists.txt, directories.cmake, etc. If you are curious, take a look to them.

If you are interested, I strongly recommend you to take a look to both methods: #generateSources and #generate. Now that I have explained the two big steps, I can tell you that there is a shortcut:

CocoaIOSCogJitConfig generateWithSources

Using CMake and compiling the VM

We are almost done…we already have all the necessary C code, and all the CMake files and directories. The next step is to just use CMake. I am newbie in both Git and CMake. But as far as I could see, CMake is a wonderful tool for being able to generate different makefiles from the same “definition”. So, we have already tell CMake which was our code, the directories, compiler flags, ec, etc. Then CMake can take such information and generate different makefiles: UNIX makefiles, MSYS (for Windows), XCode, Visual Studio, etc…In this post we will see just how to use regular Unix makefiles.

Now…come back to your terminal. We need to first go the /build directory and then execute CMake. In MacOS and Linux, evaluate:

cd build
cmake .

Now, in Windows we are compiling in MSYS, so we need to create special makefiles for it. The way to do this with CMake is using the parameter -G”Generator Name” where “Generator Name” is “MSYS Makefiles” in this case. So, in MSYS (in Windows) we evaluate:

cd build
 cmake . -G"MSYS Makefiles"

Once that is done, we have created all the necessary makefiles. Now, the last pending thing is just to “make”. It is the same whether you’re on windows or not. The makefiles have been generated, son only a make is needed:

make

Hopefully, you didn’t receive any compilation error and you can find your VM binary in /results (in Windows it is under /build/results) . Again, in my case it is ~/Pharo/vm/git/cogVM/blessed/results.

Problems I have found so far

Linux

It seems that the default CogUnixConfig needs OpenGL dev files (headers) and libs.This is because some plugins like Croquet or Balloon3D require such lib. And those plugins are being included by default in CogUnixConfig. So in my case I’ve got the error “The file was not found sqUnixOpenGL.h” which I fixed by installing the dev package:

sudo apt-get install mesa-common-dev

Then, I have a problem at linking time, “/usr/bin/ld: cannot find -lGL”, which I solved by doing:

delete /usr/lib/libGL.so
cd /usr/lib/
sudo ln -s libGL.so.1.2 libGL.so

Notice that other people have experienced the same problem but with the libSM.so (lSM) and libICE.so (lICE). Such problem could be resolved the same way. You may¬† want to use the command “locate” first to see where the library is and then do the “ln”.

Another solution (but I couldn’t test it by myself) to avoid using “ln”, could be to simply install the package libgl1-mesa-dev (sudo apt-get install libgl1-mesa-dev).

After having done all that, I then realised there is a special  CogUnixNoGLConfig  which seems to avoid linking to OpenGL. The VM will work until you use something like Croquet or the Ballon3D. For a more detailed explanation, read this thread.

If you have the error “alsa/asoundlib.h: No such file or directory” then you should install libasound2-dev, for example, by doing “sudo apt-get install¬†libasound2-dev”. Be aware that depending on your Linux distro and what packages you have installed on in, you may require to install a couple of packages or non. If you have a problem with a C header file not found (a .h file) you will probably need to install the “dev” package of the project. And if what it is not found is a library (a .so for example), then it is likely you will need to install the package that contains such libs. How do you know which package contains a specific header file or lib ? I have no idea. I always go to Google and I find the answer.

Windows

As you can read in this thread I could generate the VM in Windows by myself but I had a problem at compile time: “cc1.exe: error: invalid option `no-fused-madd'”. To solve that problem I edited the method CPlatformConfig >> configureFloatMathPlugin:.¬† I’ve changed the line “maker addDefinitions: ‘-O0 -mno-fused-madd’.”¬† to “maker addDefinitions: ‘-O0’.”. The effects of this change, may be that Croquet doesn’t work properly. For further details, read another thread.

Ok, this is the end of the post. I still remember (ok, only two years ago) the first day I could compile the VM. I felt so hacky and happy at the same time. I remember I chatted with my SqueakDBX companions about that ehehhehe. So…if I could help you to successfully build your own first VM, you own me a beer (yes, I will be at ESUG hahaha). If you spent a couple of hours and you couldn’t…..mmmm… ask the mailing list ūüôā¬†¬†¬† Seriously, as I told you, compiling the VM from scratch is complicated, even with all the effort made to ease that. If you had problems, please ask in the mailing lists. They will probably be able to help you, and your questions will make this process easier in a near future.

In the next post we will see some advanced topics of compiling the VM, and after that we will start to take a look to the VM inside. Once again, thanks Esteban, Igor, and everbody who answered my questions in the mailing list.

Mac OSX

As far as I understand, if you update to the latest XCode (4.2) from an older version (in Snow Leopard) it still¬† includes the GCC but it sets LLVM as the default compiler. In a fresh installation of Lion/XCode, not only LLVM is the default compiler but also GCC is not installed. To check which compiler area you using, execute in a console “gcc –version”. If it says something like “i686-apple-darwin11-gcc-4.2.1” then it is correct, you are using GCC. If it says something like “686-apple-darwin11-llvm-gcc-4.2” then it means you are using LLVM by default.

If GCC is not installed, you can install it via Mac Ports by doing “sudo port install apple-gcc42”.¬† You can follow http://stackoverflow.com/a/8593831/424245 to get it to appear in¬† Xcode, the last two steps will probably look like:
a) sudo ln -s /opt/local/bin/gcc-apple-4.2 /Developer/usr/bin/gcc-4.2
b) sudo ln -s /opt/local/bin/g++-apple-4.2 /Developer/usr/bin/g++-4.2

The CogVM needs GCC and cannot be compiled right now with LLVM so you have to use GCC. There are a couple of possibilities:
1) Change the default compiler for the whole system. To do this you have to edit the symbolic link of gcc to point to the real GCC and not LLVM. I do not recommend that much this option since you may be affecting the whole system.
2) When you do the cmake, reacher than simply do “cmake .” do: “cmake -D CMAKE_C_COMPILER=gcc-4.2 -D CMAKE_CXX_COMPILER=g++-4.2 .¬†” If that doesn’t work, try “cmake -D CMAKE_C_COMPILER=gcc-apple-4.2 -D CMAKE_CXX_COMPILER=g++-apple-4.2 .”. I would recommend this option for those who are building for the first time.
3) Instead of doing 2) by hand, you can use a patch in CMakeVMMaker that does that for you. In such a case you should use the config CogMTCocoaIOSGCC42Config. Notice, however, that such patch was added in the latests versions of CMakeVMMaker. In this post, I put specific versions of each part of the VM building system. Therefore, if you want to use the latest version of CMakeVMMaker you should also use the latest code from git and from ConfigurationOfCog.

For more details see this thread and this one in the VM mailing list:


First stop: VM’s SCM and related stuff

You want to compile your own VM, don’t you? Compiling the VM just for compiling it and following some instructions is not really helpful, otherwise why don’t you directly download the VM binary ?¬† My idea with this sequence of posts is that you understand and learn about the VM.

So…in order to compile the VM, you will have to deal with the problem of the VM’s Software Configuration Management. The first time I tried to compile the Pharo/Squeak VM was like 2 years ago. After that, I tried few times more, and most of the times I have some troubles. In addition, in the last months not only there have been a lot of changes related to code versioning and management, but also Cog VM come into play. So….a lot of people is confused where each part of the VM is committed, or what is needed to compile each VM. I will try to clarify all that so that in the next post we can finally compile the VM by ourself.

Since the Interpreter VM and Cog VM are a little different regarding the code management, I will split them.

Interpreter VM

Downloading code

So, if you remember from the previous post, we have 2 parts: VMMaker with the core of the VM, and the platform code. For the VMMaker it is easy: it is the VMMaker package in squeaksource. The platform code is the official SVN. This sound pretty straightforward, doesn’t it ?¬† but this is not true sometimes. There are several problems (some may probably be because of my ignorance) that I have found with this approach:

  1. The package VMMaker is not self contained, i.e, it has dependencies on other packages (some packages in the same repository and some in others). So…first problem, you need to know which other packages you need. For example, to build the VM you may need also the packages: ‘FFI-Pools’, ‘SharedPool-Speech’, ‘MemoryAccess’, ‘SlangBrowser’, ‘Balloon3D-Plugins’, ‘Plugin-XXX’, etc.
  2. Similar to the previous item, there is not only the problem of knowing which packages are needed, but instead which version. So…how do you know that for ‘VMMaker-dtl.221’ you need ‘FFI-Pools-eem.2’, ‘MemoryAccess-dtl.3’, ‘Balloon3D-Plugins-ar.6’, etc ?¬† Using just the last version of every package does work all the times.
  3. Sync between VMMaker and platform code. How do you know for each VMMaker version which SVN version you need of the platform code? or vice-versa how do you know which VMMaker version you need for a specific SVN version? once again, relying in the last version is not a reliable solution.
  4. Similar to 3) there is yet another problem: the platform code, as you can imagine, is split in one folder for every platform (see SVN): there is one for UNIX, one for Windows, for MacOS, and for iOS (but forget this one for the moment). Each platform has a “leader” or “maintainer”, which is the person in charge of implementing/modifying the code. The problem raises when there are changes in VMMaker for example, that require changes in all platform code, and this is not changed in all of them. So for example, in UNIX the changes are committed, but not in Mac OS. So…each platform code is not always in sync with the rest. Note that I am not complaining: this is all open-source and we all do our best. I am just telling you the problems I have seen so far.
  5. The previous problem happens not only for the commits in the repository, but also for the VM releases. Most of the times, they are not in sync. Maybe there is a particular platform that releases 5 times in a year, and maybe there is another one every 1 year and a half ūüė¶
  6. The version of every VM are not in sync. So for Mac for example you have Squeak 4.2.5beta1U, Squeak 5.7.4.1, Squeak 5.8b4, etc. For UNIX, Squeak-4.4.7.2357, Squeak-vm-3.7-7, Squeak 4.0.3.2202, etc.  In Windows, SqueakVM-Win32-4.1.1, SqueakVM-Win32-3.11.5, SqueakVM-Win32-3.10.9, etc. So as you can see, they are all completely different, and for me this is complicated since you cannot just refer to a unique VM version.
  7. The SVN repository is restricted, so you cannot commit unless you have authorized access. This could be a good and bad point at the same time.

I want to make it clear: I am not complaining against this, I am just telling the problems I have found, and how certain infrastructure that has been done in the last months helped with some of these issues.

So….you know that VMMaker is just another Monticello package, and you also know that you have to manage versions, dependencies, why not groups, etc…Does that ring a bell with anyone?¬† YEEES! Of course, Metacello ūüôā¬† So, one thing we did in Pharo (although I guess it is/was also used in Squeak), is to create a Metacello Configuration for VMMaker: ConfigurationOfVMMaker. For those that doesn’t know what Metacello is, it is a Package Management System on top of Monticello, where the ConfigurationOfVMMaker is a class where you can define versions, dependencies, etc, about your project. If you are a Smalltalker and you don’t know anything about Metacello I recommend you to take a look.

Anyhow, with ConfigurationOfVMMaker we solved the first two problems. With Metacello baselines we define all the structural information of the Interpreter VM: which packages are needed (the dependencies), possible groups (not everybody wants to load the same packages), repositories, etc. And with Metacello versions, we can define a whole set of working versions. For example, for ConfigurationOfVMMaker version 1.5 we load ‘VMMaker-dtl.221’, ‘MemoryAccess-dtl.3’, ‘FFI-Pools-eem.2’, etc. This is a set of frozen versions that we known to work properly all together. Notice that creating versions for ConfigurationOfVMMaker should be done by the “VM developers”. In fact, it was done by people like Torsten,¬† Laurent and me. But the important thing is that the user doesn’t need to do that. The only thing the user needs to do in order to load VMMaker with all its dependencies, and all loading a working version of every package, is to load the Metacello version. Do you want to try by yourself?¬† Just take this Pharo image, and evaluate:

Deprecation raiseWarning: false.
 Gofer new
 squeaksource:'MetacelloRepository';
 package:'ConfigurationOfVMMaker';
 load.
 ((Smalltalk at: #ConfigurationOfVMMaker) project version: '1.5') load.

Sorry for the ugly colors…wordpress.com doesn’t have Smalltalk ūüė¶

Why I told you to download that particular Pharo image? and why I am explicitly loading the version 1.5 instead of using the last one?  Because I want that my posts are reproducible. If you evaluate this instead:

 (Smalltalk at: #ConfigurationOfVMMaker) project lastVersion load.

I cannot guarantee that everything will be working properly. The same with the Pharo image. If you took any Pharo image 1.0, or 1.1 or 1.2, or Squeak 4.2, I am not sure that VMMaker will load correctly. The same if you load another version than 1.5. So…in this case, I am sure (because I test it before posting) that with that Pharo image and that version of ConfigurationOfVMMaker, VMMaker is working properly.

Coming back….the last point 3) is not yet solved, since you cannot know that for a certain SVN version you need XXX version of ConfigurationOfVMMaker, or vice-versa. But we will come to this later on…The rest of the problems are not solved either.

Generating the VM

You need both things to compile the VM: the C platform code that is directly committed in SVN and the generated C code from the VMMaker. Do you always need to translate VMMaker to C ? Not necessary, because the generated code is also committed in the SVN, usually under the “/src” folder, for example here. It is there so that if someone wants to compile, just download both parts and with GCC it compiles the VM. No need to take a Smalltalk image, load VMMaker, and generate sources. So… when is it really needed to generate sources from VMMaker?

  1. When the /src in the SVN is outdated in relation to the platform code.
  2. When you did changes in VMMaker. You can do changes in VMMaker just for fun, for your own project, for testing, etc.
  3. For learning purpose ūüôā

So…how do you compile the VM?¬† yes, of course, using a C compiler…but that’s not enough information! For example, usually you need to place the /src folder (where the output of the generated VMMaker sources go) in a certain place so that it is found by the makefiles. Even more, the problem is that each platform has its own instructions of how to compile. You can find the instructions for UNIX here, for Windows here, and for Mac OS (after searching this info for a long time) it seems (if it is not this please let me know) to be here.

Not only each platform has its own instructions to build the VM, but also some lack support for IDE. For example, it is not easy to b able to compile the VM out of the box with Microsoft Visual Studio or with Appel’s XCode. For example, for XCode, you need a .xcodeproj file for every project. The problem was that most of the times (at least when I tried) this file contained file locations of the commiter (which of course is different from mine). So, at the end, I usually need to do some modifications to the project in order to being able to compile and run the VM from XCode. I am telling you all this so that you can understand the progress we (the community) did in the last months…

Internal and external plugins

Before going further, let me do a little remark: did you remember I told you about the VM plugins?¬† Like FilePlugin, SocketPlugin, etc. Well, plugins can be compiled in two ways: internal or external. Internal plugins are linked together with the core classical VM, i.e, they are inside the binary file of the VM. External plugins are distributed as separate shared library and the cool thing is that you don’t need to do anything at all to the VM. At runtime the normal/standard VM can just load an external plugin and use it. Whether you should compile a plugin as internal or external is out of scope of this post. What is important here is that:

  • the normal guy that just wants to compiled the VM shouldn’t need to know how each plugin must be compiled.
  • there are some plugins that only work when they are compiled in one of the two ways.

Generating the VMMaker sources

Imagine that for any reason (maybe one of the above mentioned) you need/want to translate VMMaker package to C. How do you do that? The default approach with the Interpreter VM is by using a tool called VMMakerTool, which at the same time it is the name of the class ūüėȬ†¬† So…VMMakerTool is a class which is in the VMMaker repository and it is a UI that let you generate the sources. Here you can see a screenshot:

To reproduce the screenshot, just evaluate:

VMMakerTool openInWorld

The tool is pretty cool since it lets you to do a lot of things: choose which plugins to include and choose whether you want them internal or external, you can set the source output directory, the platform code directory, the CPU architecture (32 or 64 bits), etc. This tool is awesome, but from my point of view, it is too much for a non-VM-hacker guy. Why? Because of what I have already told you: the normal user shouldn’t need to know which plugins to include nor if they should be internal or external. At the same time, following some conventions, the directory for platform code and sources could be automatically set.

Fortunately, VMMakerTool is just the UI and it relies in the “model”, which is the VMMaker class (yes, VMMaker is the name of the squeaksurce repository, the name of one of the packages and also one of the classes heheheh).With the class VMMaker we can do the same of VMMakerTool but from code. Example:

| sourcePath |

"The path where I load from SVN"
sourcePath := '/Users/mariano/Pharo/VM/svnSqueakTree/trunk'.

"Generate new sources"
VMMaker default
 platformRootDirectoryName: sourcePath, '/platforms';
 sourceDirectoryName: sourcePath, '/platforms/iOS/vm/src';
 internal: #(
 ADPCMCodecPlugin
 B3DAcceleratorPlugin
 B3DEnginePlugin
 BalloonEnginePlugin

 "lots of plugins more.....I let few just for the example"

 SurfacePlugin
 UUIDPlugin
 DropPlugin)
 external: #();
 generateMainVM.

So…suppose that someone provides you the list of plugins for every platform, knowing which of them should be internal and which external, and following some conventions everything can be automatic?¬† Ok….we are going there, don’t worry ūüėČ

Cog VM

The infrastructur for the Cog¬† VM is a little messy for me so I would try to do my best to explain it. Cog started as a fork of the Interpreter VM. So…imagine that you want to create a fork for VMMaker (in squeaksource) and another fork in the SVN for the platform code. Monticello doesn’t provide real and easy branch support, so Cog needed to do something weird (at least for me). Suppose that a regular version of the VMMaker package is ‘VMMaker-dtl.161’. In this case ‘dtl’ is the initials of the committer, Dave Lewis. So…how does the Cog branch in VMMaker looks like???¬† they are just normal versions, but whose committer is ‘oscog’ (I guess this is because of Open-Source Cog). Example: ‘VMMaker-oscog.54’. That means that in order to load Cog, you need to open the VMMaker package, and search for a version that matches ‘VMMaker-oscog’. There is where Eliot commits the official Cog versions.

Exercise: Take a Monticello Browser, add the VMMaker repository and browse the version ‘VMMaker-dtl.223’. Then, browse ‘VMMaker-oscog.54’ and notice the difference between them. For example, in ‘VMMaker-oscog.54’ there are several categories that are not even present in ‘VMMaker-dtl.223’, like ‘VMMaker-JIT’, ‘VMMaker-Multithreading’, etc. Even more, the same categories contain different classes.

Now, regarding the branch in the platform code, this is much easier since it is a regular SVN branch which can be found here.

Fortunately, people have also developed a ConfiguraionOfCog which follows the same idea of ConfigurationOfVMMaker.

One difference I found with the regular VM is that Cog is supposed to be translated to C using VMMaker class directly (not VMMakerTool). You can see how to do it in this workspace.

So, in summary, they way to compile Cog VM is more or less the same as the Interpreter VM: you take a Smalltalk image, you load Cog (you can use ConfigurationOfCog), you generate sources, you checkout SVN branch, and finally compile (the instructions of how to build each VM is in the same SVN). Generating the sources may not be necessary if the /src is in sync with /platforms.

Finally, notice also that Eliot usually uploads regular VM builds (Cog VM binaries for all OS) to this url.

New infrastructure

The same way we should thanks Teleplace for Eliot Miranda’s work, we should also thanks INRIA for paying a Pharo engineer: Igor Stasenko. The good news is that since he started a couple of months ago he was not working for Pharo but instead for a new VM infrastructure . What is all this about? I’ll give you only a quick introduction because in the next post we are going to compile the VM using such infrastructure. Disclaimer: this new infrastructure is only for Cog VM and all its variants, but not Interpreter VM. I guess that’s because of the resources/time available.

So…in a nutshell, there are 3 big changes:

  1. Use GIT instead of SVN. There is a new repository for the platform code which is versioned by GIT instead of SVN. There is a new account for CogVM in gitorious. It seems that nowadays if you are not in Git you are not cool, you do not exist. Ok, we are cool now ūüôā¬† No one needs to ask for a blessing, everybody can clone, hack and push/share changes. People can pick the changes without having to have the permissions to publish.
  2. Use CMakeVMMaker instead of VMMakerTool. CMakeVMMaker is a little tool that automates the build. It has two important things: 1) translate VMMaker to C, using the VMMaker class and 2) generate CMake files so that to ease the build. To do this¬† it automatically assumes (although it can be customized) which plugins are needed and how they are compiled (if internal or external), the needed compiler flags, the directories needed, etc. CMake is an excellent tool for doing cross-platform compiling and for automatic stuff….CMakeVMMaker generates all the necessary files for CMake. For those who doesn’t know what CMake is, imagine one abstract step before makefiles. CMake is a cross-platform, open-source build system where you can define all necessary stuff like directories, compiler flags, etc, in text files. Once you have that, using CMake you can generate different outputs: normal makefiles where you can just use the command “make”, Appel’s XCode project or even Microsoft Visual Studio projects ūüôā
  3. Continuous Integration for VMs!!¬† Can you imagine that for every GIT commit, Mr. Hudson takes the latest PharoCore image, loads the Cog VM, generates sources, and compiles the VM for Windows, Linux and Mac OS ? Come on! isn’t this really cool?¬† Ok, you don’t believe me? Go to the Pharo CI for CogVM.

In the next post we will see how to use this new infrastructure and how is solves some of the mentioned problems along this post. I want to thanks Esteban, Igor, Dave and all who answer my questions in the mailing lists ūüôā

Links:


Departure: VM introduction

My main goals of this “Journey through the VM” are that the reader learns about the VM and to spread the VM knowledge, mostly for people who doesn’t know anything about VM. For these same reasons months ago I’ve created a special mailing list for beginners. Please, ask everything you don’t understand, correct me when I say something wrong, give your opinion, etc. There are always people ready to help in the mailing lists. So, let’s start.

Intro to VM

As you may know, there are different Smalltalk dialects and each of them has its own VM. In this Journey through the VM, we will use the PharoVM. I hope there are newcomers reading this blog, so I will explain for them: Pharo, was a fork of another Smalltalk dialect called Squeak. However, both of them still uses the same VM. To avoid confusions, I will try to just say VM, but know that it is the Pharo/Squeak VM.

As far as I am aware of, most, if not all, Smalltalk VMs are implemented in C (or maybe parts in C++). Fortunately for us, happy smalltalkers, the Pharo/Squeak VM consist of two parts of code: a) VMMaker (written in Smalltalk) ; b) “platform code” (C/Smalltalk-C hand-written code)

VMMaker

VMMaker is a normal repository in squeaksource, but its name is very misleading for me. One would imagine that it is a tool for building the VM, but the truth is that it is much more than that. In this VMMaker repository, there are different interesting packages but for the moment, we will concentrate in the most important one: the package VMMaker itself:

It is where the Smalltalk part of the VM is contained. I consider this part of the VM as the “core”, and it is written in Smalltalk. In fact, it is not really in Smalltalk but instead in a subset (limited) of Smalltalk called SLANG, which is then translated to C. So, even if this is much happier thank coding in C, you have to be aware that witting in SLANG has some limitations.

VMMaker has two very important classes: Interpreter and ObjectMemory. As it says in Interpreter class comment, they both together represents a complete implementation of the Smalltalk-80 virtual machine, derived originally from the Blue Book specification but quite different in some areas.

As you may know, every time to save a method in Smalltalk, the Compiler takes such source code, it analyzes, validates, parses, and finally compiles it, generating a CompiledMethod instance. In this object, all the literals and bytecodes are encoded. So, one of the main tasks of the VM is to interpret (that’s why its name) and execute Smalltalk bytecodes that reads from the image. In addition, Interpreter implements some VM responsibilities like methods lookup, cache and execution, primitives, arithmetic operations, etc.

ObjectMemory describes the object memory of the VM. Its responsibility includes allocating/deallocating memory, Garbage Collector and memory compaction, it defines the structure and flags of the object header, object enumeration, etc.

Platform code

Fortunately, the VM is decoupled from the issue of the supporting platform specific stuff, hence the “platform code”. It contains those parts that depends on the OS like sockets, display and files, where performance is needed, etc. As I told you, I understand the VMMaker part as the “core”. But there are a lot of other features that the VM should provide and they are not part of the “core” but instead they are written like plugins. There several plugins like FilePlugin, SocketPlugin, SoundPlugin, etc. We are not going further now with the plugins since there will be a post later for that (you will be even able to write you own plugin!!!). What is important to understand now is that the platform has code both things: for the VM “core” part, and for plugins.

I think (I may be wrong) that another good reason to have the “platform code” is to increase VM portabilty. Suppose you need to run the VM on a particular hardware/OS that supports C. In this case, you will probably need to change a little or even anything of the VM core (VMMaker). What you will need is to create a specific platform code for that hardware/OS. I think this is one of the reasons why you see Squeak VM running everywhere.

Most of the times that you want to hack, play or do research (which is the difference anyway between this verbs? heheh) with the VM, you usually modify the VMMaker part. It is not likely that you will need to modify something in the platform code.

How the VM is built ?

So…we have two parts of the VM: VMMaker (written in SLANG) and the platform code written in C (for the new MacOS platform there are parts written in Objective-C in order to talk with the cocoa library). There are two steps:

  1. Translate VMMaker (ObjectMemory, Interpreter, the plugins, etc)¬† to C. This is done by the classes that are in the category ‘VMMaker-Translation to C’.
  2. Compile the C source code generated from the previous step, together with the platform code.

We will not go further on this topic now because this is explained in the next post or so.

The VM is written is Smalltalk? really? is that cool?

Ok, this can be controversial, so as always, this is just my point of view. I love Smalltalk. I love browsing for senders, implementors, references, doing refactors, browsing methods and classes, etc. If you are reading this blog, you probably understand my feelings ūüôā¬† There are people who love C. Ok, I don’t. I’ve even contributed (with code and fixes) to the OpenDBX library (written in C), so I can do it. But…… do you know how cool is to be able to read and modify the VM from the same image that you usually use ? SLANG is quite limited, and sometimes it looks like C, of course, but it is still much better than coding in C for me:

  1. You use the SAME Smalltalk image you use for other purposes
  2. You have all the tools provided by the Smalltalk IDE: senders, implementors, refactors, monticello, versions, etc. You browse Interpreter and ObjectMemory classes like if they were regular classes (indeed they are!).
  3. Most of the cases, you don’t need to deal with C (sometimes you do need).
  4. It can be versioned  in Monticello just like any other project.
  5. This is cool: you take you image, you modify something in VMMaker, generate sources, go to a command line and using gcc (and the platform code) compiles the vm. Then, you take the compiled VM and run the same image you used for generating it hhehehe. Isn’t that cool?
  6. Finally, and this is the most important for several persons, the VM can be simulated. Since it is Smalltalk, you can run a InterpreterSimulator, inside a host image/VM. But we will talk about this later in the journey.

Cog VM and current status

Some time ago, Eliot Miranda (thanks to Teleplace company) started to work on a new VM for Squeak (and all its fork, like Pharo). This VM is called Cog VM and it has been already released. In fact, is the default VM included in the PharoOneClick 1.1.1 and beyond. Cog VM includes a lot of new features, but in a glance:

  1. Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
  2. Context-to-stack mapping.
  3. JIT (just in time compiler) that compiles methods to machine code.
  4. PIC (polymorphic inline caching).
  5. Multi-threading.

If you are a little arround Smalltalk you may have heard about Cog VM and Stack VM. What is the big difference? Stack VM implements 1) and 2). And Cog VM is on top of the Stack VM and adds 3) and 4). Finally, there is CogMT VM which is on top of Cog VM and adds multi-threading support for external calls (like FFI for example). Now…to be able to put names to all these VMs and not get confused, we will call our old VM Interpreter VM.

Cog VMs have increased performance arround 4x-10x. In addition, it has refactored a couple of things. For example, in Interpreter VM, the Interpreter was a subclass of ObjectMemory (WTF?). But that was necessary in order to easily translate to C. In Cog, there are new classes like CoInterpreter and NewObjectMemory. But the good news is that we can have composition!! The CoInterpreter has an instance variable that is the object memory (in this case an instance of NewObjectMemory). This is awesome for us, and of course, it required changes in the SLANG to C translator. So…as you can see, Cog is really important for the Squeak and Pharo community.

Links:

One of the problems I found when trying to learn the VM is that there is little documentation and even more, it is not very known. This is why in every post I will try to put the links I know to the related topics. Keep and share them. They may be useful if you want to go further in the VM or in certain areas. Notice that it is probable that some information in there is outdated, but that’s better than nothing ūüėČ

So, that was all for today. I tried to give you a first overview of the VM. The first posts may be “boring” but they will get better ūüôā¬†¬† If you have questions, remarks, or any kind of feedback, let me know.