Tag Archives: VMMaker

First stop: VM’s SCM and related stuff

You want to compile your own VM, don’t you? Compiling the VM just for compiling it and following some instructions is not really helpful, otherwise why don’t you directly download the VM binary ?  My idea with this sequence of posts is that you understand and learn about the VM.

So…in order to compile the VM, you will have to deal with the problem of the VM’s Software Configuration Management. The first time I tried to compile the Pharo/Squeak VM was like 2 years ago. After that, I tried few times more, and most of the times I have some troubles. In addition, in the last months not only there have been a lot of changes related to code versioning and management, but also Cog VM come into play. So….a lot of people is confused where each part of the VM is committed, or what is needed to compile each VM. I will try to clarify all that so that in the next post we can finally compile the VM by ourself.

Since the Interpreter VM and Cog VM are a little different regarding the code management, I will split them.

Interpreter VM

Downloading code

So, if you remember from the previous post, we have 2 parts: VMMaker with the core of the VM, and the platform code. For the VMMaker it is easy: it is the VMMaker package in squeaksource. The platform code is the official SVN. This sound pretty straightforward, doesn’t it ?  but this is not true sometimes. There are several problems (some may probably be because of my ignorance) that I have found with this approach:

  1. The package VMMaker is not self contained, i.e, it has dependencies on other packages (some packages in the same repository and some in others). So…first problem, you need to know which other packages you need. For example, to build the VM you may need also the packages: ‘FFI-Pools’, ‘SharedPool-Speech’, ‘MemoryAccess’, ‘SlangBrowser’, ‘Balloon3D-Plugins’, ‘Plugin-XXX’, etc.
  2. Similar to the previous item, there is not only the problem of knowing which packages are needed, but instead which version. So…how do you know that for ‘VMMaker-dtl.221’ you need ‘FFI-Pools-eem.2’, ‘MemoryAccess-dtl.3’, ‘Balloon3D-Plugins-ar.6’, etc ?  Using just the last version of every package does work all the times.
  3. Sync between VMMaker and platform code. How do you know for each VMMaker version which SVN version you need of the platform code? or vice-versa how do you know which VMMaker version you need for a specific SVN version? once again, relying in the last version is not a reliable solution.
  4. Similar to 3) there is yet another problem: the platform code, as you can imagine, is split in one folder for every platform (see SVN): there is one for UNIX, one for Windows, for MacOS, and for iOS (but forget this one for the moment). Each platform has a “leader” or “maintainer”, which is the person in charge of implementing/modifying the code. The problem raises when there are changes in VMMaker for example, that require changes in all platform code, and this is not changed in all of them. So for example, in UNIX the changes are committed, but not in Mac OS. So…each platform code is not always in sync with the rest. Note that I am not complaining: this is all open-source and we all do our best. I am just telling you the problems I have seen so far.
  5. The previous problem happens not only for the commits in the repository, but also for the VM releases. Most of the times, they are not in sync. Maybe there is a particular platform that releases 5 times in a year, and maybe there is another one every 1 year and a half 😦
  6. The version of every VM are not in sync. So for Mac for example you have Squeak 4.2.5beta1U, Squeak 5.7.4.1, Squeak 5.8b4, etc. For UNIX, Squeak-4.4.7.2357, Squeak-vm-3.7-7, Squeak 4.0.3.2202, etc.  In Windows, SqueakVM-Win32-4.1.1, SqueakVM-Win32-3.11.5, SqueakVM-Win32-3.10.9, etc. So as you can see, they are all completely different, and for me this is complicated since you cannot just refer to a unique VM version.
  7. The SVN repository is restricted, so you cannot commit unless you have authorized access. This could be a good and bad point at the same time.

I want to make it clear: I am not complaining against this, I am just telling the problems I have found, and how certain infrastructure that has been done in the last months helped with some of these issues.

So….you know that VMMaker is just another Monticello package, and you also know that you have to manage versions, dependencies, why not groups, etc…Does that ring a bell with anyone?  YEEES! Of course, Metacello 🙂  So, one thing we did in Pharo (although I guess it is/was also used in Squeak), is to create a Metacello Configuration for VMMaker: ConfigurationOfVMMaker. For those that doesn’t know what Metacello is, it is a Package Management System on top of Monticello, where the ConfigurationOfVMMaker is a class where you can define versions, dependencies, etc, about your project. If you are a Smalltalker and you don’t know anything about Metacello I recommend you to take a look.

Anyhow, with ConfigurationOfVMMaker we solved the first two problems. With Metacello baselines we define all the structural information of the Interpreter VM: which packages are needed (the dependencies), possible groups (not everybody wants to load the same packages), repositories, etc. And with Metacello versions, we can define a whole set of working versions. For example, for ConfigurationOfVMMaker version 1.5 we load ‘VMMaker-dtl.221’, ‘MemoryAccess-dtl.3’, ‘FFI-Pools-eem.2’, etc. This is a set of frozen versions that we known to work properly all together. Notice that creating versions for ConfigurationOfVMMaker should be done by the “VM developers”. In fact, it was done by people like Torsten,  Laurent and me. But the important thing is that the user doesn’t need to do that. The only thing the user needs to do in order to load VMMaker with all its dependencies, and all loading a working version of every package, is to load the Metacello version. Do you want to try by yourself?  Just take this Pharo image, and evaluate:

Deprecation raiseWarning: false.
 Gofer new
 squeaksource:'MetacelloRepository';
 package:'ConfigurationOfVMMaker';
 load.
 ((Smalltalk at: #ConfigurationOfVMMaker) project version: '1.5') load.

Sorry for the ugly colors…wordpress.com doesn’t have Smalltalk 😦

Why I told you to download that particular Pharo image? and why I am explicitly loading the version 1.5 instead of using the last one?  Because I want that my posts are reproducible. If you evaluate this instead:

 (Smalltalk at: #ConfigurationOfVMMaker) project lastVersion load.

I cannot guarantee that everything will be working properly. The same with the Pharo image. If you took any Pharo image 1.0, or 1.1 or 1.2, or Squeak 4.2, I am not sure that VMMaker will load correctly. The same if you load another version than 1.5. So…in this case, I am sure (because I test it before posting) that with that Pharo image and that version of ConfigurationOfVMMaker, VMMaker is working properly.

Coming back….the last point 3) is not yet solved, since you cannot know that for a certain SVN version you need XXX version of ConfigurationOfVMMaker, or vice-versa. But we will come to this later on…The rest of the problems are not solved either.

Generating the VM

You need both things to compile the VM: the C platform code that is directly committed in SVN and the generated C code from the VMMaker. Do you always need to translate VMMaker to C ? Not necessary, because the generated code is also committed in the SVN, usually under the “/src” folder, for example here. It is there so that if someone wants to compile, just download both parts and with GCC it compiles the VM. No need to take a Smalltalk image, load VMMaker, and generate sources. So… when is it really needed to generate sources from VMMaker?

  1. When the /src in the SVN is outdated in relation to the platform code.
  2. When you did changes in VMMaker. You can do changes in VMMaker just for fun, for your own project, for testing, etc.
  3. For learning purpose 🙂

So…how do you compile the VM?  yes, of course, using a C compiler…but that’s not enough information! For example, usually you need to place the /src folder (where the output of the generated VMMaker sources go) in a certain place so that it is found by the makefiles. Even more, the problem is that each platform has its own instructions of how to compile. You can find the instructions for UNIX here, for Windows here, and for Mac OS (after searching this info for a long time) it seems (if it is not this please let me know) to be here.

Not only each platform has its own instructions to build the VM, but also some lack support for IDE. For example, it is not easy to b able to compile the VM out of the box with Microsoft Visual Studio or with Appel’s XCode. For example, for XCode, you need a .xcodeproj file for every project. The problem was that most of the times (at least when I tried) this file contained file locations of the commiter (which of course is different from mine). So, at the end, I usually need to do some modifications to the project in order to being able to compile and run the VM from XCode. I am telling you all this so that you can understand the progress we (the community) did in the last months…

Internal and external plugins

Before going further, let me do a little remark: did you remember I told you about the VM plugins?  Like FilePlugin, SocketPlugin, etc. Well, plugins can be compiled in two ways: internal or external. Internal plugins are linked together with the core classical VM, i.e, they are inside the binary file of the VM. External plugins are distributed as separate shared library and the cool thing is that you don’t need to do anything at all to the VM. At runtime the normal/standard VM can just load an external plugin and use it. Whether you should compile a plugin as internal or external is out of scope of this post. What is important here is that:

  • the normal guy that just wants to compiled the VM shouldn’t need to know how each plugin must be compiled.
  • there are some plugins that only work when they are compiled in one of the two ways.

Generating the VMMaker sources

Imagine that for any reason (maybe one of the above mentioned) you need/want to translate VMMaker package to C. How do you do that? The default approach with the Interpreter VM is by using a tool called VMMakerTool, which at the same time it is the name of the class 😉   So…VMMakerTool is a class which is in the VMMaker repository and it is a UI that let you generate the sources. Here you can see a screenshot:

To reproduce the screenshot, just evaluate:

VMMakerTool openInWorld

The tool is pretty cool since it lets you to do a lot of things: choose which plugins to include and choose whether you want them internal or external, you can set the source output directory, the platform code directory, the CPU architecture (32 or 64 bits), etc. This tool is awesome, but from my point of view, it is too much for a non-VM-hacker guy. Why? Because of what I have already told you: the normal user shouldn’t need to know which plugins to include nor if they should be internal or external. At the same time, following some conventions, the directory for platform code and sources could be automatically set.

Fortunately, VMMakerTool is just the UI and it relies in the “model”, which is the VMMaker class (yes, VMMaker is the name of the squeaksurce repository, the name of one of the packages and also one of the classes heheheh).With the class VMMaker we can do the same of VMMakerTool but from code. Example:

| sourcePath |

"The path where I load from SVN"
sourcePath := '/Users/mariano/Pharo/VM/svnSqueakTree/trunk'.

"Generate new sources"
VMMaker default
 platformRootDirectoryName: sourcePath, '/platforms';
 sourceDirectoryName: sourcePath, '/platforms/iOS/vm/src';
 internal: #(
 ADPCMCodecPlugin
 B3DAcceleratorPlugin
 B3DEnginePlugin
 BalloonEnginePlugin

 "lots of plugins more.....I let few just for the example"

 SurfacePlugin
 UUIDPlugin
 DropPlugin)
 external: #();
 generateMainVM.

So…suppose that someone provides you the list of plugins for every platform, knowing which of them should be internal and which external, and following some conventions everything can be automatic?  Ok….we are going there, don’t worry 😉

Cog VM

The infrastructur for the Cog  VM is a little messy for me so I would try to do my best to explain it. Cog started as a fork of the Interpreter VM. So…imagine that you want to create a fork for VMMaker (in squeaksource) and another fork in the SVN for the platform code. Monticello doesn’t provide real and easy branch support, so Cog needed to do something weird (at least for me). Suppose that a regular version of the VMMaker package is ‘VMMaker-dtl.161’. In this case ‘dtl’ is the initials of the committer, Dave Lewis. So…how does the Cog branch in VMMaker looks like???  they are just normal versions, but whose committer is ‘oscog’ (I guess this is because of Open-Source Cog). Example: ‘VMMaker-oscog.54’. That means that in order to load Cog, you need to open the VMMaker package, and search for a version that matches ‘VMMaker-oscog’. There is where Eliot commits the official Cog versions.

Exercise: Take a Monticello Browser, add the VMMaker repository and browse the version ‘VMMaker-dtl.223’. Then, browse ‘VMMaker-oscog.54’ and notice the difference between them. For example, in ‘VMMaker-oscog.54’ there are several categories that are not even present in ‘VMMaker-dtl.223’, like ‘VMMaker-JIT’, ‘VMMaker-Multithreading’, etc. Even more, the same categories contain different classes.

Now, regarding the branch in the platform code, this is much easier since it is a regular SVN branch which can be found here.

Fortunately, people have also developed a ConfiguraionOfCog which follows the same idea of ConfigurationOfVMMaker.

One difference I found with the regular VM is that Cog is supposed to be translated to C using VMMaker class directly (not VMMakerTool). You can see how to do it in this workspace.

So, in summary, they way to compile Cog VM is more or less the same as the Interpreter VM: you take a Smalltalk image, you load Cog (you can use ConfigurationOfCog), you generate sources, you checkout SVN branch, and finally compile (the instructions of how to build each VM is in the same SVN). Generating the sources may not be necessary if the /src is in sync with /platforms.

Finally, notice also that Eliot usually uploads regular VM builds (Cog VM binaries for all OS) to this url.

New infrastructure

The same way we should thanks Teleplace for Eliot Miranda’s work, we should also thanks INRIA for paying a Pharo engineer: Igor Stasenko. The good news is that since he started a couple of months ago he was not working for Pharo but instead for a new VM infrastructure . What is all this about? I’ll give you only a quick introduction because in the next post we are going to compile the VM using such infrastructure. Disclaimer: this new infrastructure is only for Cog VM and all its variants, but not Interpreter VM. I guess that’s because of the resources/time available.

So…in a nutshell, there are 3 big changes:

  1. Use GIT instead of SVN. There is a new repository for the platform code which is versioned by GIT instead of SVN. There is a new account for CogVM in gitorious. It seems that nowadays if you are not in Git you are not cool, you do not exist. Ok, we are cool now 🙂  No one needs to ask for a blessing, everybody can clone, hack and push/share changes. People can pick the changes without having to have the permissions to publish.
  2. Use CMakeVMMaker instead of VMMakerTool. CMakeVMMaker is a little tool that automates the build. It has two important things: 1) translate VMMaker to C, using the VMMaker class and 2) generate CMake files so that to ease the build. To do this  it automatically assumes (although it can be customized) which plugins are needed and how they are compiled (if internal or external), the needed compiler flags, the directories needed, etc. CMake is an excellent tool for doing cross-platform compiling and for automatic stuff….CMakeVMMaker generates all the necessary files for CMake. For those who doesn’t know what CMake is, imagine one abstract step before makefiles. CMake is a cross-platform, open-source build system where you can define all necessary stuff like directories, compiler flags, etc, in text files. Once you have that, using CMake you can generate different outputs: normal makefiles where you can just use the command “make”, Appel’s XCode project or even Microsoft Visual Studio projects 🙂
  3. Continuous Integration for VMs!!  Can you imagine that for every GIT commit, Mr. Hudson takes the latest PharoCore image, loads the Cog VM, generates sources, and compiles the VM for Windows, Linux and Mac OS ? Come on! isn’t this really cool?  Ok, you don’t believe me? Go to the Pharo CI for CogVM.

In the next post we will see how to use this new infrastructure and how is solves some of the mentioned problems along this post. I want to thanks Esteban, Igor, Dave and all who answer my questions in the mailing lists 🙂

Links:

Advertisements

Departure: VM introduction

My main goals of this “Journey through the VM” are that the reader learns about the VM and to spread the VM knowledge, mostly for people who doesn’t know anything about VM. For these same reasons months ago I’ve created a special mailing list for beginners. Please, ask everything you don’t understand, correct me when I say something wrong, give your opinion, etc. There are always people ready to help in the mailing lists. So, let’s start.

Intro to VM

As you may know, there are different Smalltalk dialects and each of them has its own VM. In this Journey through the VM, we will use the PharoVM. I hope there are newcomers reading this blog, so I will explain for them: Pharo, was a fork of another Smalltalk dialect called Squeak. However, both of them still uses the same VM. To avoid confusions, I will try to just say VM, but know that it is the Pharo/Squeak VM.

As far as I am aware of, most, if not all, Smalltalk VMs are implemented in C (or maybe parts in C++). Fortunately for us, happy smalltalkers, the Pharo/Squeak VM consist of two parts of code: a) VMMaker (written in Smalltalk) ; b) “platform code” (C/Smalltalk-C hand-written code)

VMMaker

VMMaker is a normal repository in squeaksource, but its name is very misleading for me. One would imagine that it is a tool for building the VM, but the truth is that it is much more than that. In this VMMaker repository, there are different interesting packages but for the moment, we will concentrate in the most important one: the package VMMaker itself:

It is where the Smalltalk part of the VM is contained. I consider this part of the VM as the “core”, and it is written in Smalltalk. In fact, it is not really in Smalltalk but instead in a subset (limited) of Smalltalk called SLANG, which is then translated to C. So, even if this is much happier thank coding in C, you have to be aware that witting in SLANG has some limitations.

VMMaker has two very important classes: Interpreter and ObjectMemory. As it says in Interpreter class comment, they both together represents a complete implementation of the Smalltalk-80 virtual machine, derived originally from the Blue Book specification but quite different in some areas.

As you may know, every time to save a method in Smalltalk, the Compiler takes such source code, it analyzes, validates, parses, and finally compiles it, generating a CompiledMethod instance. In this object, all the literals and bytecodes are encoded. So, one of the main tasks of the VM is to interpret (that’s why its name) and execute Smalltalk bytecodes that reads from the image. In addition, Interpreter implements some VM responsibilities like methods lookup, cache and execution, primitives, arithmetic operations, etc.

ObjectMemory describes the object memory of the VM. Its responsibility includes allocating/deallocating memory, Garbage Collector and memory compaction, it defines the structure and flags of the object header, object enumeration, etc.

Platform code

Fortunately, the VM is decoupled from the issue of the supporting platform specific stuff, hence the “platform code”. It contains those parts that depends on the OS like sockets, display and files, where performance is needed, etc. As I told you, I understand the VMMaker part as the “core”. But there are a lot of other features that the VM should provide and they are not part of the “core” but instead they are written like plugins. There several plugins like FilePlugin, SocketPlugin, SoundPlugin, etc. We are not going further now with the plugins since there will be a post later for that (you will be even able to write you own plugin!!!). What is important to understand now is that the platform has code both things: for the VM “core” part, and for plugins.

I think (I may be wrong) that another good reason to have the “platform code” is to increase VM portabilty. Suppose you need to run the VM on a particular hardware/OS that supports C. In this case, you will probably need to change a little or even anything of the VM core (VMMaker). What you will need is to create a specific platform code for that hardware/OS. I think this is one of the reasons why you see Squeak VM running everywhere.

Most of the times that you want to hack, play or do research (which is the difference anyway between this verbs? heheh) with the VM, you usually modify the VMMaker part. It is not likely that you will need to modify something in the platform code.

How the VM is built ?

So…we have two parts of the VM: VMMaker (written in SLANG) and the platform code written in C (for the new MacOS platform there are parts written in Objective-C in order to talk with the cocoa library). There are two steps:

  1. Translate VMMaker (ObjectMemory, Interpreter, the plugins, etc)  to C. This is done by the classes that are in the category ‘VMMaker-Translation to C’.
  2. Compile the C source code generated from the previous step, together with the platform code.

We will not go further on this topic now because this is explained in the next post or so.

The VM is written is Smalltalk? really? is that cool?

Ok, this can be controversial, so as always, this is just my point of view. I love Smalltalk. I love browsing for senders, implementors, references, doing refactors, browsing methods and classes, etc. If you are reading this blog, you probably understand my feelings 🙂  There are people who love C. Ok, I don’t. I’ve even contributed (with code and fixes) to the OpenDBX library (written in C), so I can do it. But…… do you know how cool is to be able to read and modify the VM from the same image that you usually use ? SLANG is quite limited, and sometimes it looks like C, of course, but it is still much better than coding in C for me:

  1. You use the SAME Smalltalk image you use for other purposes
  2. You have all the tools provided by the Smalltalk IDE: senders, implementors, refactors, monticello, versions, etc. You browse Interpreter and ObjectMemory classes like if they were regular classes (indeed they are!).
  3. Most of the cases, you don’t need to deal with C (sometimes you do need).
  4. It can be versioned  in Monticello just like any other project.
  5. This is cool: you take you image, you modify something in VMMaker, generate sources, go to a command line and using gcc (and the platform code) compiles the vm. Then, you take the compiled VM and run the same image you used for generating it hhehehe. Isn’t that cool?
  6. Finally, and this is the most important for several persons, the VM can be simulated. Since it is Smalltalk, you can run a InterpreterSimulator, inside a host image/VM. But we will talk about this later in the journey.

Cog VM and current status

Some time ago, Eliot Miranda (thanks to Teleplace company) started to work on a new VM for Squeak (and all its fork, like Pharo). This VM is called Cog VM and it has been already released. In fact, is the default VM included in the PharoOneClick 1.1.1 and beyond. Cog VM includes a lot of new features, but in a glance:

  1. Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
  2. Context-to-stack mapping.
  3. JIT (just in time compiler) that compiles methods to machine code.
  4. PIC (polymorphic inline caching).
  5. Multi-threading.

If you are a little arround Smalltalk you may have heard about Cog VM and Stack VM. What is the big difference? Stack VM implements 1) and 2). And Cog VM is on top of the Stack VM and adds 3) and 4). Finally, there is CogMT VM which is on top of Cog VM and adds multi-threading support for external calls (like FFI for example). Now…to be able to put names to all these VMs and not get confused, we will call our old VM Interpreter VM.

Cog VMs have increased performance arround 4x-10x. In addition, it has refactored a couple of things. For example, in Interpreter VM, the Interpreter was a subclass of ObjectMemory (WTF?). But that was necessary in order to easily translate to C. In Cog, there are new classes like CoInterpreter and NewObjectMemory. But the good news is that we can have composition!! The CoInterpreter has an instance variable that is the object memory (in this case an instance of NewObjectMemory). This is awesome for us, and of course, it required changes in the SLANG to C translator. So…as you can see, Cog is really important for the Squeak and Pharo community.

Links:

One of the problems I found when trying to learn the VM is that there is little documentation and even more, it is not very known. This is why in every post I will try to put the links I know to the related topics. Keep and share them. They may be useful if you want to go further in the VM or in certain areas. Notice that it is probable that some information in there is outdated, but that’s better than nothing 😉

So, that was all for today. I tried to give you a first overview of the VM. The first posts may be “boring” but they will get better 🙂   If you have questions, remarks, or any kind of feedback, let me know.