Monthly Archives: July 2011

See you at ESUG?

You may have noticed that I’ve added a picture/link to ESUG in the right column of the blog. Even though I am young and don’t have much experience traveling around the world, I can tell you that the ESUG Conference, as well as the Smalltalks Conference, is just AWESOME. I have attended every Smalltalks from 2007 to 2010, and ESUG 09 and 10. All of them were great, and not only because of the talks. The talks are the least important part for me 😉

What is really great is meeting people. Have you ever sent 1 million emails to someone without even knowing his face? Is he 70 years old? 20? What language does he speak? Well, ESUG is the perfect conference to meet people in person, face to face. The best part of ESUG happens in the “corridors”: talking with people between talks, after the conference, at the social event, etc. People will ask you about your stuff, and they will give you feedback and ideas. You will find yourself wanting to give feedback to others. It is a nice circle.

And the Smalltalk community is unique in the sense that you can attend ESUG and meet the developers of the tools you use to have fun, make a living, study, do research, work, or whatever. You can have a beer with the developers/creators of Pharo, Seaside, AidaWeb, GemStone, Pier, Moose, Squeak/Pharo VM, DBXTalk, Metacello, Glorp, Cincom, VA, etc. What is also great about smalltalkers is that a smalltalker is likely to be a very good person. ESUG is friendly and smalltalkers come from all over the world. Come on, join us 🙂

This year the ESUG Conference is in Edinburgh, Scotland. All the information you may need, such as the list of talks, schedule, maps, venue, etc, is at http://www.esug.org/wiki/pier/Conferences/2011.

Now, regarding the “Journey through the Virtual Machine”: if you like that topic, you have to attend ESUG. There you can meet Igor Stasenko (author of HydraVM, NativeBoost, Hudson VM configurations, CMakeVMMaker, etc), Esteban Lorenzano (the new maintainer of the Mac Squeak VM and Cocoa port), Andres Valloud (VM developer at Cincom), Javier Burroni and Gerardo Richarte (authors of SqueakNOS and the JIT/GC implemented in Smalltalk), etc. Even more, there is a special workshop for you: “Compiling your own VM” by Igor. This is a hands-on tutorial. So, what are you waiting for? In addition, there are even more talks related to the VM. Check the schedule for more details.

I will give some talks as well, but I will comment on that in another post.

So…see you there?


The second part of the journey is over

Hi. As the title says, the second part of the journey is over. This part consisted of 6 posts over two and a half months. I started with a very basic introduction to the Smalltalk reflective model. For those readers who know the basics of Smalltalk but not its internal model, I explained the basic classes involved in the internals, such as Class, Metaclass, MethodDictionary, CompiledMethod, etc. I then talked a little about the compiler’s role in the system.

Then I followed with an explanation of what is known as object (or class) format. This was necessary in order to really understand the uniqueness of CompiledMethod. I explained each format and also both sides of the game: what happens from image side and VM side. We saw how the format is encoded in objects, how the format is read, and how to create objects with different formats. We finally saw the special format of CompiledMethod.

Once we knew about object formats, we continued with an introduction to CompiledMethods. I explained how to understand the results of inspecting/exploring a CompiledMethod instance. It was necessary to explain the CompiledMethod header and trailer. Then I showed how we can decompile a CompiledMethod and, using the bytecodes and literals, get a possible source code. Finally, it was fun to mention CompiledMethod equality 😉

After giving an introduction to CompiledMethods, I followed with an introduction to bytecodes (part of a CompiledMethod). Bytecodes are a fundamental part of Smalltalk and it is important to understand them. Of course, you needed to know what bytecodes are first, so that was the start. Then I explained how to understand/read/interpret the bytecodes that the system browser shows us for a particular method. More interesting was to see how bytecodes are mapped to the VM code (what the VM does for each bytecode and how you can get such code). Then I showed the limitation of using one byte per bytecode and hence I presented the extended bytecodes. Finally, I mentioned the types/groups of bytecodes (they come even from the blue book) and showed them in the VMMaker code.

I then continued with a post that explained several things such as Primitives, Pragmas, Literals and their relation to CompiledMethods. Basically, the post started with a definition of method primitives and some examples. Then we saw what a Pragma is and how pragmas are related to/encoded in a CompiledMethod. After that, we saw the impact of defining primitives on the CompiledMethod instances. Finally, the post showed how primitives are mapped from image side to VM side.

The last post of this part was a pending explanation from the previous one: what are named primitives? We saw the difference between them and numbered primitives and explained the basic concepts. Then we saw that plugins can be compiled either internally or externally, and then how named primitives impact CompiledMethod instances.

So…that was all. I hope you have enjoyed this part as much as I enjoyed writing it. As always, any feedback is more than welcome. The future part? I am not sure about it 😉


Named Primitives

In the previous post we saw different things: what is a primitive and some examples, their impact on CompiledMethod instances, pragmas, etc. Continuing with this “Journey through the Virtual Machine”, today I will talk about Named Primitives.

Important data from previous post

What is important for this post is a summary of what a primitive is. As we saw, there are methods that can be implemented in such a way that they call a Virtual Machine primitive. To declare the information related to which primitive to use, we use Pragmas. Example of the method #class:

Object >> class
    "Primitive. Answer the object which is the receiver's class. Essential. See
    Object documentation whatIsAPrimitive."

    <primitive: 111>
    self primitiveFailed

In this case, the primitive is number 111. The primitive is implemented in the CORE of the Virtual Machine. This core is written in Slang, a subset of Smalltalk. To see how primitive numbers map to their implementation, look at the method StackInterpreter >> #initializePrimitiveTable. There we can see that 111 is mapped to the method #primitiveClass. But don’t get confused: this is NOT a regular method. It is part of the VM (the package VMMaker) and it is automatically translated to C while building the VM.
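As a quick check from the image side, here is a minimal workspace sketch. It assumes a Pharo or Squeak image of this era in which Object >> #class still carries the pragma shown above; the variable name is only illustrative.

```smalltalk
"Workspace sketch: evaluate each expression separately with 'print it'."
| method |
method := Object >> #class.

"Answers the primitive number of the method (111 for the method shown above)."
method primitive.

"Answers the pragmas of the method, which should include the primitive: one."
method pragmas.
```

Note that this only shows what the image knows about the primitive. The implementation itself (#primitiveClass) lives in VMMaker and is not reachable this way.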

For more details, please read the previous posts of this blog.

Named Primitives vs. Numbered Primitives

Again, in the previous post, we saw a “weird” method like:

FileDirectory >> primDeleteFileNamed: aFileName
"Delete the file of the given name. Return self if the primitive succeeds, nil otherwise."

    <primitive: 'primitiveFileDelete' module: 'FilePlugin'>
    ^ nil

What are the differences between this primitive and the previous one (#class)? Well…let’s see:

“Numbered primitives” like #class are implemented in the VM core, that is, the code of the primitives is inside the Interpreter classes. There is a table kept in the VM that maps numbers to methods, which are then translated to C functions. The only thing the image side needs to know to call such a primitive is its number. In addition, these primitives cannot be loaded dynamically and hence it is not easy to extend the VM with new primitives. If that is desired, one needs to build a new VM with such a primitive and distribute that VM.

Named primitives are different. They can be written in Slang as well, but they are not part of what I call the “VM core”. The methods that implement those primitives are not part of the Interpreter classes. Instead, they are written in different classes: plugins. What the image side needs to know to call a named primitive is its name and its module. What is a module? Let’s say that it is the plugin name. Contrary to numbered primitives, named ones can be loaded dynamically and hence it is easy to extend the VM with new primitives. One can generate the binaries of the plugin and distribute them with the regular VM. Named primitives can reside in an external library (.so on Unix, DLL on Windows, etc).

Named Primitives / Plugins / Pluggable Primitives

So…do they all mean the same? Yes, at least for me, they all represent something similar. For me, named and pluggable primitives are the same concept. And I see a plugin as a set of named/pluggable primitives.

When someone says “this is done with a plugin” or “did you load the plugin”, they refer to that. Although a future post will show how to implement our own custom plugin, I will give a small introduction here.

Plugins are translated to a different C file, not to the same C file as the VM (the result of translating the Interpreter classes). In fact, plugins are translated and placed in the directory /src/plugin. Each plugin is implemented in the VM as a subclass of InterpreterPlugin. Just for fun, inspect “InterpreterPlugin allSubclasses”. Usually, a plugin needs functionality provided by the VM core. For this purpose, the class InterpreterPlugin has an instance variable interpreterProxy, which, as its name says, acts as a proxy to the Interpreter (the VM). InterpreterProxy provides only the methods that the VM wants to expose to primitives. Some examples are #fetchInteger:ofObject:, #pop:thenPush:, #superclassOf:, etc. So, plugins can only use those methods of the interpreter.

We saw that from the image side, named primitives are declared using the following pragma: <primitive: 'primitiveXXX' module: 'YYYPlugin'>. For example, <primitive: 'primitiveFileDelete' module: 'FilePlugin'>. The first parameter is the primitive name, which has to match the method that implements the primitive (notice the difference with the table used for numbered primitives). So in this case, there must be a method (implemented in Slang) called #primitiveFileDelete. The second parameter is the plugin name. A plugin is reified as a subclass of InterpreterPlugin and the plugin name can be defined by implementing the method #moduleName. If a plugin does not do that, then the class name is used by default, as happens with FilePlugin. So…FilePlugin is a subclass of InterpreterPlugin and implements the method #primitiveFileDelete, which looks like:

primitiveFileDelete

    | namePointer nameIndex nameSize okToDelete |

    <export: true>

    namePointer := interpreterProxy stackValue: 0.
    (interpreterProxy isBytes: namePointer)
        ifFalse: [^ interpreterProxy primitiveFail].
    nameIndex := interpreterProxy firstIndexableField: namePointer.
    nameSize := interpreterProxy byteSizeOf: namePointer.
    "If the security plugin can be loaded, use it to check for permission.
    If not, assume it's ok"
    sCDFfn ~= 0
        ifTrue: [okToDelete := self cCode: ' ((sqInt (*)(char *, sqInt))sCDFfn)(nameIndex, nameSize)'.
            okToDelete
                ifFalse: [^ interpreterProxy primitiveFail]].
    self
        sqFileDeleteName: nameIndex
        Size: nameSize.
    interpreterProxy failed
        ifFalse: [interpreterProxy pop: 1]
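To connect the pragma with the plugin class from inside the image, something like the following workspace sketch can help. It assumes VMMaker is loaded (so that InterpreterPlugin and FilePlugin exist as classes) and that #moduleName is implemented on the class side; the latter is an assumption worth double-checking in your own image.

```smalltalk
"The primitive name used in the pragma should be a regular selector of the plugin class."
FilePlugin includesSelector: #primitiveFileDelete.

"Module name used in the module: part of the pragma (assumed to be a class-side method)."
FilePlugin moduleName.

"Module names of all the plugins known to this image."
InterpreterPlugin allSubclasses collect: [:each | each moduleName].
```

Evaluate each expression separately with “print it”. For FilePlugin the module name should simply be the class name, since it does not override the default.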

How plugins are compiled with the VM, as well as how to tell the VM which plugins to compile, is explained in previous posts such as this one and this one.

Plugins: internal or external?

Plugins can be compiled in two ways: internal or external. Notice that this is just the way they are compiled; the way they are written is the same: using Slang. Each plugin is a subclass of InterpreterPlugin or SmartSyntaxInterpreterPlugin. A plugin can then be compiled in either of the mentioned ways.

Internal plugins are linked together with the core of the classical VM, that is, the binaries of the plugins are put together with the binary of the VM. So for the final user, there is just one binary representing the VM. External plugins are distributed as separate shared libraries (a .dll on Windows, a .so on Unix, etc). The functions of the shared libraries representing the plugins (remember that Slang is translated to C, so what we coded as methods will become C functions hahaha) are accessed using system calls.

Which one to use? Well, that depends on what the developer of the plugin wants. In my case I usually try to build them externally, since you don’t need to do anything at all to the VM. It is easier to distribute: just compile the plugin and use it with a regular VM. And from a security point of view they are even simpler to eliminate or disable: just remove the binary file.

But not everything is rosy in this world. Unfortunately, there are some plugins that cannot be compiled both ways, only in one particular way. Most existing plugins are optional. Nevertheless, there are some plugins that are mandatory for the core of the VM, that is, the VM cannot run without them. There are lots of available plugins. Which ones are needed? Some plugins only work on a certain operating system. Some work not just on a certain OS but only on a particular version of it. Plugins may need different compiler flags on different OSes. Etc…

To solve the problem of knowing all that, CMakeVMMaker provides an easy way to compile the plugins of a VM. I assume you have been following this “journey”, so you have read how to compile the VM from scratch in https://marianopeck.wordpress.com/2011/04/10/building-the-vm-from-scratch-using-git-and-cmakevmmaker/ and https://marianopeck.wordpress.com/2011/04/16/building-the-vm-second-part/. So if you installed ConfigurationOfCog, you installed CMakeVMMaker as well. Check the methods #defaultInternalPlugins and #defaultExternalPlugins. Each CMakeVMMaker configuration class implements those methods correctly: each of them knows which plugins should be compiled and whether internally or externally. So the user, someone who wants to build the VM, doesn’t need to worry about that. In addition, CMakeVMMaker lets us customize which plugins to use with the methods #internalPlugins: and #externalPlugins:.
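To see what a given configuration would build, a small workspace sketch like the following can be used. It assumes ConfigurationOfCog (and therefore CMakeVMMaker) is loaded and uses CogUnixConfig only as an example; pick the configuration class that matches your platform.

```smalltalk
"Evaluate each expression separately with 'print it' or 'inspect it'."
| config |
config := CogUnixConfig new.

"Plugins this configuration links into the VM binary by default."
config defaultInternalPlugins.

"Plugins this configuration builds as separate shared libraries by default."
config defaultExternalPlugins.
```

Comparing both lists is a quick way to find out how a plugin you care about is meant to be compiled before changing anything.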

I know, I know. You want to write and compile your own plugin? Ok, there will be a future post about that. But if you want to try it, check the subclasses of InterpreterPlugin or SmartSyntaxInterpreterPlugin (I recommend the latter since it makes a lot of stuff simpler) and then build the VM with something like:

| config |
config := CogUnixConfig new.
config externalPlugins: (config externalPlugins copyWith: #MyHackyFirstPlugin).
config generateWithSources.

Named Primitives and their relation to CompiledMethod

In the previous post we saw that methods containing a numbered primitive have something special in the CompiledMethod instance: the penultimate literal does not hold the Symbol with the selector but instead an instance of AdditionalMethodState, which has a pragma with the primitive information. Named primitives have that too, but in addition there is one more special object in the first literal of the CompiledMethod. That object is an Array with 4 elements. The first is the plugin name, which is answered by #moduleName (what you put in the module:). The second is the selector, that is, the primitive name. The third is the session ID, which is obsolete and not used anymore, and hence is usually zero. The last one is the function index (an Integer) in a table that resides in the VM: externalPrimitiveTable. As far as I understood, that table and this index are used as a cache. What is funny is that the VM writes that index into the CompiledMethod instance. For more details, read the method #primitiveExternalCall.
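The special first literal can be looked at directly from a workspace. This is a minimal sketch, assuming an image that still contains FileDirectory >> #primDeleteFileNamed: as shown earlier; the variable name is only illustrative.

```smalltalk
"Evaluate each expression separately with 'print it' or 'inspect it'."
| method |
method := FileDirectory >> #primDeleteFileNamed:.

"First literal: the 4-element Array with module name, primitive name, session ID and function index."
method literalAt: 1.

"Primitive number stored in the method header. For named primitives this should be
the number mapped to #primitiveExternalCall (117 in the Squeak-family VMs, stated here as an assumption)."
method primitive.
```

If the primitive has already been called in the running session, the last element of the Array may be non-zero, which would be the index the VM wrote back as a cache.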

Links

As always, if there are more links or documentation about these topics, please let me know and I will add them.