In the previous post we saw different things: what is a primitive and some examples, their impact on CompiledMethod instances, pragmas, etc. Continuing with this “Journey through the Virtual Machine”, today I will talk about Named Primitives.
Important data from previous post
What is important for this post is a summary of what a primitive is. As we saw, there are methods that can be implemented in such a way that they call a Virtual Machine primitive. To declare the information related to which primitive to use, we use Pragmas. Example of the method #class:
Object >> class
"Primitive. Answer the object which is the receiver's class. Essential. See
Object documentation whatIsAPrimitive."
In this case, the primitive is the number 111. The primitive is implemented in the CORE of the Virtual Machine. This core is written in Slang, a subset of Smalltalk. To see how to map primitive numbers to their implementation we can see the method StackInterpreter >> #initializePrimitiveTable. In this example, for example, we can see it is mapped to the method #primitiveClass. But don’t confuse, this is NOT a regular method. This is part of the VM (the package VMMaker) and that method is automatically translated to C while building the VM.
For more details, please read the previous posts of this blog.
Named Primitives vs. Numbered Primitives
Again, in the previous post, we saw a “weird” method like:
FileDirectory >> primDeleteFileNamed: aFileName
"Delete the file of the given name. Return self if the primitive succeeds, nil otherwise."
<primitive: 'primitiveFileDelete' module: 'FilePlugin'>
Which are the differences between this primitive and the previous one (#class)? Well…let’s see:
With “numbered primitives” like #class, those primitives are implemented in the VM core, that is, the code of the primitives is inside Interpreter classes. There is a table kept in the VM that maps numbers to methods which are then translated to C functions. The only thing is needed to know from image side to call a primitive is the primitive number. In addition, these primitives cannot be loaded dynamically and hence, it is not easy to extend the VM with new primitives. If that is desired one need to build a new VM wich such primitive and distribute that VM.
Named primitives are different. They can be written with Slang as well, but they are not part of what I call the “VM core”. The methods that implement those primitives are not part of the Interpreter classes. Instead, they are written in different classes: plugins. What is needed to know from image side to call a named primitive is the name and its module. What is a module? Let’s say that it is the plugin name. Contrary to numbered primitives, named ones can be loaded dynamically and hence, it is easy to extend the VM with new primitives. One can generate the binaries of the plugin and distribute it with the regular VM. Named primitives can reside in an external library (.so on Unix, DLL on Windows, etc).
Named Primitives / Plugins / Pluggable Primitives
So…do they all mean the same? Yes, at least for me, they all represent something similar. For me, named and pluggable primitives are the same concept. And I see a plugin like a set of named/pluggable primitives.
When someone says “this is done with a plugin” or “did you load the plugin”, they refer to that. Even if in a future post we will see how to implement our custom plugin, I will give a small introduction.
Plugins are translated to a different C file, not to the same C file of the VM (result of Interpreter classes translation). In fact, plugins are translated and placed in the directory /src/plugin. Each plugin is implemented in the VM as a subclass of InterpreterPlugin. Just for fun, inspect “InterpreterPlugin allSubclasses”. Usually, a plugin needs functionality provided by the VM core. For this purpose, the class InterpreterPlugin has an instance variable InterpreterProxy, which acts as its name says, as a proxy to the Interpreter (the vm). InterpreterProxy provides only the methods that the VM wants to provide to primitives. Some examples are #fetchInteger:ofObject:, #pop:thenPush:, #superclassOf:, etc….So, plugins can only use those provided methods of the interpreter.
We saw that from the image side, named primitives are implemented using the following pragma: “<primitive: ‘primitiveXXX’ module: ‘YYYPlugin’>”. For example, “<primitive: ‘primitiveFileDelete’ module: ‘FilePlugin’>”. The first parameter is the primitive name, which has to map to the method that implementes such primitive (notice the difference with the table for numbered primitives). So in this case, there must be a method (implemented in Slang) called #primitiveFileDelete. The second parameter is the plugin name. A plugin is rified as a subclass of InterpreterPlugin and the plugin name can be defined by implementing the method #moduleName. If a plugin does not do that then the class name is used by default, as it happens with FilePlugin. So….FilePlugin is a subclass of InterpreterPlugin and implements the method #primitiveFileDelete, which looks like:
| namePointer nameIndex nameSize okToDelete |
namePointer := interpreterProxy stackValue: 0.
(interpreterProxy isBytes: namePointer)
ifFalse: [^ interpreterProxy primitiveFail].
nameIndex := interpreterProxy firstIndexableField: namePointer.
nameSize := interpreterProxy byteSizeOf: namePointer.
"If the security plugin can be loaded, use it to check for permission.
If not, assume it's ok"
sCDFfn ~= 0
ifTrue: [okToDelete := self cCode: ' ((sqInt (*)(char *, sqInt))sCDFfn)(nameIndex, nameSize)'.
ifFalse: [^ interpreterProxy primitiveFail]].
ifFalse: [interpreterProxy pop: 1]
How plugins are compiled with the VM, as well as telling the VM which plugins to compile, is explained in a previous posts such as this one and this one.
Plugins: internal or external?
Plugins can be compiled in two ways: internal or external. Notice that it is just the way they are compiled, but the way they are written is the same: using SLANG. Each plugin is a class subclass of InterpreterPlugin or SmartSyntaxInterpreterPlugin. A plugin can then be compiled in the mentioned ways.
Internal plugins are linked together with the core of the classical VM, that is, the binaries of the plugins are put together with the binary of the VM. So for the final user, there is just one binary representing the VM. External plugins are distributed as separate shared library (a .dll in windows, a .so in Unix, etc). The functions (remember that slang is then translated to C so what we coded as methods will become C functions hahaha) of the shared libraries representing the plugins are accessed using system calls.
Which one to use? Well, that depends on what the developer of the plugin wants. In my case I usually try to build them externally since you don’t need to do anything at all to the VM. It is easier to distribute: just compile the plugin and use it with a regular VM. And from security point of view they are even simpler to eliminate or disable, just removing the binary file.
But not everything is pink in this world. Unfortunately, there are some plugins that cannot be compiled in both ways, but with one in particular. Most existing plugins are optional. Nevertheless, there are some plugins that are mandatory for the core of the VM, that is, the VM cannot run without those plugins. There are lots of available plugins. Which ones are needed? Some plugins only work in certain Operating System. Some only work not even in certain OS but also in a particular version. Plugins may need different compiler flags in different OS. Etc…
To solve the problem of knowing all that, CMakeVMMaker provides an easy way to compile the plugins of a VM. I assume you have been following this “journey” so you read how to compile the VM from scratch in https://marianopeck.wordpress.com/2011/04/10/building-the-vm-from-scratch-using-git-and-cmakevmmaker/ and https://marianopeck.wordpress.com/2011/04/16/building-the-vm-second-part/. So if you installed ConfigurationOfCog, you installed CMakeVMMaker as well. Check for the methods #defaultInternalPlugins and #defaultExternalPlugins. Each CMakeVMMaker configuration class implements those methods correctly. Each of them knows which plugins should be compiled and whether internally or externally. So, the user, someone who wants to build the VM, doesn’t need to worry about that. In addition, CMakeVMMaker let us customize which plugins to use with the method #internalPlugins: and #externalPlugins.
I know, I know. You want to write and compile your own plugin? Ok, there will be a future post about that. But if you want to try it, check subclasses of InterpreterPlugin or SmartSyntaxInterpreterPlugin (I recommend the last one since makes a lot of stuff simpler) and then build the VM with something like:
| config |
config := CogUnixConfig new.
config externalPlugins: (config externalPlugins copyWith: #MyHackyFirstPlugin).
Named Primitives and their relation to CompiledMethod
In the previous post we saw that methods that contained a numbered primitive have something special in the CompiledMethod instance: the penultimate literal does not have the Symbol containing the selector but instead an instance of AdditionalMethodState which has a pragma with the primitive information. In the case of numbered primitives we have that, but in addition, there is one more special object in the first literal of the CompiledMethod. That object is an Array that with 4 elements. The first is the plugin name, which is answered by #moduleName (what you put in the module:). The second one is the selector. The third is the session ID which is obsolete, not used anymore, and hence it is usually zero. The last one, is the function index (Integer) in a table that resides in the VM: externalPrimitiveTable. As far as I understood, such table and this index is used as a cache. What is funny is that the VM writes that index in the CompiledMethod instance. For more details, read the method #primitiveExternalCall.
As always, if there are more links or documentation about them please let me know and I will add it.