Tag Archives: Primitives

Named Primitives

In the previous post we saw different things: what is a primitive and some examples, their impact on CompiledMethod instances, pragmas, etc. Continuing with this “Journey through the Virtual Machine”, today I will talk about Named Primitives.

Important data from previous post

What is important for this post is a summary of what a primitive is. As we saw, there are methods that can be implemented in such a way that they call a Virtual Machine primitive. To declare the information related to which primitive to use, we use Pragmas. Example of the method #class:

Object >> class
"Primitive. Answer the object which is the receiver's class. Essential. See
Object documentation whatIsAPrimitive."

<primitive: 111>
self primitiveFailed

In this case, the primitive is the number 111. The primitive is implemented in the CORE of the Virtual Machine. This core is written in Slang, a subset of Smalltalk.  To see how to map primitive numbers to their implementation we can see the method StackInterpreter >> #initializePrimitiveTable. In this example, for example, we can see it is mapped to the method #primitiveClass. But don’t confuse, this is NOT a regular method. This is part of the VM (the package VMMaker) and that method is automatically translated to C while building the VM.

For more details, please read the previous posts of this blog.

Named Primitives vs. Numbered Primitives

Again, in the previous post, we saw a “weird” method like:

FileDirectory >> primDeleteFileNamed: aFileName
"Delete the file of the given name. Return self if the primitive succeeds, nil otherwise."

    <primitive: 'primitiveFileDelete' module: 'FilePlugin'>
    ^ nil

Which are the differences between this primitive and the previous one (#class)? Well…let’s see:

With “numbered primitives” like #class, those primitives are implemented in the VM core, that is, the code of the primitives is inside Interpreter classes. There is a table kept in the VM that maps numbers to methods which are then translated to C functions. The only thing is needed to know from image side to call a primitive is the primitive number. In addition, these primitives cannot be loaded dynamically and hence, it is not easy to extend the VM with new primitives. If that is desired one need to build a new VM wich such primitive and distribute that VM.

Named primitives are different. They can be written with Slang as well, but they are not part of what I call the “VM core”. The methods that implement those primitives are not part of the Interpreter classes. Instead, they are written in different classes: plugins. What is needed to know from image side to call a named primitive is the name and its module. What is a module? Let’s say that it is the plugin name. Contrary to numbered primitives, named ones can be loaded dynamically and hence, it is easy to extend the VM with new primitives. One can generate the binaries of the plugin and distribute it with the regular VM. Named primitives can reside in an external library (.so on Unix, DLL on Windows, etc).

Named Primitives / Plugins / Pluggable Primitives

So…do they all mean the same?  Yes, at least for me, they all represent something similar. For me, named and pluggable primitives are the same concept. And I see a plugin like a set of named/pluggable primitives.

When someone says “this is done with a plugin” or “did you load the plugin”, they refer to that. Even if in a future post we will see how to implement our custom plugin, I will give a small introduction.

Plugins are translated to a different C file, not to the same C file of the VM (result of Interpreter classes translation). In fact, plugins are translated and placed in the directory /src/plugin. Each plugin is implemented in the VM as a subclass of InterpreterPlugin. Just for fun, inspect “InterpreterPlugin allSubclasses”. Usually, a plugin needs functionality provided by the VM core. For this purpose, the class InterpreterPlugin has an instance variable InterpreterProxy, which acts as its name says, as a proxy to the Interpreter (the vm). InterpreterProxy provides only the methods that the VM wants to provide to primitives. Some examples are #fetchInteger:ofObject:, #pop:thenPush:, #superclassOf:, etc….So, plugins can only use those provided methods of the interpreter.

We saw that from the image side, named primitives are implemented using the following pragma: “<primitive: ‘primitiveXXX’ module: ‘YYYPlugin’>”. For example, “<primitive: ‘primitiveFileDelete’ module: ‘FilePlugin’>”. The first parameter is the primitive name, which has to map to the method that implementes such primitive (notice the difference with the table for numbered primitives). So in this case, there must be a method (implemented in Slang) called #primitiveFileDelete. The second parameter is the plugin name. A plugin is rified as a subclass of InterpreterPlugin and the plugin name can be defined by implementing the method #moduleName. If a plugin does not do that then the class name is used by default, as it happens with FilePlugin. So….FilePlugin is a subclass of InterpreterPlugin and implements the method #primitiveFileDelete, which looks like:


| namePointer nameIndex nameSize  okToDelete |

<export: true>

namePointer := interpreterProxy stackValue: 0.
(interpreterProxy isBytes: namePointer)
ifFalse: [^ interpreterProxy primitiveFail].
nameIndex := interpreterProxy firstIndexableField: namePointer.
nameSize := interpreterProxy byteSizeOf: namePointer.
"If the security plugin can be loaded, use it to check for permission.
If not, assume it's ok"
sCDFfn ~= 0
ifTrue: [okToDelete := self cCode: ' ((sqInt (*)(char *, sqInt))sCDFfn)(nameIndex, nameSize)'.
ifFalse: [^ interpreterProxy primitiveFail]].
sqFileDeleteName: nameIndex
Size: nameSize.
interpreterProxy failed
ifFalse: [interpreterProxy pop: 1]

How plugins are compiled with the VM, as well as telling the VM which plugins to compile, is explained in a previous posts such as this one and this one.

Plugins: internal or external?

Plugins can be compiled in two ways: internal or external. Notice that it is just the way they are compiled, but the way they are written is the same: using SLANG. Each plugin is a class subclass of InterpreterPlugin or SmartSyntaxInterpreterPlugin. A plugin can then be compiled in the mentioned ways.

Internal plugins are linked together with the core of the classical VM, that is, the binaries of the plugins are put together with the binary of the VM. So for the final user, there is just one binary representing the VM. External plugins are distributed as separate shared library (a .dll in windows, a .so in Unix, etc). The functions (remember that slang is then translated to C so what we coded as methods will become C functions hahaha) of the shared libraries representing the plugins are accessed using system calls.

Which one to use?  Well, that depends on what the developer of the plugin wants. In my case I usually try to build them externally since you don’t need to do anything at all to the VM. It is easier to distribute: just compile the plugin and use it with a regular VM. And from security point of view they are even simpler to eliminate or disable, just removing the binary file.

But not everything is pink in this world. Unfortunately, there are some plugins that cannot be compiled in both ways, but with one in particular. Most existing plugins are optional. Nevertheless, there are some plugins that are mandatory for the core of the VM, that is, the VM cannot run without those plugins. There are lots of available plugins. Which ones are needed? Some plugins only work in certain Operating System. Some only work not even in certain OS but also in a particular version. Plugins may need different compiler flags in different OS. Etc…

To solve the problem of knowing all that, CMakeVMMaker provides an easy way to compile the plugins of a VM. I assume you have been following this “journey” so you read how to compile the VM from scratch in https://marianopeck.wordpress.com/2011/04/10/building-the-vm-from-scratch-using-git-and-cmakevmmaker/ and https://marianopeck.wordpress.com/2011/04/16/building-the-vm-second-part/. So if you installed ConfigurationOfCog, you installed CMakeVMMaker as well. Check for the methods #defaultInternalPlugins and #defaultExternalPlugins. Each CMakeVMMaker configuration class implements those methods correctly. Each of them knows which plugins should be compiled and whether internally or externally. So, the user, someone who wants to build the VM, doesn’t need to worry about that. In addition, CMakeVMMaker let us customize which plugins to use with the method #internalPlugins: and #externalPlugins.

I know, I know.  You want to write and compile your own plugin? Ok, there will be a future post about that. But if you want to try it, check subclasses of InterpreterPlugin or SmartSyntaxInterpreterPlugin  (I recommend the last one since makes a lot of stuff simpler) and then build the VM with something like:

| config |
config := CogUnixConfig new.
config externalPlugins: (config externalPlugins copyWith: #MyHackyFirstPlugin).
config generateWithSources.

Named Primitives and their relation to CompiledMethod

In the previous post we saw that methods that contained a numbered primitive have something special in the CompiledMethod instance: the penultimate literal does not have the Symbol containing the selector but instead an instance of AdditionalMethodState which has a pragma with the primitive information. In the case of numbered primitives we have that, but in addition, there is one more special object in the first literal of the CompiledMethod. That object is an Array that with 4 elements. The first is the plugin name, which is answered by #moduleName (what you put in the module:). The second one is the selector. The third is the session ID which is obsolete, not used anymore, and hence it is usually zero. The last one, is the function index (Integer) in a table that resides in the VM: externalPrimitiveTable. As far as I understood, such table and this index is used as a cache. What is funny is that the VM writes that index in the CompiledMethod instance. For more details, read the method #primitiveExternalCall.


As always, if there are more links or documentation about them please let me know and I will add it.


Primitives, Pragmas, Literals and their relation to CompiledMethods

What is a primitive?

Do you want to know the answer?  Just do what we always do in Smalltalk: browse code 🙂   Open your image and browse the method #whatIsAPrimitive. You can read the following information there:

“Some messages in the system are responded to primitively. A primitive response is performed directly by the interpreter rather than by evaluating expressions in a method. The methods for these messages indicate the presence of a primitive response by including <primitive: xx> before the first expression in the method.  

Primitives exist for several reasons. Certain basic or ‘primitive’ operations cannot be performed in any other way. Smalltalk without primitives can move values from one variable to another, but cannot add two SmallIntegers together. Many methods for arithmetic and comparison between numbers are primitives. Some primitives allow Smalltalk to communicate with I/O devices such as the disk, the display, and the keyboard.  Some primitives exist only to make the system run faster; each does the same thing as a certain Smalltalk method, and its implementation as a primitive is optional.  

When the Smalltalk interpreter begins to execute a method which specifies a primitive response, it tries to perform the primitive action and to return a result. If the routine in the interpreter for this primitive is successful, it will return a value and the expressions in the method will not be evaluated. If the primitive routine is not successful, the primitive ‘fails’, and the Smalltalk expressions in the method are executed instead. These expressions are evaluated as though the primitive routine had not been called. 

The Smalltalk code that is evaluated when a primitive fails usually anticipates why that primitive might fail. If the primitive is optional, the expressions in the method do exactly what the primitive would have done (See Number @). If the primitive only works on certain classes of arguments, the Smalltalk code tries to coerce the argument or appeals to a superclass to find a more general way of doing the operation (see SmallInteger +). If the primitive is never supposed to fail, the expressions signal an error (see SmallInteger asFloat). 

Each method that specifies a primitive has a comment in it. If the primitive is optional, the comment will say ‘Optional’. An optional primitive that is not implemented always fails, and the Smalltalk expressions do the work instead.  If a primitive is not optional, the comment will say, ‘Essential’. Some methods will have the comment, ‘No Lookup’. See Object >> #howToModifyPrimitives for an explanation of special selectors which are not looked up. 

For the primitives for +, -, *, and bitShift: in SmallInteger, and truncated in Float, the primitive constructs and returns a 16-bit LargePositiveInteger when the result warrants it. Returning 16-bit LargePositiveIntegers from these primitives instead of failing is optional in the same sense that the LargePositiveInteger arithmetic primitives are optional. The comments in the SmallInteger primitives say, ‘Fails if result is not a SmallInteger’, even though the implementor has the option to construct a LargePositiveInteger. For further information on primitives, see the ‘Primitive Methods’ part of the chapter on the formal specification of the interpreter in the Smalltalk book.”

Primitives examples

As we will see later in another post, in the object header of every object (except compact classes) there is a pointer to its class (another object). Hence, accessing to that pointer of the object header has to be done by a primitive:

Object >> class
"Primitive. Answer the object which is the receiver's class. Essential. See
Object documentation whatIsAPrimitive."

<primitive: 111>
self primitiveFailed

We read in the comment of the method #whatIsAPrimitive that what it is after <primitive: XXX> is ONLY called when the primitive fails. In this case, when that happens, the code “self primitiveFailed” will be executed: there is nothing we can do from image side if this primitive fails. Notice that the declaration of <primitive: XXX> has to be first in the method. The only possible thing before that is comments and declare temp variables: it is not possible to write code before that. So, this is not possible:

Object >> class
"Primitive. Answer the object which is the receiver's class. Essential. See
Object documentation whatIsAPrimitive."
Transcript show: '#class was called!!'.
<primitive: 111>
self primitiveFailed

Another example of a primitive:

SmallInteger >> bitOr: arg
"Primitive. Answer an Integer whose bits are the logical OR of the
receiver's bits and those of the argument, arg.
Numbers are interpreted as having 2's-complement representation.
Essential.  See Object documentation whatIsAPrimitive."

<primitive: 15>
self >= 0 ifTrue: [^ arg bitOr: self].
^ arg < 0
ifTrue: [(self bitInvert bitAnd: arg bitInvert) bitInvert]
ifFalse: [(self bitInvert bitClear: arg) bitInvert]

In this case, if the primitive fails, this method tries to resolve its task in Smalltalk code. Sometimes this works and it means that this primitive is for improving performance, but not mandatory (as it is the case with #class).  In other cases, the code after the primitive (written in Smalltalk) will fail for sure if the primitive has already failed. However, such code is put in Smalltalk with documentation purposes. You can imagine what such primitive does (and why it could fail) in the VM side (Slang/C) by looking its possible code in Smalltalk.

Two important literals

When we talked about CompiledMethod and literals I forgot to mention that there are 2 literals in every CompiledMethod that are really important. CompiledMethod can answer to the messages #methodClass (which answers the class where such CompiledMethod is installed) and #selector (which answers the method’s selector). How can both methods be implemented in CompiledMethod if they don’t hold such information?  Ok, they do hold such information as literals. The LAST literal of every CompiledMethod is an Association where the key is the class name and the value the class object. The penultimate literal stores the selector. So if we explore “Date >> #month”:

Literal 3 is the last one and points to the Association and literal 2 is the penultimate and points to the selector.

So you can now understand the methods:

CompiledMethod >> methodClass
"answer the class that I am installed in"
^self numLiterals > 0
ifTrue: [ (self literalAt: self numLiterals) value ]
ifFalse: [ nil ]


CompiledMethod >> selector
"Answer a method's selector.  This is either the penultimate literal,
or, if the method has any properties or pragmas, the selector of
the MethodProperties stored in the penultimate literal."
| penultimateLiteral |
^(penultimateLiteral := self penultimateLiteral) isMethodProperties
ifTrue: [penultimateLiteral selector]
ifFalse: [penultimateLiteral]

Forget for the moment the #isMethodProperties.

Pragmas and CompiledMethods

Now…when we talk about the <primitive: XXX>, what’s that??  it is not a regular message send. How can that be compiled by the Compiler? Ok, these are called “Method tags” and their goal is to store metadata of the method. If you are a java developer, method tags can be “similar” to Java annotations. In Pharo Smalltalk, one implementation of method tags is called “Pragmas”. I won’t discuss the advantages or disadvantages of Pragmas against other method tag implementations, or whether to use pragmas o regular subclassification, etc.

For more information about Pragmas, check the class comment of Pragma class and the tests like PragmaTest, MethodPragmaTest, etc. Nowadays, Pragmas are used in Pharo in several places like the new settings framework, the world menu, Metacello, HelpSystem, etc.

Ok…nice. But how are they really stored in a CompiledMethod? Let’s explore “SmallInteger >> #bitOr:”.

So….as we can see in the explorer, at compiling time the Compiler creates an instance of AdditionalMethodState and such object is placed in the penultimate literal. The class comment of AdditionalMethodState says: “I am class holding state for compiled methods. All my instance variables should be actually part of the CompiledMethod itself, but the current implementation of the VM doesn’t allow this.  Currently I hold the selector and any pragmas or properties the compiled method has.  Pragmas and properties are stored in indexable fields; pragmas as instances of Pragma, properties as instances of Association.”

AdditionalMethodState has two named instance variables: ‘method’ and ‘selector’. No explanation needed here. But since the class format is variable (do you remember them from my old post?) it can also store indexable fields. In this case, pragmas are stored that way. Hence, an instance of Pragma is stored in AdditionalMethodState and that’s what we can see in the explorer. A Pragma instance has 3 instance variables:  ‘method keyword arguments’.

But the AdditionalMethodState instance is put in the penultimate literal and that’s where the “selector” should be found. How can “CompiledMethod >> #selector”  work with them? If we now take again a look to such method (look above), you will see there is a “isMethodProperties ifTrue: [penultimateLiteral selector]”. Of course, AdditionalMethodState answers true to isMethodProperties and hence the selector is asked to itself (which in fact is an instance variable of it).

Primitives and their impact in CompiledMethod

Since primitives uses Pragma, the first effect is to have an AdditionalMethodState in the penultimate literal instead of a selector. The second effect, is that the primitive number is stored in the CompiledMethod header. You can send the message #primitive and get the value. For example, “(SmallInteger >> #bitOr:) primitive” -> 15. If the method has no primitive then zero is answered. Example, (TestCase >> #assert:) primitive -> 0.

When the VM executes a CompiledMethod it checks whether it is a primitive method or not (checking whether the value in the object header is zero or bigger). If it is, the VM searches in a table and dispatches the  primitive associated to the number.

How primitives are map to the VM side?

Continuing with SmallInteger >> #bitOr:, the primitive number is 15. How can we know the code of such primitive in the VM side? Time to open an image with VMMaker (if you don’t know how to do it read the title “Prepared image for you” in this post). The VM keeps a table that maps primitive numbers with selectors implemented in the interpreter class. The most useful advice here is to check the method that initialices such table: #initializePrimitiveTable. So we can take a look:

For our example of #class the primitive number was 111. In such table 111 maps to #primitiveClass. So we can browse its code. Remember that this code is written in SLANG and it is part of the VMMaker package (check my previous posts for details).

| instance |
instance := self stackTop.
self pop: argumentCount+1 thenPush: (objectMemory fetchClassOf: instance)

(SmallInteger >> #bitOr:)  has primitive number 15, which maps to #primitiveBitOr, which code is:

| integerReceiver integerArgument |
integerArgument := self popPos32BitInteger.
integerReceiver := self popPos32BitInteger.
self successful
ifTrue: [self push: (self positive32BitIntegerFor:
(integerReceiver bitOr: integerArgument))]
ifFalse: [self unPop: 2]

So..you have learnt how to map primitive numbers with methods in VM side 🙂   You already know how to do that for primitives and bytecodes now. Congrats!!!

Future explanations

Browse de method #primDeleteFileNamed: and you will see something like:

primDeleteFileNamed: aFileName
"Delete the file of the given name. Return self if the primitive succeeds, nil otherwise."

^ nil

what’s that primitive? where is the number?  Can I create my own primitive? Sure! We will see how to do that in a future post 🙂