Monthly Archives: October 2011

Memory Addresses and Immediate Objects

Hi. After a couple of months talking about other things, like Fuel and my presentations at conferences such as ESUG and Smalltalks, I would now like to continue with the “Journey through the Virtual Machine” for beginners. So far I have written the first and second parts. Consider this post the first one of the third part.

Direct pointers vs object tables

Let’s say we have this code:

| aPoint |
aPoint := Point x: 10 y: 20.5.

In this case, aPoint has two instance variables, “x” and “y”, that refer to an integer (10) and a float (20.5). How are these references implemented in the VM?

Most virtual machines have an important component responsible for managing memory: allocating objects, releasing them, and so on. In the Squeak/Pharo VM, this component is called the Object Memory. The Object Memory also defines the internal representation of objects: their references, their location, their object header, etc. For implementing references, the two most common approaches are object tables and direct pointers.

With an object table, there is one large table whose entries hold the memory addresses of objects. When the object aPoint refers to the float 20.5, the instance variable “y” of aPoint holds an index into that table, and the entry at that index holds the memory address of the float 20.5. With direct pointers, when aPoint refers to 20.5, the instance variable “y” of aPoint directly holds the memory address of 20.5.

There are pros and cons to each strategy, but a full discussion is out of scope for this post. One of the nice things about object tables is that the primitive #become: is really fast, since it only has to update entries in the table. With direct pointers, #become: needs to scan all the memory to detect every object that points to a particular one. On the other hand, with object tables we pay the cost of an extra indirection on every access, and (I guess) this may impact the overall performance of the system. With direct pointers, we do not have that problem. Finally, an object table uses more memory, since the table itself needs memory. A few months ago there was a nice discussion on the mailing list about the pros and cons.
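To make the trade-off concrete, here is a minimal Python sketch of both strategies. It is only a toy model under my own assumptions, not how any real VM lays out memory: the names Obj, ObjectTableHeap and DirectPointerHeap are made up for illustration, and Python references stand in for raw memory addresses.

```python
class Obj:
    """A toy heap object: a label plus a list of reference slots."""
    def __init__(self, label, slots=()):
        self.label = label
        self.slots = list(slots)


class ObjectTableHeap:
    """Slots hold indices into a table; the table holds the objects."""
    def __init__(self):
        self.table = []

    def allocate(self, obj):
        self.table.append(obj)
        return len(self.table) - 1      # the reference is a table index

    def deref(self, index):
        return self.table[index]        # the extra indirection on every access

    def become(self, index_a, index_b):
        # Swapping two table entries redirects every reference at once.
        self.table[index_a], self.table[index_b] = (
            self.table[index_b], self.table[index_a])


class DirectPointerHeap:
    """Slots hold the objects themselves (standing in for raw addresses)."""
    def __init__(self):
        self.objects = []

    def allocate(self, obj):
        self.objects.append(obj)
        return obj                      # the reference is the object itself

    def become(self, a, b):
        # Every object in the heap is visited to find references to a or b.
        scanned = 0
        for obj in self.objects:
            scanned += 1
            for k, slot in enumerate(obj.slots):
                if slot is a:
                    obj.slots[k] = b
                elif slot is b:
                    obj.slots[k] = a
        return scanned


# Object table: aPoint's "y" slot stores an index.
ot = ObjectTableHeap()
float_index = ot.allocate(Obj("20.5"))
other_index = ot.allocate(Obj("99.9"))
point_index = ot.allocate(Obj("aPoint", [float_index]))
ot.become(float_index, other_index)
y_via_table = ot.deref(ot.deref(point_index).slots[0]).label

# Direct pointers: aPoint's "y" slot stores the reference directly.
dp = DirectPointerHeap()
a_float = dp.allocate(Obj("20.5"))
other_float = dp.allocate(Obj("99.9"))
point = dp.allocate(Obj("aPoint", [a_float]))
scanned = dp.become(a_float, other_float)
y_direct = point.slots[0].label
```

In both heaps aPoint ends up referring to the other float, but the object-table version touched two table entries while the direct-pointer version had to visit every object in the heap.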

The first Smalltalk VMs used an object table, but most current VMs (including the Squeak/Pharo VM) use direct pointers. The only current VM I am aware of that uses object tables is GemStone. But… they actually have one (virtual) Object Table (OT) per committed transaction!!  How can they do that without blowing up to terabytes of memory used by OTs? Well, that’s one of GemStone’s key tricks 😉  If you are interested in this topic, you can read this thread.

Memory addresses

In the previous paragraphs you learned that each reference in the Squeak/Pharo VM is a direct pointer to another object. Well, that’s almost correct. We are missing what are usually known as “immediate objects”.  Immediate objects are those that are directly encoded in the memory address and require neither an object header nor slots, so they consume less memory. In the CogVM there is only one type of immediate object, and it is SmallInteger. What does it mean?

In our example, the instance variable “x” of aPoint does not hold a pointer to an instance of SmallInteger with the content 10. Instead, the word stored in “x” directly encodes the value 10. So there is no instance of SmallInteger. But then, how can the VM know whether an instance variable is a pointer to another object or a SmallInteger? We need to tag the word to say “this is an object pointer” or “this is a SmallInteger”. To do that, the VM uses the last (least significant) bit of the 32-bit word. If that bit is 1, the word is a signed 31-bit SmallInteger. If it is 0, it is a regular object pointer (oop).
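The tagging scheme just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are invented, and a real VM does this with C integer operations on machine words, not with Python integers.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def tag_small_integer(value):
    """Encode a signed integer as a 32-bit word whose last bit is 1."""
    return ((value << 1) | 1) & WORD_MASK


def is_integer_object(word):
    """A word is a SmallInteger when its least significant bit is set."""
    return (word & 1) != 0


def untag_small_integer(word):
    """Recover the signed value from a tagged 32-bit word."""
    if word & (1 << (WORD_BITS - 1)):   # top bit set: negative number
        word -= 1 << WORD_BITS          # reinterpret the word as signed
    return word >> 1                    # arithmetic shift drops the tag bit


tagged_ten = tag_small_integer(10)
tagged_minus_one = tag_small_integer(-1)
some_oop = 0x00A4F3C8                   # hypothetical object address (even)
```

Here tagged_ten is the word 21 (binary 10101): the value 10 shifted left by one, with the tag bit set. An object address such as the hypothetical some_oop always has its last bit clear, because objects are aligned in memory.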

Since SmallIntegers are signed and encoded in 31 bits, it follows that we have 30 bits for the magnitude (one bit is for the sign). Hence, SmallInteger maxVal should be (2 raisedTo: 30) - 1, that is, 1073741823. Analogously, SmallInteger minVal answers -1073741824. Numbers are encoded using two’s complement. If you want to know more about this, read the excellent chapter that Stéphane Ducasse wrote about it.
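The arithmetic behind those two limits can be checked with a small Python sketch. The helper names are mine and exist only for this example.

```python
def small_integer_range(payload_bits):
    """Smallest and largest values of a two's complement integer
    that is payload_bits wide."""
    half = 1 << (payload_bits - 1)
    return -half, half - 1


def twos_complement_bits(value, payload_bits):
    """The bit pattern of value in payload_bits-wide two's complement."""
    return format(value & ((1 << payload_bits) - 1), f"0{payload_bits}b")


min_val, max_val = small_integer_range(31)
bits_of_minus_one = twos_complement_bits(-1, 31)
bits_of_ten = twos_complement_bits(10, 31)
```

With 31 bits, min_val is -1073741824 and max_val is 1073741823, matching what SmallInteger minVal and SmallInteger maxVal answer on a 32-bit VM. In two’s complement, -1 is the pattern with all 31 bits set.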

Now, regarding object pointers, they always point to the memory address where the object header is. In our example, the instance variable “y” of aPoint holds the memory address of 20.5’s object header.

As you can imagine, the VM needs to check all the time whether an oop is really an oop or an integer:

ObjectMemory >> isIntegerObject: objectPointer

^ (objectPointer bitAnd: 1) > 0

If you have an image with Cog loaded (as I explained in all my posts about building the VM), you can check for its senders…and you will find quite a lot 😉

Previously, I explained why SmallInteger instances do not have object headers and do not really exist as “objects”. That’s exactly why “SmallInteger instanceCount” answers zero. Each SmallInteger is encoded directly in whichever instance variables, of whichever objects, happen to refer to it.

Another fun fact is why identity always holds between equal SmallIntegers. Say you evaluate '1' asNumber == (4-3): that answers true. This is because, in the end, the VM uses C’s regular equality operator (==) on the two words, which is of course always true for two equal numbers. And if those words are actually oops (not numbers) and they are equal, then both point to the same object:

StackInterpreter >> bytecodePrimEquivalent

| rcvr arg |
rcvr := self internalStackValue: 1.
arg := self internalStackValue: 0.
self booleanCheat: rcvr = arg.
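Here is a small Python sketch of why a plain word comparison is enough for identity. It assumes the 1-bit tagging described earlier. The function names and the two heap addresses are invented for the example.

```python
def tag_small_integer(value):
    """Encode a signed integer as a 32-bit word whose last bit is 1."""
    return ((value << 1) | 1) & 0xFFFFFFFF


def identical(word_a, word_b):
    """Identity boils down to comparing the two words, nothing else."""
    return word_a == word_b


# Equal SmallIntegers always produce the same word, however they were computed.
one_from_string = tag_small_integer(int("1"))
one_from_arithmetic = tag_small_integer(4 - 3)
same_small_integer = identical(one_from_string, one_from_arithmetic)

# Two distinct Float objects that both hold 20.5 live at different
# (hypothetical) addresses, so their oops differ even though the values match.
first_float_oop = 0x1000
second_float_oop = 0x2000
same_float = identical(first_float_oop, second_float_oop)
```

The first comparison answers true because both words are the same bit pattern. The second answers false because the comparison never looks inside the objects.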

There are more places where you can notice that SmallInteger is special. In fact, you can browse the class and see some of the methods it overrides, like #nextInstance (throwing an error), #shallowCopy, #sizeInMemory, etc. And of course, there are further complications, like trying to do a become. For example, (42 become: Date new) throws an error saying it cannot become SmallIntegers.

More immediate objects?

As said, in a 32-bit word we only use 1 bit for tagging immediate objects (SmallInteger in the case of the Squeak VM). We could use more than 1 bit… but then we would have fewer bits for the oop, and the number of bits in the oop limits the maximum amount of memory we can address.

But… what happens in a 64-bit VM?  I think 63 bits are more than enough for memory addresses. So what about using fewer bits for the oop and more for immediate objects?  Say we use 58 bits for the oop and 6 for tagging immediate objects. In that example, we have (2 raisedTo: 6) - 1, that is, 63 different possibilities!!!  So we can encode not only SmallIntegers but also small floats, true, false, nil, characters, etc… Is that all?  No! There are even more ideas. We can not only encode instances of certain classes, but also give other semantics to tagged memory addresses. For example, we could use one of the tag combinations to say that the memory address is in fact a proxy. It doesn’t need to be an instance of Proxy; we just define that when a memory address ends with that tag, the 58 bits for the oop are not an oop but the proxy’s contents. Such contents can be a number representing an offset in a table, an address in secondary memory, etc… The VM could then do something different if the object is a proxy!

Well… nothing I have mentioned is new at all. In fact, GemStone does something very similar. They use 61 bits for the address + 3 for tags. Here is a nice set of videos about GemStone’s internals.  And in this video you can see what we are talking about here.
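The 58 + 6 split can be sketched in Python as follows. Everything here is hypothetical: the tag numbers, the names and the choice of putting the tag in the low bits are my own assumptions for the example, not the layout of any existing VM.

```python
WORD_BITS = 64
TAG_BITS = 6
TAG_MASK = (1 << TAG_BITS) - 1          # 0b111111
PAYLOAD_BITS = WORD_BITS - TAG_BITS     # 58

# Hypothetical tag assignments; tag 0 is kept for regular oops.
TAG_OOP = 0
TAG_SMALL_INTEGER = 1
TAG_CHARACTER = 2
TAG_PROXY = 3


def encode(tag, payload):
    """Pack a 6-bit tag and a 58-bit unsigned payload into one 64-bit word."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 6 bits")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in 58 bits")
    return (payload << TAG_BITS) | tag


def decode(word):
    """Split a 64-bit word back into (tag, payload)."""
    return word & TAG_MASK, word >> TAG_BITS


immediate_kinds = TAG_MASK              # non-zero tags left for immediates

# A proxy whose payload is an offset into some (hypothetical) external table.
proxy_word = encode(TAG_PROXY, 12345)
proxy_tag, proxy_offset = decode(proxy_word)
```

With one tag value reserved for regular oops, 63 values remain for immediate kinds, which is the (2 raisedTo: 6) - 1 computed above. A VM following this idea would look at the tag first and only treat the remaining 58 bits as an address when the tag says so.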

Documentation and future posts

I always try to put some links together related to each post I talk about:

In the next post, I will give details about the current Object Header.


Interview about Fuel for ClubSmalltalk

Hi guys. ClubSmalltalk is a very nice website with a lot of information about Smalltalk. You can find interviews, posts, job offers, etc. There is also a mailing list in Spanish, which has been the most active Spanish-language Smalltalk mailing list in recent years. Anyway… you can see it all for yourself on the website.

Some time ago, they contacted me to do an interview, mainly because Fuel won the ESUG Awards. The interview also included some questions related to my PhD and what I am doing here in France. So… if you are interested in knowing why we started Fuel, what the most difficult part was, what a pickle format is, etc., I really recommend you take a look at it.

The interview is here.

See you

smalltalks2011Attendees + 1

Hi. I think I was pretty clear in this post when I described my feelings about ESUG Conference: “What is really great is to meet people. Have you ever sent 1 million emails to someone without even knowing his face?. Is he 70 years old? 20 ? What language does he speak?  Well, ESUG is the perfect conference to meet people by real, face to face. The best part of ESUG happens in the “corridors”, I mean, talking with people between talks, after the conference, in the social event, etc. There will be people who will ask you about your stuff, they will give you feedback and ideas. You will find yourself wanting to give feedback to others. It is a nice circle.”

Even though the previous paragraph describes pretty well what I think about ESUG, it applies to the Smalltalks conference as well. This conference, which takes place in Argentina, started in 2007, meaning this is the fifth edition. It’s true that the audience is less international than ESUG’s, but that doesn’t mean the conference is worse or not worth it. It is still a really international conference: people come from different countries and, in the last editions, all presentations were in English. The conference is free, and attendance is somewhere between 150 and 300 people.

The conference is organized by FAST (the Argentinian Smalltalk Foundation, by its Spanish acronym) with several sponsors, including ESUG. Since last year, there is not only a “technical track” of presentations, but also a “research track” (a workshop). This means there is a full program committee made up of several international researchers. Papers are submitted and there is even a journal associated with the workshop. Believe me, for someone who is a PhD student and a Smalltalker at the same time, this is awesome. Java fanaticism does not only exist in industry 😉

I submitted a paper, which was accepted, so I am attending the conference. This is very good news for me because it means I will have attended every Smalltalks conference so far (from 2007 to 2011)! Well, at the same time I will be visiting my country so… 🙂  The paper is about my current development and research on managing unused objects. Soon, I will write a post and link to the paper.

Apart from the paper, I also submitted a talk, which was accepted as well: “Building your own Pharo images with Metacello”. It doesn’t make sense to say more here because it is already explained on the conference website.

I would like to mention the effort that FAST and Smalltalkers are making this year to attract new attendees to the conference. They are giving special “pre-Smalltalks talks” at 2 different universities, with introductory courses on Smalltalk and the VM. It’s nice to see friends of mine taking care of that 🙂  So… if you want to improve your Smalltalk skills before Smalltalks, or you want someone else to take the courses, please visit this link. Of course, these courses are free as well.

There is also a Pharo sprint just before Smalltalks. It is on Wednesday 2nd at the Universidad Nacional de Quilmes (the same place where the conference takes place). If you have never attended a Pharo sprint, don’t miss this chance!! It’s the perfect place for everyone who loves Smalltalk, from newbies to hackers. You can learn and pair program with really experienced people from the Pharo board and make real progress on an open-source project such as Pharo. The link is:

Not enough reasons to come? Well, I may be biased, but Argentina is a very nice country with a lot of beautiful places and really friendly people. So, why don’t you come to Smalltalks and take some holidays as well?