
Serializing large graphs with SIXX in GemStone

Hi guys,

While I promised to start a series of posts about GemStone itself, today I want to write down some notes about something I've been working on over the last few days.

Serializing object graphs with SIXX

One possible way of moving objects between different GemStone instances, or even between GemStone and Pharo, is a serializer. But the serializer must be able to serialize in one dialect and materialize in the other. Such a serializer can also be used for backups (besides the GemStone full backups).

For that, one of the simplest approaches we have is SIXX, an XML-based serializer. One of the biggest drawbacks of such a serializer is the memory consumed when serializing or materializing large graphs of objects.

In my case I need to serialize/materialize conceptual databases. These are large enough that SIXX will crash and run out of memory (the classical "VM temporary object memory is full"). The free GLASS license allows 2GB of Shared Page Cache, so the maximum temporary object space a VM can hold must be less than that. If your SIXX export/import crashed with an out-of-memory error, this post presents a trick that may help you.

Making SIXX temporary data persistent

This trick (thanks Dale for sharing it) only works for GemStone, not for Pharo. But still, it's useful. When SIXX crashes with an out-of-memory error it's because SIXX creates a lot of TEMPORARY, NON-PERSISTENT data that cannot fit in memory. Since those objects are not persistent, they cannot go to disk, and hence the out of memory.

The SIXX port to GemStone provides a kind of API that lets you define an array instance which should be persistent (i.e. reachable from UserGlobals or another SymbolDictionary). Internally, SIXX then stores a few things in that array, like the stream and some other temporary data. Since the array is now persistent, everything reachable from it can go to disk and come back as needed. So… yes, we will likely have thrashing (lots of objects moving back and forth between memory and disk), but the export/import should finish correctly.

There is a little catch with this trick: if you do not make a GemStone commit, then even if the defined array and everything stored in it are "persistent", they are not ready to go to disk until you commit. Only after you commit can those persistent objects be moved to disk when more memory is needed.
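A minimal sketch of the trick, using only standard GemStone API (UserGlobals and System commitTransaction); #SixxTempRoot is just a placeholder name, and how the array is actually handed over to SIXX depends on the hook provided by the port:

| sixxRoot |
"Create the array SIXX will use for its temporary data and make it reachable
 from a persistent root."
sixxRoot := Array new.
UserGlobals at: #SixxTempRoot put: sixxRoot.
"Commit, so the array is really persistent and everything reachable from it
 can be paged out to disk during the export/import."
System commitTransaction.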

Writing and reading with UTF8 encoding

Something I also needed was to write the resulting SIXX XML file in UTF-8. Of course, the materialization also has to read UTF-8. For that, I used the Grease port to GemStone.
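A hedged sketch of the idea, assuming the GemStone Grease port exposes the same GRCodec protocol as the Pharo one (GRCodec forEncoding:, #encode: and #decode: are taken from that protocol and should be double-checked against the port):

| codec xmlString encodedBytes decodedString |
codec := GRCodec forEncoding: 'utf-8'.
xmlString := '<sixx>placeholder</sixx>'.	"stands in for the SIXX output"
encodedBytes := codec encode: xmlString.	"what gets written to the file"
decodedString := codec decode: encodedBytes.	"what SIXX materializes from after reading the file"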

The code and explanation

Please take a look at the code. I have added lots of comments so that, besides documenting it here in the blog, I also get the documentation in the code 😉 All the problems and solutions I found are explained in the code.

The serialization

[Screenshot in the original post: the commented SIXX serialization code]
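Since the code appears only as a screenshot, here is a rough sketch of the general shape of such an export. It is not the author's exact code: GsFile, UserGlobals and System commitTransaction are standard GemStone API, while the SixxWriteStream constructor and the persistent-root hook are assumptions based on the Pharo SIXX examples and should be checked against the GemStone port.

| sixxRoot graphToExport file sixxStream |
"Persistent root for SIXX temporary data (the trick explained above)."
sixxRoot := Array new.
UserGlobals at: #SixxTempRoot put: sixxRoot.
System commitTransaction.
"The graph to export; in the real case this is the conceptual database."
graphToExport := OrderedCollection withAll: #(1 2 3).
"Write the XML to a server-side file. In the original post the whole thing is
 additionally run in a forked gem and wrapped in #commitOnAlmostOutOfMemoryDuring:."
file := GsFile openWriteOnServer: '/tmp/export.sixx'.
sixxStream := SixxWriteStream on: file.	"assumed constructor"
sixxStream nextPut: graphToExport.
sixxStream close.
file close.
System commitTransaction.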

 

The materialization:

[Screenshot in the original post: the commented SIXX materialization code]
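And a similarly rough sketch of the materialization side, with the same caveats (SixxReadStream and its messages are assumptions based on the Pharo SIXX examples, not the author's exact code):

| sixxRoot file sixxStream materialized |
"Again, a persistent root so the SIXX temporary data can be paged out."
sixxRoot := Array new.
UserGlobals at: #SixxTempRoot put: sixxRoot.
System commitTransaction.
"Read the XML back from the server-side file and rebuild the graph."
file := GsFile openReadOnServer: '/tmp/export.sixx'.
sixxStream := SixxReadStream on: file.	"assumed constructor"
materialized := sixxStream next.
file close.
"Hang the materialized graph onto a persistent root and commit it."
UserGlobals at: #ImportedGraph put: materialized.
System commitTransaction.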

Now running out of SPC???

Well… in my case, when I tried the above code, it still didn't work. In other words, after making sure I was doing the SIXX export and import in a forked gem, and surrounding the code with #commitOnAlmostOutOfMemoryDuring:, the operation was still failing. I was getting the error "FindFreeFrame: potential infinite loop". From what I read, that seems to indicate that the SPC is fully occupied. When you are inside a GemStone transaction and you have created new persistent objects, all those objects must fit in the SPC at the time you do the #commit.

Dale said: “if you were to look at a statmon you would find that GlobalDirtyPages were filling the cache .. the dirty pages due to a transaction in progress (i.e., you are doing a commit and writing the objects from TOC to SPC) cannot be written to disk until the transaction completes … and it cannot complete until it can write all of the dirty objects from the TOC to the SPC …”

OK… now… if you look at #commitOnAlmostOutOfMemoryDuring:, the memory it is talking about is the GEM_TEMPOBJ_CACHE_SIZE, not the SPC (SHR_PAGE_CACHE_SIZE_KB). Unfortunately, I have other places besides the SIXX export/import where I do heavy/bulk operations that I have not yet migrated to this new way of using a temporary persistent root and forked gems. Therefore, for the time being, my GEM_TEMPOBJ_CACHE_SIZE is quite big. To give an example, I could have an SPC of 1GB and a 900MB temp space:

SHR_PAGE_CACHE_SIZE_KB=1000000;
GEM_TEMPOBJ_CACHE_SIZE=900000;

Continuing with #commitOnAlmostOutOfMemoryDuring:, you will see that the threshold of the "almost out of memory" is 75%. So… 75% of 900MB is 675MB. In other words, my SIXX block will only commit when the gem's temp space reaches 675MB, leaving only 1000MB - 675MB = 325MB of SPC headroom. And since we are using the hook above to define a persistent root, most of that 675MB of temp data is persistent, i.e. dirty objects that must be written to the SPC at commit time. So… to conclude, what I think was happening is that I had a really big temp space (close to the SPC size) and a high commit threshold. Hence, I was filling up the SPC before I was able to commit.

Solutions?

1) Do not use #commitOnAlmostOutOfMemoryDuring: but instead split your code with your own specific commits. For example, in my case I could split the SIXX serialization/materialization into operations where I commit every X objects. But this is use-case specific.

2) Code a variation of #commitOnAlmostOutOfMemoryDuring: where you pass a smaller threshold as an argument. For example, with 50% it worked correctly for me (see the sketch after this list).

3) Set a smaller GEM_TEMPOBJ_CACHE_SIZE for the forked gems. This should be the best solution, because with a TOC that approaches the size of the SPC you are always in danger of creating more dirty objects than will fit in the SPC, and thus of not being able to commit.
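A variation along the lines of option 2 might look roughly like this. It is only a sketch: it relies on the documented almost-out-of-memory mechanism (System signalAlmostOutOfMemoryThreshold: and the resumable AlmostOutOfMemory signal) and is not the actual GLASS implementation of #commitOnAlmostOutOfMemoryDuring:.

commitOnAlmostOutOfMemory: thresholdPercent during: aBlock
	"Commit whenever the temporary object cache crosses thresholdPercent (e.g. 50).
	 Resetting or disabling the signal afterwards is left out of this sketch."
	System signalAlmostOutOfMemoryThreshold: thresholdPercent.
	^ aBlock
		on: AlmostOutOfMemory
		do: [:notification |
			System commitTransaction.
			notification resume ]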

For my app, I finally decided to use a GEM_TEMPOBJ_CACHE_SIZE of 75% of SHR_PAGE_CACHE_SIZE_KB. That is still a problem if I have many gems doing large commits… but I need it. And then I also made the SIXX export/import above commit at a threshold of 50%.

Executing this from within Seaside?

If you try to run these exports/imports from Seaside callbacks, you will see that it will likely not work, so you will have to invoke them from GemTools, tODE or any other GemStone client. This is because the hook used to commit, #commitOnAlmostOutOfMemoryDuring:, will commit the transaction Seaside has open. If you don't use GLASS, or you manage transactions yourself, then the results could vary.

When the Seaside transaction is committed, Seaside will redirect you to the home page because there is no GemStone transaction left open. To solve this and other issues that I will try to explain in another post, we can use separate VMs (usually called Service VMs) that take care of these tasks without affecting the Seaside ones. Thanks Dale and Otto for sharing this too.

Future Work

The next step, if I want to stay with SIXX, would be to use the XML pull parser, which should use less memory. Another possibility could be to use STON, but I am not sure if it is 100% working in GemStone… or maybe try to port Fuel… I tried once and got about half of the tests passing 🙂

 


What is GemStone? Part 2

A bit on GemStone history

When talking about Smalltalk, one of the advantages always mentioned is its "maturity". In the previous post, I commented on some GemStone features. If you read them carefully, it seems as if we were talking about a modern technology that couldn't have existed years ago. Wrong!!!! GemStone Systems was founded in 1982 and I think the first release came a few years later. Of course, not all the features I described existed back then, but nobody can dispute its history. That means that when using GemStone you not only get the "maturity" of Smalltalk, but also its own maturity as an object database.

For a long time, GemStone was owned and developed by GemStone Systems. In 2010, VMware acquired GemStone Systems. Later on, in 2013, GemStone and the other Smalltalk products were acquired by a new company called GemTalk Systems. I don't know all the details (if you want, you can check them online)… but what I think matters most is the fact that GemTalk Systems now has all the GemStone engineers working for it, it does not depend on a larger company's decisions (like VMware's), and it is 100% focused on Smalltalk!

If you want to have a look at a general overview of the company and the impact of their products, I recommend the slides of this presentation.

Why is GemStone even more interesting now than before?

Just in case… I will clarify again: in these posts, I always give my opinion. Not everybody has to agree with me.

Let's go back a few years. At that time, a few things were true:

  1. Most apps being developed were fat desktop clients, and GemStone had neither a UI nor an IDE to develop with.
  2. There was no good open-source and business-friendly Smalltalk (I said it was my opinion!).
  3. GemStone did not have a free license.

These facts meant that someone developing a fat-client app would require two Smalltalks: one for the UI and GemStone as the database. That also meant paying two licenses: one for the commercial Smalltalk used for the UI and one for GemStone. And that could be expensive for certain uses. However, things have changed in recent years:

  1. Most apps are now web based, so we do not need a fat UI.
  2. There is a very cool open-source and business-friendly Smalltalk: Pharo.
  3. GemStone does offer a free license with generous limits (in future posts, I will explain the limits in more detail).

That means you can develop a whole web app in Pharo, load the code into GemStone and run it from there. And… pay no license (within the GemStone free license limits). This is why I think GemStone is even more interesting now than it ever was before.

A bit more about fat client vs web based

When using an app with a fat client and GemStone as the object database, we actually have two Smalltalks communicating with each other. It is not like "I develop in Smalltalk Whatever and I deploy in GemStone". No… both Smalltalk Whatever and GemStone are running and communicating with each other. This means there must be some connection or some kind of mapping/adaptor between the two, because both can have differences in their kernel classes. This kind of software is what GemTalk Systems sells as "GemBuilder". So we have "GemBuilder for VisualWorks", "GemBuilder for VisualAge", etc. I have never used these products so I can't say much about them.

Just one last comment. Of course, building a GemBuilder for a Smalltalk dialect seems "easy" in the sense that GemStone is also a Smalltalk. But what if there were GemBuilders for other languages, so that they too could use GemStone as their object database? Well, there is also a "GemBuilder for Java". This tells us a little about the internal GemStone architecture (the object repository process is somewhat decoupled from the virtual machines running the language). But we will see this later.

What do people mean by “Develop in Pharo and deploy in GemStone”

In a very first step, an app could be both developed and deployed in Pharo. That means we use the Pharo tools to develop it and we also use Pharo to run our application in production. This may work well enough for small apps or a prototype. But, at some point, we may need more power. As I discussed in the previous post, there are many alternatives, and not all solutions are available for all situations. The solution I am interested in for this post is to directly run (deploy) your app in GemStone. What are the requirements? The app cannot be a fat client (in Pharo, this means the app should not be a Morphic app). It could be a web app, a REST server, or any other form that doesn't involve a fat UI.

In fact… I guess you could even use GemStone as your backend language and database and provide a REST API answering JSON or whatever to a mobile app (maybe even using Amber??? I don't know…).

With this alternative, the idea is to develop in Pharo. Then… we simply load our code (using Metacello, Git, Monticello, whatever) into GemStone and we run it there. Hence, "develop in Pharo and deploy in GemStone". Of course… the code we developed in Pharo may not work perfectly in GemStone, or it may behave a little differently. So some adjustments and work will likely need to be done to make the app work in GemStone as well as in Pharo. But we will talk about this in a future post.
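For example, loading the code into GemStone with Metacello might look roughly like this (the baseline name and repository URL are made-up placeholders, not a real project):

Metacello new
	baseline: 'MyApp';
	repository: 'github://example/MyApp:master/repository';
	load.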

Most of the time, the app can still be run in Pharo (besides being developed there). So you can likely continue to develop, run, test and debug your app locally with Pharo, and then periodically deploy and test it in GemStone.

To sum up

I hope I have clarified a bit the different scenarios for using GemStone and what people mean when they say "develop in Pharo and deploy in GemStone". All my posts from now on will have this scenario in mind.

See you soon,


What is GemStone?

What is GemStone

When you ask a Smalltalker what Smalltalk is, you will get many different answers: a language, an environment, an object system, a platform, or simply a combination of all of those and more. With GemStone, I have a similar feeling: I think different people will answer differently. To me, GemStone is an object system built around two big concepts: an object database and a language. Others will say that it's a transactional or persistent Smalltalk, an object database, etc.

Before continuing, let me clarify a few things for this post and all the posts that will follow:
– I will not be discussing relational databases vs object databases vs NoSQL here. That's a whole other discussion that I am not willing to write about right now.
– These posts are aimed mostly at Smalltalkers and GemStone newbies, not at GemStone experts.

Ok… that being clarified… let's start. When I refer to an object database, I mean exactly that: an Object Database Management System. Rather than dealing with tables as in relational databases, we can directly persist and query objects. Most of the OODBs I have seen in other languages are external pieces of software that are only a database (just as relational databases are). For the moment, just imagine any of the relational databases you know, but storing objects instead. In this case, you still need a language (and probably a VM running that language) for your application logic, and you still must communicate with the database to store and retrieve objects. I know… that is already easier than dealing with relational databases, but I personally think it could be better.

GemStone goes a step further. What if the "database" also included the language to run your application? Sounds cool, doesn't it? So this is the second concept GemStone includes: it is also a language implementation in itself. And which language? Smalltalk, of course!!! This means GemStone IS a Smalltalk dialect, just like Pharo, VisualWorks, VisualAge, etc. So… GemStone is a Smalltalk dialect that also acts as an object database. You might be thinking "any Smalltalk can act as an object database because we have image persistency". Fair enough. However, image persistency lacks lots of the features needed to be a really scalable database (we will talk about this in other posts).

GemStone analogy to an image-based Smalltalk

As I said, the aim of these posts is to explain GemStone in a way that most readers can get it. And sometimes a good way to do that is by making a comparison to what we already know. So… let's take an example with Pharo. Say we have one application running in one image. Soon, one image may not be powerful enough and we need to scale. Smalltalk is cool and allows us to run the very same image with N VMs. Ok… so now we have 10 VMs running our app. Imagine this app needs persistency (as most apps do). If the database is outside Pharo (say a relational DB, NoSQL, etc.), then we have no problem, since the access to the database from multiple images will be correctly synchronized. But would you be able to use image persistency in this scenario? Of course not, because it's not synchronized among the VMs. But hell… that would be nice, wouldn't it?

GemStone offers exactly what I would like: multiple (hundreds) Smalltalk VMs running and sharing the same “image” (repository/database of objects) in a synchronized fashion.

Note, however, that GemStone does NOT have a UI (it is headless) nor development tools (no IDE). So you still need another Smalltalk to develop your app code. And this is why the Pharo / GemStone combination is so great. But I will talk about this in another post.

To sum up

So you are the happiest programmer on the block. Your language has closures (have you ever tried to imagine not using closures again???), an amazingly simple syntax, a small learning curve, decades of maturity, serious open-source and free dialects available, etc. Now I tell you that you can literally run hundreds of Smalltalk VMs all sharing the same repository of objects. But not only that… also imagine not having to write any mapping to SQL. Imagine that saving an object in the database is just adding it to a collection ("clientList add: aClient") and a query is just a normal select ("clientList select: [:each | each age = 42 ]"). BTW… did anyone notice that, apart from selecting via an instance variable ('age' in this example), I can also send other domain-specific messages? Say… "clientList select: [:each | each associatedBankPolicy = BankPolicy strict ]".
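As a tiny illustration of that idea (UserGlobals, System commitTransaction and the collection protocol are standard GemStone; the Client class and its messages are made up for this example):

| clientList |
"Register a collection under a persistent root: from now on, whatever is
 reachable from it is part of the database."
clientList := OrderedCollection new.
UserGlobals at: #ClientList put: clientList.
"Saving is just adding; committing makes it durable."
clientList add: (Client new name: 'Alice'; age: 42; yourself).
System commitTransaction.
"Querying is just sending messages."
clientList select: [:each | each age = 42 ].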

Ok, you might still not be convinced. What if I also tell you GemStone supports:

  • Multi-user database support
  • Indexes and reduced-conflict collection classes
  • Distributed environments (imagine that your hundred VMs can also be running on different nodes!)
  • Fault tolerance
  • Security at different levels (even at the object level)
  • 64-bit VMs and multi-CPU support
  • A free license with generous limits

Ok… too much info for today. As you can tell, I am very happy with these technologies, so I will try to be objective… but I cannot promise anything hahaha! I hope I have provoked and intrigued you enough to read the future posts.

Stay tuned,