Serializing large graphs with SIXX in GemStone

Hi guys,

While I promised to start a series of posts about GemStone itself, today I wanted to write down some notes about something I’ve been doing over the last few days.

Serializing object graphs with SIXX

One possible way of moving objects between different GemStone instances, or even between GemStone and Pharo, is a serializer. But the serializer must be able to serialize in one dialect and materialize in the other. Such a serializer could also be used for backups (besides the GemStone full backups).

For that, the simplest approach we have is SIXX, which is an XML-based serializer. One of the biggest drawbacks of such a serializer is the memory consumed when serializing or materializing large graphs of objects.
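To give an idea of the API, a basic round trip with SIXX looks more or less like this. Take it as a sketch: I am quoting the SixxWriteStream / readSixxFrom: protocol from memory, so check the class comments of your SIXX version:

| stream xml copy |
"Serialize a small object graph into an XML string"
stream := WriteStream on: String new.
(SixxWriteStream newOn: stream)
	nextPut: (OrderedCollection withAll: #(1 2 3));
	close.
xml := stream contents.

"Materialize it back from the XML"
copy := Object readSixxFrom: (ReadStream on: xml).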

In my case I need to serialize/materialize conceptual databases. These are large enough that SIXX will crash and run out of memory (the classic “VM temporary object memory is full”). The GLASS free version allows 2GB of Shared Page Cache, so the maximum temporary object space that a VM can hold must be less than that. If your SIXX export/import crashed with an out of memory error, this post presents a trick that may help you.

Making SIXX temporary data persistent

This trick (thanks Dale for sharing) actually only works for GemStone, not for Pharo. But still, it’s useful. When SIXX crashes with an out of memory error, it’s because SIXX creates a lot of TEMPORARY NON-PERSISTENT data that cannot fit in memory. Since those objects are not persistent, they cannot go to disk, and hence the out of memory.

SIXX’s port to GemStone provides a kind of API to define an array instance which should be persistent (reachable from UserGlobals or another SymbolDictionary). Internally, SIXX then stores a few things in that array, like the stream and some other temporary data. Since the array is persistent, everything reachable from it can go to disk and come back as needed. So… yes, we will likely have thrashing (lots of objects moving back and forth between memory and disk), but the export/import should run to completion.

There is a little problem with this trick: if you do not make a GemStone commit, even if the defined array and everything stored in it are “persistent”, they are not ready to go to disk until you commit. Only after you commit can those persistent objects be moved to disk when more memory is needed.
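In code, the setup looks roughly like this (#SixxPersistentRoot is just the key I picked; the selector through which SIXX receives the array depends on the version of the port):

| root |
"Make the array reachable from UserGlobals, so it becomes persistent"
root := Array new.
UserGlobals at: #SixxPersistentRoot put: root.
"Commit, so objects reachable from the array can later be flushed to disk"
System commitTransaction.
"... now hand root over to SIXX and run the export/import ..."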

Writing and reading with UTF8 encoding

Something that I also needed was to be able to write the resulting SIXX XML file in UTF8. Of course, the materialization should also read UTF8. For that, I used the Grease port to GemStone.
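With Grease, encoding and decoding boils down to something like this (a sketch; I assume GRCodec answers a codec for 'utf-8' in your Grease version):

| codec encoded decoded |
codec := GRCodec forEncoding: 'utf-8'.
"Encode a Smalltalk string into its UTF8 representation for writing"
encoded := codec encode: 'déjà vu'.
"And decode it back when materializing"
decoded := codec decode: encoded.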

The code and explanation

Please take a look at the code below. I have added lots of comments so that, besides documenting things here in the post, I get the documentation also in the code 😉 All the problems and solutions I found are explained in the code.

The serialization

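Roughly, the export looks like this. Take it as a sketch: the #SixxPersistentRoot and #MyDatabase keys and the file path are placeholders of mine, and the exact selector through which SIXX receives the persistent array depends on the version of the port:

| db stream sws codec file |
"The graph to export; here I assume it hangs from UserGlobals"
db := UserGlobals at: #MyDatabase.

"Problem: SIXX creates lots of temporary data.
 Solution: the persistent root trick, committed so it can be flushed to disk"
UserGlobals at: #SixxPersistentRoot put: Array new.
System commitTransaction.

"Serialize inside the hook that commits when temp memory is almost full"
stream := WriteStream on: String new.
System commitOnAlmostOutOfMemoryDuring: [
	sws := SixxWriteStream newOn: stream.
	sws nextPut: db.
	sws close ].

"Problem: the XML must be written in UTF8.
 Solution: encode with Grease and write to a server-side file with GsFile"
codec := GRCodec forEncoding: 'utf-8'.
file := GsFile openWriteOnServer: '/tmp/database.sixx'.
file nextPutAll: (codec encode: stream contents).
file close.
System commitTransaction.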

The materialization:

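And the import, following the same pattern (again, the keys and the path are placeholders of mine):

| codec file xml root |
"Persistent root for SIXX temp data, committed so it can go to disk"
UserGlobals at: #SixxPersistentRoot put: Array new.
System commitTransaction.

"Read the file and decode it from UTF8 with Grease"
codec := GRCodec forEncoding: 'utf-8'.
file := GsFile openReadOnServer: '/tmp/database.sixx'.
xml := codec decode: file contents.
file close.

"Materialize inside the commit-on-almost-out-of-memory hook, then
 attach the result to a persistent root before the final commit"
System commitOnAlmostOutOfMemoryDuring: [
	root := Object readSixxFrom: (ReadStream on: xml) ].
UserGlobals at: #MyImportedDatabase put: root.
System commitTransaction.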

Now running out of SPC???

Well… in my case, when I tried the above code, it still didn’t work. In other words… after making sure I was doing the SIXX export and import in a forked gem, and surrounding the code with a #commitOnAlmostOutOfMemoryDuring:, the operation was still failing. I was getting the error “FindFreeFrame: potential infinite loop”. From what I read, that seems to indicate that the SPC is fully occupied. When you are inside a GemStone transaction and you have created new persistent objects, all those objects must fit in the SPC at the time you do the #commit.

Dale said: “if you were to look at a statmon you would find that GlobalDirtyPages were filling the cache .. the dirty pages due to a transaction in progress (i.e., you are doing a commit and writing the objects from TOC to SPC) cannot be written to disk until the transaction completes … and it cannot complete until it can write all of the dirty objects from the TOC to the SPC …”

OK… now… if you look at #commitOnAlmostOutOfMemoryDuring:, the memory it is talking about is the GEM_TEMPOBJ_CACHE_SIZE, not the SPC (SHR_PAGE_CACHE_SIZE_KB). Unfortunately, I have other places, like the SIXX export/import, where I do heavy/bulk operations that I have not yet migrated to the new way of using a temporary persistent root and forked gems. Therefore, for the time being, my GEM_TEMPOBJ_CACHE_SIZE is quite big. To give an example, I could have an SPC of 1GB and a 900MB temp space:

SHR_PAGE_CACHE_SIZE_KB=1000000;
GEM_TEMPOBJ_CACHE_SIZE=900000;

Continuing with #commitOnAlmostOutOfMemoryDuring:, you will see that the threshold of the “almost out of memory” is 75%. So… 75% of 900MB is 675MB. SPC (1GB) – 675MB = 325MB. In other words… my SIXX block will commit when my gem temp space reaches 675MB. Most SIXX temp data should be persistent because we are using the hook to define a persistent root; therefore, most of that 675MB will be dirty persistent data. So… to conclude, what I think was happening is that I had a really big temp space (close to the SPC size) and a high threshold for the commit. Hence, I was filling up the SPC before I was able to commit.

Solutions?

1) Do not use #commitOnAlmostOutOfMemoryDuring: but instead split your code with your own specific commits. For example, in my case I could split the SIXX serialization/materialization into operations where I commit every X number of objects. Or whatever. But this is use-case specific.

2) Code a variation of #commitOnAlmostOutOfMemoryDuring: where you pass a smaller threshold as an argument (see the sketch after this list). For example, with 50% it worked correctly for me.

3) Set a smaller GEM_TEMPOBJ_CACHE_SIZE for the forked gems. This should be the best solution, because with a TOC that approaches the size of the SPC you are always in danger of creating more dirty objects than will fit in the SPC, and thus of not being able to commit.
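For option 2, such a variation could be built on the underlying GemStone hooks: #signalAlmostOutOfMemoryThreshold: and the AlmostOutOfMemory exception. The selector below is mine, and I assume the signal is disabled once delivered (so it must be re-armed) and that a negative threshold disables it; double-check both in your GemStone version:

commitOnAlmostOutOfMemoryDuring: aBlock threshold: percentage
	"Like the stock hook, but with a configurable threshold (a percentage of the temp object cache)"
	System signalAlmostOutOfMemoryThreshold: percentage.
	^ [ aBlock
		on: AlmostOutOfMemory
		do: [:ex |
			System commitTransaction.
			"Re-arm the signal for the next time the threshold is crossed"
			System signalAlmostOutOfMemoryThreshold: percentage.
			ex resume ] ]
		ensure: [ System signalAlmostOutOfMemoryThreshold: -1 ]

Then the SIXX block above can be run with, for example, a 50% threshold:

self commitOnAlmostOutOfMemoryDuring: [ "SIXX export/import here" ] threshold: 50.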

For my app, I finally decided to use a GEM_TEMPOBJ_CACHE_SIZE of 75% of SHR_PAGE_CACHE_SIZE_KB. That is still a problem if I have many gems doing large commits… but I need it. And then, I also made the SIXX export/import above commit at a threshold of 50%.
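With the 1GB SPC from the example above, that means something like:

SHR_PAGE_CACHE_SIZE_KB=1000000;
GEM_TEMPOBJ_CACHE_SIZE=750000;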

Executing this from within Seaside?

If you try to run these exports/imports from Seaside callbacks, you will see that they will likely not work, so you will have to invoke them from GemTools, tODE or any other GemStone client. This is because the hook used to commit, #commitOnAlmostOutOfMemoryDuring:, will commit the open transaction Seaside has. If you don’t use GLASS, or you manage transactions yourself, then the results could vary.

When the Seaside transaction is committed, Seaside will redirect you to the home page because there is no open GemStone transaction left. To solve this and other issues that I will try to explain in another post, we can use separate VMs (usually called Service VMs) that take care of these tasks without affecting the Seaside ones. Thanks Dale and Otto for sharing this too.

Future Work

The next step, if I want to stay with SIXX, would be to use the XML pull parser, which should use less memory. Another possibility could be to use STON, but I am not sure if it is 100% working in GemStone… or maybe try to port Fuel… I tried once and left it with half of the tests passing 🙂

 

