Migrating Shared Memory applications

Migrating applications that use shared memory was not the easiest part of openMosix development. The name already states the problem: a process uses memory that it shares with another process.

When a process that does not use shared memory has to be migrated, openMosix destroys the memory map and the related pages on the home node and recreates them on the remote node.

This cannot be done when an application uses shared memory, since other processes are using the same memory regions.
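The difference between the two cases can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not openMosix kernel code: the `Node` class, the `migrate_private_memory()` function and the per-page set of mapping processes are hypothetical stand-ins. Private pages are recreated on the remote node and destroyed on the home node; if any page is still mapped by another process, the migration is refused and nothing changes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node (hypothetical): page contents plus the pids mapping each page."""
    name: str
    pages: dict = field(default_factory=dict)    # page id -> contents
    mappers: dict = field(default_factory=dict)  # page id -> set of pids


class SharedPageError(Exception):
    """Raised when a page to be moved is also mapped by another process."""


def migrate_private_memory(pid, page_ids, home, remote):
    """Recreate pid's pages on remote and destroy them on home.

    Refuses, leaving both nodes untouched, if any page is shared with
    another process: destroying it would pull memory out from under that
    process.
    """
    # Check every page first so that a refusal changes nothing.
    for p in page_ids:
        others = home.mappers.get(p, set()) - {pid}
        if others:
            raise SharedPageError(f"page {p} is also mapped by {sorted(others)}")
    for p in page_ids:
        remote.pages[p] = home.pages.pop(p)  # recreate on remote, destroy on home
        remote.mappers[p] = {pid}
        home.mappers.pop(p, None)
```

A process whose pages are all private migrates cleanly; a process sharing even one page with another process raises `SharedPageError`, which is exactly the situation migShm has to solve differently.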

Besides the fact that you cannot destroy memory that is being used by another program, imagine two processes on the same node that share memory and both have to be migrated to the same remote node. One could have implemented a solution in which both processes create new instances of the memory regions they need, doubling the memory usage, whereas ideally both processes should keep sharing the same memory regions on the remote node as well.

And what should happen when the two processes would perform better running on different nodes? Then the memory is used twice, on different nodes, which is of course less of an issue than double memory usage on the same node.

The MAASK group (Maya, Anu, Asmita, Snehal and Krushna, the developers of migShm) opted for an eager release consistency model. This means that local copies of modified pages are written back to the original owner only when the lock on that segment is released, rather than on every write. This way the owner node of the shared memory always has the latest released version of each page. When another remote process reads this shared memory, it page faults and fetches the latest version of the page from the owner node.
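To see why deferring updates until the lock is released saves traffic, the following sketch counts the updates the owner receives under eager release consistency versus a hypothetical write-through scheme that forwards every single write. The 4096-byte page size and the representation of a critical section as a list of written byte offsets are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(write_offsets):
    """Distinct pages hit by a list of byte offsets within the segment."""
    return {off // PAGE_SIZE for off in write_offsets}


def owner_updates_write_through(critical_sections):
    # Hypothetical alternative: every write is forwarded to the owner at once.
    return sum(len(cs) for cs in critical_sections)


def owner_updates_eager_release(critical_sections):
    # Eager release: at each lock release, every modified page is sent once,
    # no matter how many times it was written inside the critical section.
    return sum(len(pages_touched(cs)) for cs in critical_sections)


sections = [list(range(1000)), [0, 4096, 8192]]
print(owner_updates_write_through(sections), owner_updates_eager_release(sections))
```

Here the first critical section writes 1000 times within page 0 and the second writes once to each of three pages: write-through costs 1003 updates, eager release only 4 page transfers.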

MAASK implemented a solution with a mig_shm_daemon() running on every node. When the owner sends an invalidate message to the remote nodes, the page table entries of all remote processes for the modified pages are invalidated. The next time a process tries to access a recently invalidated page, it page faults to the owner node and gets the latest copy from there.
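The invalidation step can be modelled as follows. This is a hypothetical sketch: `RemoteNode.handle_invalidate()` merely plays the role of mig_shm_daemon() on one node, and its message format (a collection of page numbers) and the `publish()` helper are assumptions, not the migShm interface. Dropping a page table entry forces the next access to take a page fault, which fetches the current contents from the owner.

```python
class RemoteNode:
    """Toy remote node caching pages of a shared segment per process."""

    def __init__(self, name, owner_pages):
        self.name = name
        self.owner_pages = owner_pages  # the owner's authoritative copy
        self.ptes = {}                  # (pid, page_no) -> mapped contents
        self.faults = 0

    def handle_invalidate(self, page_numbers):
        # Plays the role of mig_shm_daemon() (hypothetical interface):
        # drop every local process's PTE for the pages in the message.
        stale = set(page_numbers)
        for key in [k for k in self.ptes if k[1] in stale]:
            del self.ptes[key]

    def access(self, pid, page_no):
        key = (pid, page_no)
        if key not in self.ptes:
            # Page fault: fetch the latest copy from the owner node.
            self.faults += 1
            self.ptes[key] = self.owner_pages[page_no]
        return self.ptes[key]


def publish(owner_pages, updates, remote_nodes):
    """Owner side: install modified pages, then invalidate them everywhere."""
    owner_pages.update(updates)
    for node in remote_nodes:
        node.handle_invalidate(updates.keys())
```

Only the pages named in the message are dropped; other cached pages stay mapped and keep being served locally without a fault.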

Pages are updated from the remote node to the owner node by a write-back operation: just before the lock is released on the remote node, the dirty pages are sent back to the owner node, where they are restored to their correct positions.
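The write-back ordering can be sketched like this, assuming a 4096-byte page size and modelling the segment lock with Python's `threading.Lock`. `OwnerSegment` and `RemoteSegment` are hypothetical names, not part of migShm. The remote side records which pages each write touched, and `unlock()` copies every dirty page back to its original offset in the owner's segment before releasing the lock.

```python
import threading

PAGE_SIZE = 4096  # assumed page size in bytes


class OwnerSegment:
    """Owner node's copy of the shared segment."""

    def __init__(self, npages):
        self.data = bytearray(npages * PAGE_SIZE)

    def restore_page(self, page_no, contents):
        # Put the page back at its original position in the segment.
        start = page_no * PAGE_SIZE
        self.data[start:start + PAGE_SIZE] = contents


class RemoteSegment:
    """Remote node's local copy, with dirty-page tracking."""

    def __init__(self, owner, lock):
        self.owner = owner
        self.lock = lock                    # stands in for the segment lock
        self.local = bytearray(owner.data)  # local copy of the segment
        self.dirty = set()

    def write(self, offset, payload):
        self.local[offset:offset + len(payload)] = payload
        first = offset // PAGE_SIZE
        last = (offset + len(payload) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))  # a write may span pages

    def unlock(self):
        # Write-back: send every dirty page home *before* releasing the lock.
        for page_no in sorted(self.dirty):
            start = page_no * PAGE_SIZE
            self.owner.restore_page(page_no, bytes(self.local[start:start + PAGE_SIZE]))
        self.dirty.clear()
        self.lock.release()
```

Sending the pages before releasing the lock matters: if the lock were released first, the next lock holder could fetch a stale page from the owner.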

Just as with the original openMosix algorithms, one has to know which processes using shared memory are actually suitable for migration. To facilitate this, a module has been written that monitors and logs access to shared memory.
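The text does not describe the monitoring module's interface, so the following is only a hypothetical sketch of the kind of bookkeeping such a monitor could do: count accesses per process and per segment, keep a log of (timestamp, pid, segment) entries, and report what fraction of a segment's accesses came from a given process.

```python
from collections import Counter, defaultdict


class ShmAccessMonitor:
    """Hypothetical monitor: per-segment access counts and an access log."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # segment -> Counter(pid -> accesses)
        self.log = []                       # (timestamp, pid, segment) entries

    def record(self, timestamp, pid, segment):
        self.counts[segment][pid] += 1
        self.log.append((timestamp, pid, segment))

    def share_of(self, pid, segment):
        """Fraction of the segment's logged accesses made by pid."""
        seg = self.counts.get(segment, Counter())
        total = sum(seg.values())
        return seg[pid] / total if total else 0.0
```

A per-process share like this is one possible input for deciding how strongly a process is linked to a segment.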

A weakly linked process is migrated without migrating the memory. For a strongly linked process, whether it is migrated with or without the memory depends on how strongly other processes are linked to it: if the selected process is the most active one, the memory is migrated along with the process.
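The decision rule above could be expressed roughly as follows. The text does not say how the strength of a link is measured; this sketch assumes it is a process's share of all accesses to the segment, with a hypothetical cut-off `WEAK_LINK_THRESHOLD` of 0.2 separating weakly from strongly linked processes.

```python
WEAK_LINK_THRESHOLD = 0.2  # hypothetical cut-off for "weakly linked"


def migration_plan(pid, access_counts):
    """Return "process only" or "process with memory" for candidate pid.

    access_counts maps pid -> number of accesses to the shared segment
    (for example as gathered by a monitoring module).
    """
    total = sum(access_counts.values())
    share = access_counts.get(pid, 0) / total if total else 0.0
    if share < WEAK_LINK_THRESHOLD:
        return "process only"         # weakly linked: memory stays put
    most_active = max(access_counts, key=access_counts.get)
    if pid == most_active:
        return "process with memory"  # strongly linked and most active
    return "process only"             # strongly linked, but another process uses it more
```

With access counts `{1: 70, 2: 25, 3: 5}`, process 3 is weakly linked, process 2 is strongly linked but not the most active, and only process 1 would take the memory with it.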

When a process is found to be suitable for migration and the shared memory is migrated along with it to a remote node, that remote node becomes the owner node of the shared memory.
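Ownership transfer can be captured with a small directory that maps each segment to its owner node. `ShmDirectory` and its methods are hypothetical, not part of migShm; the point of the sketch is that only a migration that carries the memory along changes the owner, and from then on page faults are served by the new owner node.

```python
class ShmDirectory:
    """Hypothetical directory recording which node owns each shared segment."""

    def __init__(self):
        self.owner = {}  # segment id -> owner node name

    def create(self, segment, node):
        self.owner[segment] = node

    def migrate(self, segment, dest, with_memory):
        # Only a migration that takes the memory along moves ownership.
        if with_memory:
            self.owner[segment] = dest
        return self.owner[segment]

    def fault_target(self, segment):
        """Node a page fault on this segment should fetch from."""
        return self.owner[segment]
```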