The Application Tests

Apache

The MAASK group ran their initial tests with Apache 1.3.20 on a 100 Mbit network; I decided to rerun similar tests on my 10 Mbit network.

To test Apache, we needed a page that took long enough to load for migration to pay off; the MAASK group provided such a PHP script. The next step was to run the ApacheBench tool

ab -c 5 -n 5 http://10.0.11.222/test.php 

in the different possible configurations.

I ran these tests in three configurations: once with a normal openMosix 2.4.21-1 kernel, once with the migShm patch enabled, and once with a vanilla Linux kernel. It was clear that only the migShm patch would let the Apache processes migrate to other nodes, but other processes might still benefit from normal openMosix migration under heavier load.
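Repeated ab runs per configuration are easy to script. The following Python sketch is an illustration only, not the script used for these tests: it assumes `ab` is on the PATH and saves each report as `<label><n>`, which mirrors the `migshm*`, `nomigshm*` and `openmosix*` file names grepped in the output below. The label scheme and the `run_series` helper are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: the test page used in this section; change it for your own cluster.
URL = "http://10.0.11.222/test.php"


def ab_command(url: str, requests: int = 5, concurrency: int = 5) -> list[str]:
    """Build the ApacheBench command line used in these tests (ab -c 5 -n 5 URL)."""
    return ["ab", "-c", str(concurrency), "-n", str(requests), url]


def run_series(label: str, runs: int = 3, url: str = URL) -> list[Path]:
    """Run ab `runs` times for the currently booted kernel configuration.

    Each report is written to its own file named <label><n> (hypothetical scheme).
    """
    outputs = []
    for n in range(runs):
        out = Path(f"{label}{n}")
        with out.open("w") as fh:
            # check=True raises CalledProcessError if ab exits with an error
            subprocess.run(ab_command(url), stdout=fh, check=True)
        outputs.append(out)
    return outputs
```

After booting a given kernel you would call `run_series("migshm")`, then reboot into the next configuration and repeat with a different label. Because `check=True` is set, a failing ab run raises an exception instead of silently leaving a partial report behind.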

[sdog@inspon apache13]$ more migshm*  | grep Request
Requests per second:    0.12 [#/sec] (mean)
Requests per second:    0.16 [#/sec] (mean)
Requests per second:    0.18 [#/sec] (mean)
[sdog@inspon apache13]$ more nomigshm* | grep Request
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
[sdog@inspon apache13]$ more openmosix* | grep Request
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
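To compare the configurations numerically, the three runs of each can be averaged. This Python sketch is illustrative; the `summarize` helper and its glob-based lookup of report files are assumptions, not part of the original tooling. It extracts the `Requests per second` value from an ab report and averages the figures printed above.

```python
import re
from pathlib import Path
from statistics import mean

# Matches the ab summary line, e.g. "Requests per second:    0.12 [#/sec] (mean)"
RPS_LINE = re.compile(r"Requests per second:\s+([0-9.]+)")


def requests_per_second(report: str) -> float:
    """Extract the mean requests/second figure from one ab report."""
    match = RPS_LINE.search(report)
    if match is None:
        raise ValueError("no 'Requests per second' line found")
    return float(match.group(1))


def summarize(prefix: str, directory: str = ".") -> float:
    """Average the figure over all report files whose names start with prefix."""
    files = sorted(Path(directory).glob(prefix + "*"))
    return mean(requests_per_second(f.read_text()) for f in files)


# The figures shown above, grouped by the file-name prefix used with grep.
RESULTS = {
    "migshm": [0.12, 0.16, 0.18],
    "nomigshm": [0.22, 0.22, 0.22],
    "openmosix": [0.22, 0.22, 0.22],
}

for label, values in RESULTS.items():
    print(f"{label:10s} mean {mean(values):.3f} req/s")
```

With these numbers the migShm runs average about 0.153 requests per second against 0.22 for the other two configurations, roughly 30% lower.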

In fact, we can serve fewer requests per second with the migShm patch enabled, and throughput does not differ between the plain kernel and the openMosix kernel. These results differ from the results the MAASK group published. We are still looking into the exact details, but my first guess is that, with less bandwidth available to move the shared memory portions around, the tasks take longer to complete.

Conclusion 1: Apache 1.3.20 migrates on openMosix+migShm, but depending on the bandwidth available for migrating memory, migration might not increase your performance.

I retried these tests with Apache 2.0.40; my results were slightly different.

  8:00pm  up 18 min,  3 users,  load average: 7.08, 3.53, 1.43
116 processes: 107 sleeping, 9 running, 0 zombie, 0 stopped
CPU states: 70.6% user, 39.1% system,  0.0% nice,  0.0% idle
Mem:   93868K av,  87884K used,   5984K free,      0K shrd,   7828K buff
Swap: 196520K av,    236K used, 196284K free                 55096K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT N# %CPU %MEM   TIME COMMAND
 1009 apache    20   0  5808 5808  5320 R     0  8.6  6.1   0:23 httpd
 1010 apache    20   0  5572 5572  5304 R     0  8.6  5.9   0:23 httpd
 1011 apache    14   0  5576 5576  5308 R     0  8.4  5.9   0:22 httpd
 1014 apache    20   0  5556 5556  5296 R     0  8.1  5.9   0:22 httpd
 1016 apache    20   0  5556 5556  5296 R     0  8.1  5.9   0:23 httpd
 1015 apache    15   0  5556 5556  5296 R     0  7.8  5.9   0:22 httpd
 1079 apache    14   0  5556 5556  5296 R     0  7.7  5.9   0:22 httpd
 1127 apache    20   0  3924 3924  3688 S    2859  1.9  4.1   0:00 httpd
 1125 apache    20   0  3924 3924  3688 S    2859  1.8  4.1   0:00 httpd
 1126 apache    20   0  3924 3924  3688 S    2859  1.8  4.1   0:00 httpd
 1130 apache    15   0  3924 3924  3688 S    2859  1.6  4.1   0:00 httpd
 1122 apache    18   0  3924 3924  3688 S    2859  1.5  4.1   0:01 httpd
As the mtop output above shows, the Apache 2.0 processes also migrate, but there is a problem. When I start openMosix first and then Apache, everything works. I then start ab:
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest0
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest1
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest2
apr_poll: The timeout specified has expired (20507)
This means that somewhere between the 10th and 15th connection to Apache, something goes wrong. I see about 6 processes migrated to another node. In the first run the page takes about 18 seconds to load; in the second run it is already up to 25 seconds.

When I try to stop Apache at this point, I get the following errors:

945(httpd): Failed to Go Back Home.
945(httpd): Failed to Go Back Home.
Even trying to stop Apache fails.
Stopping httpd: [Sun Sep 21 15:00:59 2003] [warn] child process 879 still did not exit, sending a SIGTERM
[Sun Sep 21 15:01:00 2003] [warn] child process 880 still did not exit, sending a SIGTERM
[root@dhcp89 root]# [Sun Sep 21 15:01:00 2003] [warn] child process 881 still did not exit, sending a SIGTERM
This behavior has been reproduced multiple times in my test setup.

Conclusion 2: Even when applications seem to migrate, sometimes they only work for a limited period.

MySQL

As mentioned before, my initial interest in openMosix was increasing the performance of MySQL, and it is with that initial disappointment in mind that I kept following the development of openMosix and related technologies.

Back then I ran into the problem that all MySQL processes were locked. Without really thinking about the architecture of MySQL, I was too eager to try again.

Now the mysqld_safe process can be migrated back and forth between nodes (tested using moving.sh from the openMosix stress test suite), and normal MySQL operations (sql-bench) are not interrupted while this happens.
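The back-and-forth migration test can be approximated in a few lines. moving.sh itself is not reproduced here; this Python sketch is a hypothetical stand-in that assumes the openMosix user tool `migrate <pid> <node>` is on the PATH and that node numbers are passed in explicitly. The pause length and number of rounds are arbitrary choices.

```python
import subprocess
import time


def migration_plan(remote_nodes, home_node, rounds):
    """Alternate between the remote nodes and the home node.

    Returns node numbers as strings, ready to pass to `migrate`.
    """
    plan = []
    for i in range(rounds):
        plan.append(str(remote_nodes[i % len(remote_nodes)]))
        plan.append(str(home_node))
    return plan


def bounce(pid, remote_nodes, home_node, rounds=5, pause=10.0):
    """Repeatedly migrate `pid` away and back (hypothetical moving.sh stand-in)."""
    for target in migration_plan(remote_nodes, home_node, rounds):
        # Assumption: `migrate <pid> <node>` from the openMosix user tools.
        subprocess.run(["migrate", str(pid), target], check=True)
        time.sleep(pause)  # let the process run on its new node for a while
```

Calling `bounce(pid, [2, 3], home_node=1)` on the mysqld_safe PID while sql-bench runs would mirror the test described above; the PID and node numbers are placeholders.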

However, its children still have a clone_vm issue (clone_vm indicates that the application uses threads, which share one address space). Of course, MySQL is fully multi-threaded and uses kernel threads, which are not yet supported by openMosix or migShm. But at least part of MySQL migrates, so there is some minor progress.

PostgreSQL

I was looking for a way to increase the performance of a database. These days PostgreSQL is as popular as, or even more popular than, MySQL. Moreover, PostgreSQL uses a number of back-end processes rather than a threading model.

Actually, I never tested PostgreSQL migration with the normal openMosix version, but the wiki mentions that it uses shared memory.

Although I was initially under the impression that PostgreSQL processes migrated back and forth over my cluster, I was quickly proven wrong: I had forgotten that I started PostgreSQL before starting openMosix. When I started them in the opposite order, PostgreSQL actually crashed the environment ;(

PostgreSQL uses shared memory, but it does not use the System V semaphores to lock it. It therefore does not satisfy the migShm constraints and cannot benefit from migShm.
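The reasoning for MySQL and PostgreSQL can be condensed into a small decision rule. The Python sketch below encodes only the constraints stated in this section (threads are not supported; shared memory must be locked with System V semaphores). It is a reading aid, not a description of how migShm itself decides, and the `IpcProfile` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IpcProfile:
    """Hypothetical summary of how an application uses processes and IPC."""
    name: str
    uses_threads: bool              # children created with CLONE_VM
    uses_shared_memory: bool
    shm_locked_with_sysv_sem: bool  # shared memory guarded by System V semaphores


def can_migrate_with_migshm(app: IpcProfile) -> bool:
    """Apply the migration constraints as described in this section."""
    if app.uses_threads:
        return False  # threads are not yet supported by openMosix or migShm
    if app.uses_shared_memory:
        return app.shm_locked_with_sysv_sem  # migShm needs SysV-semaphore locking
    return True  # no shared memory: ordinary openMosix migration applies


mysql_children = IpcProfile("mysqld threads", True, False, False)
postgresql = IpcProfile("PostgreSQL", False, True, False)

for app in (mysql_children, postgresql):
    print(app.name, can_migrate_with_migshm(app))
```

Both examples come out as non-migratable, but for different reasons: MySQL because of its threads, PostgreSQL because of how it locks its shared memory.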

Blast

Blast is one of the most popular applications in the bioinformatics world, and lots of people have already tried to run Blast on openMosix with mixed success. A normal Blast version uses shared memory; however, a patched version of Blast exists that does not use shared memory. People have reported issues with this patched version, such as segfaults.

This mostly happens with the preformatted databases you download from the Internet. If you run formatdb on a raw database yourself, these errors tend to go away.

Besides the openMosix Blast patch, a lot of people run MPIBlast. Given that openMosix tends to speed up MPI, adding openMosix to this configuration might give you even more power for your money.

Still, people would like to run Blast natively on openMosix. Jose Javier Forment Millet was kind enough to send me some relevant sequences to test with; from these sequences I formatted a database and ran my initial tests. To give Blast easy access to its databases, it is recommended that you start Blast with the database references pointing to /mfs mount points.

I wrote a small script that starts multiple instances of Blast and sends the output of each to a file with a timestamp. When running this script multiple times, I could see different Blast processes migrating to other nodes. The output of these Blast runs was exactly what one would get when running on a single node. During my Blast tests I did not encounter any segmentation faults in Blast itself.
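A launcher along the lines described above might look like the following Python sketch. It is not the script used for these tests. It assumes the legacy NCBI `formatdb` and `blastall` binaries are on the PATH, that the formatted database is reachable under a /mfs path (the `DB` value is a placeholder, and must match the base name formatdb produced), and that the query files are passed in by the caller; the output-naming scheme is also an assumption.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Hypothetical /mfs path to the formatted database; adjust to your cluster.
DB = "/mfs/home/data/sequences"


def formatdb_command(fasta: str, protein: bool = False) -> list[str]:
    """Format a raw FASTA database (-i input file, -p T protein / F nucleotide)."""
    return ["formatdb", "-i", fasta, "-p", "T" if protein else "F"]


def blast_command(query: str, out: str, program: str = "blastn", db: str = DB) -> list[str]:
    """One blastall run (-p program, -d database, -i query file, -o report file)."""
    return ["blastall", "-p", program, "-d", db, "-i", query, "-o", out]


def timestamped_name(query: str, when: datetime) -> str:
    """Output file name built from the query name plus a timestamp."""
    return f"{Path(query).stem}-{when.strftime('%Y%m%d-%H%M%S')}.out"


def launch_all(queries, program="blastn", db=DB):
    """Start one blastall per query, then wait for all of them.

    All processes are started before any wait, so they run concurrently
    and openMosix is free to migrate them to other nodes.
    """
    procs = []
    for query in queries:
        out = timestamped_name(query, datetime.now())
        procs.append(subprocess.Popen(blast_command(query, out, program, db)))
    return [p.wait() for p in procs]
```

For example, `subprocess.run(formatdb_command("raw_sequences.fa"), check=True)` formats a raw nucleotide database once, after which `launch_all(["q1.fa", "q2.fa", "q3.fa"])` starts three concurrent searches and returns their exit codes. All file names here are hypothetical.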

openMosix-migShm complained once with an "Unable to handle kernel NULL pointer dereference at virtual address 00000000" error, but continued smoothly. We are currently looking into the cause of this issue.

Conclusion 3: openMosix-migShm enables the migration of Blast, a popular bioinformatics tool.

Others

When I asked on the mailing list, lots of people suggested testing MATLAB. I will try to test MATLAB in the near future; so far I have not been able to get access to the MATLAB software in time. However, MATLAB also seems to use pthreads, so it probably won't migrate yet.

As for applications that we know use pthreads, I did not test them, since I knew in advance that they would fail.