migShm-openMosix Experiences
The MAASK group did their initial tests with Apache 1.3.20 on a 100Mbit network. I decided to rerun similar tests on my 10Mbit network.
To test Apache, we needed a page that took long enough to load to benefit from migration; the MAASK group provided such a PHP script. The next step was to run the Apache benchmark tool
ab -c 5 -n 5 http://10.0.11.222/test.php
in the different possible configurations.
I ran these tests in three configurations: once with a normal openMosix 2.4.21-1 kernel, once with the migShm patch enabled, and once with a vanilla Linux kernel. It was clear that the Apache processes could migrate to other nodes only with the migShm patch, but other processes might still benefit from normal openMosix migration under heavier load.
[sdog@inspon apache13]$ more migshm* | grep Request
Requests per second:    0.12 [#/sec] (mean)
Requests per second:    0.16 [#/sec] (mean)
Requests per second:    0.18 [#/sec] (mean)
[sdog@inspon apache13]$ more nomigshm* | grep Request
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
[sdog@inspon apache13]$ more openmosix* | grep Request
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
Requests per second:    0.22 [#/sec] (mean)
In fact, we can serve fewer requests per second with the migShm patch enabled, and the speed does not differ between the plain kernel and the openMosix kernel. These results differ from those the MAASK group published. We are still looking into the exact details, but my first guess is that, with less bandwidth available to move the shared memory portions around, it takes longer to complete the tasks.
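To make the comparison easier to read, the per-run figures above can be averaged per configuration. The following is only a minimal sketch: the function names and the `RUNS` dictionary are mine, and the input text is simply the grep output copied from the transcript above.

```python
import re
from statistics import mean

# Matches the summary line printed by ab, for example:
# "Requests per second:    0.12 [#/sec] (mean)"
RPS_LINE = re.compile(r"Requests per second:\s+([0-9.]+)")


def requests_per_second(ab_output):
    """Return every 'Requests per second' value found in the given ab output."""
    return [float(match.group(1)) for match in RPS_LINE.finditer(ab_output)]


def summarize(runs):
    """Map each configuration name to its mean requests/second, rounded to 3 places."""
    return {name: round(mean(requests_per_second(text)), 3)
            for name, text in runs.items()}


# Values copied from the three runs per configuration shown above.
RUNS = {
    "migshm": """Requests per second: 0.12 [#/sec] (mean)
Requests per second: 0.16 [#/sec] (mean)
Requests per second: 0.18 [#/sec] (mean)""",
    "nomigshm": """Requests per second: 0.22 [#/sec] (mean)
Requests per second: 0.22 [#/sec] (mean)
Requests per second: 0.22 [#/sec] (mean)""",
    "openmosix": """Requests per second: 0.22 [#/sec] (mean)
Requests per second: 0.22 [#/sec] (mean)
Requests per second: 0.22 [#/sec] (mean)""",
}

if __name__ == "__main__":
    for name, value in summarize(RUNS).items():
        print(f"{name}: {value} requests/sec")
```

With these numbers the migShm runs average about 0.153 requests per second against 0.22 for the other two kernels, which is roughly 30% lower. With only three runs of five requests each, this should be read as an indication rather than a measurement.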
Conclusion 1: Apache 1.3.20 migrates on openMosix+migShm, but depending on the bandwidth available for migrating memory, it might not increase your performance.
I retried these tests with Apache 2.0.40, and my results were slightly different.
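The bandwidth argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes an idealised link with no protocol overhead and a purely illustrative 5 MiB of shared memory to move; neither figure was measured in these tests.

```python
def transfer_seconds(size_bytes, link_mbit_per_s):
    """Ideal time in seconds to move size_bytes over a link of the given speed.

    Ignores protocol overhead, latency and contention, so real transfers
    will take longer than this lower bound.
    """
    bits = size_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


# Assumption for illustration only: 5 MiB of shared memory per migration.
SEGMENT_BYTES = 5 * 1024 * 1024

if __name__ == "__main__":
    for speed in (10, 100):
        seconds = transfer_seconds(SEGMENT_BYTES, speed)
        print(f"{speed} Mbit/s: {seconds:.2f} s")
```

Under these assumptions a single transfer costs about 4.2 seconds at 10Mbit against about 0.42 seconds at 100Mbit. A tenfold difference of that kind could plausibly turn a gain on the MAASK group's network into a loss on mine.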
  8:00pm  up 18 min,  3 users,  load average: 7.08, 3.53, 1.43
116 processes: 107 sleeping, 9 running, 0 zombie, 0 stopped
CPU states: 70.6% user, 39.1% system,  0.0% nice,  0.0% idle
Mem:    93868K av,   87884K used,    5984K free,       0K shrd,    7828K buff
Swap:  196520K av,     236K used,  196284K free                   55096K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT   N# %CPU %MEM   TIME COMMAND
 1009 apache    20   0  5808 5808  5320 R       0  8.6  6.1   0:23 httpd
 1010 apache    20   0  5572 5572  5304 R       0  8.6  5.9   0:23 httpd
 1011 apache    14   0  5576 5576  5308 R       0  8.4  5.9   0:22 httpd
 1014 apache    20   0  5556 5556  5296 R       0  8.1  5.9   0:22 httpd
 1016 apache    20   0  5556 5556  5296 R       0  8.1  5.9   0:23 httpd
 1015 apache    15   0  5556 5556  5296 R       0  7.8  5.9   0:22 httpd
 1079 apache    14   0  5556 5556  5296 R       0  7.7  5.9   0:22 httpd
 1127 apache    20   0  3924 3924  3688 S    2859  1.9  4.1   0:00 httpd
 1125 apache    20   0  3924 3924  3688 S    2859  1.8  4.1   0:00 httpd
 1126 apache    20   0  3924 3924  3688 S    2859  1.8  4.1   0:00 httpd
 1130 apache    15   0  3924 3924  3688 S    2859  1.6  4.1   0:00 httpd
 1122 apache    18   0  3924 3924  3688 S    2859  1.5  4.1   0:01 httpd
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest0
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest1
[sdog@inspon MigSHM_openMosix]$ ab -n 5 -c 5 http://10.0.11.44/test.php > cleansmtest2
apr_poll: The timeout specified has expired (20507)
When I try to stop Apache at this point, I get the following errors:
945(httpd): Failed to Go Back Home.
945(httpd): Failed to Go Back Home.

Stopping httpd: [Sun Sep 21 15:00:59 2003] [warn] child process 879 still did not exit, sending a SIGTERM
[Sun Sep 21 15:01:00 2003] [warn] child process 880 still did not exit, sending a SIGTERM
[root@dhcp89 root]# [Sun Sep 21 15:01:00 2003] [warn] child process 881 still did not exit, sending a SIGTERM
Conclusion 2: Even when applications seem to migrate, sometimes they only work for a limited period.
As mentioned before, my initial interest in openMosix came from trying to increase the performance of MySQL. It is with this initial disappointment in mind that I kept following the development of openMosix and related technologies.
Back then I ran into the problem that all MySQL processes were locked. Without actually thinking about the architecture of MySQL, I was too eager to try again.
Now the mysqld_safe process can be migrated back and forth between nodes (tested using moving.sh from the openMosix stress test suite), and while this happens normal MySQL operations (sql-bench) are not interrupted.
However, the children still have a clone_vm issue (clone_vm means the application is using threads). Of course, MySQL is fully multi-threaded and uses kernel threads, which are not yet supported by openMosix or migShm. But at least part of MySQL migrates, so there is some minor progress.
I was looking for a way to increase the performance of a database. These days PostgreSQL is as popular as, or even more popular than, MySQL, and PostgreSQL uses a number of back-end processes rather than a threading model.
Actually I never tested PostgreSQL migration with the normal openMosix version, but the wiki mentions that it uses shared memory.
While I was initially under the impression that PostgreSQL processes did migrate back and forth over my cluster, I was quickly proven wrong. I forgot that I had started PostgreSQL before I had started openMosix. When I started them in the opposite order, PostgreSQL actually crashed the environment ;(
Actually PostgreSQL uses shared memory but not the system semaphores for locking it. Thus, it does not satisfy migShm constraints and so it cannot benefit from migShm.
Blast is one of the most popular applications in the Bio-informatics world, and lots of people have already tried to run Blast on openMosix, with mixed success. A normal Blast version uses shared memory; however, a patched version of Blast exists that does not use shared memory. People have reported issues with this patched version, such as segfaults.
This mostly happens with the preformatted databases you download from the Internet. If you run formatdb on a raw database these errors tend to go away.
Besides the openMosix Blast patch, a lot of people run MPIBlast. Given that openMosix tends to speed up MPI, adding openMosix to this configuration might give you even more power for your money.
But people would still like to run Blast natively on openMosix. Jose Javier Forment Millet was kind enough to send me some relevant sequences to run my tests with. With these sequences I formatted a database and ran my initial tests. To give Blast easy access to its databases, it is recommended that you start Blast with the database references pointing to /mfs mountpoints.
I created a small script that started multiple instances of Blast and sent the output to a file with a timestamp. When running this script multiple times I could see different Blast processes migrating to other nodes. The output of these Blast runs was exactly what one would have gotten when running on a single node. During my Blast tests I did not encounter any segmentation faults in Blast itself.
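The script itself is not reproduced here, but a launcher of that kind could look roughly like the sketch below. It is not the script used in these tests: the function names, the output file naming and the example command line are all assumptions for illustration.

```python
import subprocess
import time
from pathlib import Path


def launch_instances(command, count, out_dir="."):
    """Start `count` copies of `command`, each writing to its own timestamped file.

    `command` is an argument list such as the Blast command line to run.
    Returns a list of (process, open file handle, output path) tuples.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    jobs = []
    for index in range(count):
        # The file name pattern is hypothetical; any unique name would do.
        log = out_path / f"blast-{stamp}-{index}.out"
        handle = log.open("w")
        proc = subprocess.Popen(command, stdout=handle, stderr=subprocess.STDOUT)
        jobs.append((proc, handle, log))
    return jobs


def wait_all(jobs):
    """Wait for every job, close its output file, and return (path, exit code) pairs."""
    results = []
    for proc, handle, log in jobs:
        exit_code = proc.wait()
        handle.close()
        results.append((log, exit_code))
    return results
```

A call such as `wait_all(launch_instances(["blastall", "-p", "blastn", "-d", "/mfs/1/data/mydb", "-i", "query.fasta"], 4))` would start four instances; the database path and query file in that command are hypothetical and need to match your own setup. Because each instance is a separate process, openMosix is free to migrate them independently while they run.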
openMosix-migShm complained once with an "Unable to handle kernel NULL pointer dereference at virtual address 00000000" error but continued smoothly. We are currently looking at the cause of this issue.
Conclusion 3: openMosix-migShm enables the migration of Blast, a popular Bio-informatics tool
When I asked on the mailing list, lots of people suggested testing Matlab. I will try to run tests on Matlab in the near future; so far I have not been able to get access to the Matlab software in time. However, it seems that Matlab also uses pthreads, so it probably won't migrate yet.
As for the applications that we know use pthreads, I didn't run any tests, since I knew in advance that they would fail.