Additions to the FAQ
From OpenMosixWiki
If you have new frequently asked questions, you can post them here, or post them directly on the FAQ (it's part of the Wiki too). Once answered, they will be added to the official [openMosix FAQ]. (Feel free to contribute the answers if you know them ;-))
'Q.' How reliable is openMosix?
'A.' An openMosix cluster is only as reliable as its "least" reliable node. In particular, memory corruption can be propagated throughout a cluster if processes are migrated to and from an unreliable COTS (Commodity Off The Shelf) PC without ECC (Error Correction Code) memory. If the memory corruption is sufficient to make a migrated process crash, the load on the unreliable node decreases, and the openMosix load balancing algorithm then "attracts" more processes to that node from the rest of the cluster. Migrated processes that do not crash on the node may also be corrupted if they make use of unreliable memory. When these processes are migrated away from the unreliable node, the memory corruption is propagated back to the rest of the openMosix cluster. For this reason, it is essential to test the memory of COTS PCs thoroughly BEFORE allowing them to join an openMosix cluster. This can be done using a stand-alone utility, e.g. "memtest86" (http://www.memtest86.com/), or under Linux with a user-mode utility, e.g. "memtester" (http://pyropus.ca/software/memtester/).
'Q.' Does IDL migrate?
'A.' Yes, IDL seems to migrate without problems.
'Q.' When I run the openMosix stress test, a lot of the nodes just die. What could this be?
'Q.' Can openMosix work as an HA web serving cluster?
'A.' No. It is an HPC environment; if you want an HA alternative, look at http://www.linux-ha.org/
If you want to make your web server HA, look at Linux-HA (http://www.linux-ha.org/). Please note that processes that are started or running on a failing node will die with that node. You might want to look into checkpointing (http://del.icio.us/kbuytaert/checkpointing) to make sure you don't lose data from long-running jobs.
(KrisBuytaert: I removed an in this case totally irrelevant part about HA in general)
'Q.' Can an openMosix cluster act as a network server for any TCP-based service, and how do socket connection resources get migrated?
'A.' No, it is not suited for HA.
'Q.' Why isn't openMosix part of the kernel? Are there any plans to submit it for inclusion in the 2.7 kernel?
'A.' As there are no plans for a 2.7 kernel, there are no plans for inclusion in it. Apart from that, we are focusing on stability in our own code rather than on integration with a Linus tree. We also think that the initial patch would probably be too big to ever be accepted.
'Q.' OK, the IA-64 port is complete, but where is it going?
'A.' IA-64 Notes and Comments
'Status:' The openMosix Project has announced the completion of its port to the IA-64 Intel Itanium family of processors. 64-bit openMosix is the first native SSI clustering platform released into production for the IA-64.
After almost one year of work, the openMosix cluster has been ported to the 64-bit Itanium platform. We have no intention, however, of continuing support for the Itanium platform, as support from Intel and related companies is extremely low and the performance of the CPU is quite disappointing.
Having openMosix already ported to 64-bit will help with a possible future port to the AMD Opteron platform.
'Development Notes:' Moshe's Advanced OS class at Tel Aviv University (35 students) made a great contribution (70 person-months) to the port effort as a semester project. Other universities should consider making a contribution back to OpenSource.
From this and earlier work, Moshe then completed the port. DEMOCRITOS http://www.democritos.it/ was both tester and beta site.
'What Now?' We no longer support IA-64 for a number of reasons, but the 64-bit code is a great starting point for porting to another 64-bit CPU. We are open for discussions with other CPU platform providers.
'Q.' Benchmarks - What are the common benchmarks to demonstrate the power of an openMosix cluster? Before the new cluster gets really busy with real jobs, I want to run some tasks with comparable results to find out the potential and limits of my cluster. I also want to learn, by benchmark, how well an openMosix cluster scales.
'A.'
- I like to use kernel or package builds as a benchmark. Try building a plain vanilla kernel on a single machine, then add nodes and repeat. To ensure SMP builds, use the -jx option for make, where x is 1 + the number of nodes. <mshannon@themattpad.com>
- (About compiling stuff with 'make -j') I said make -j N+1 where N is the number of CPUs in your cluster, not the number of nodes. CPUs + 1 is good, more is no good, simple enough rule. (Moshe Bar on #openMosix IRC channel on irc.freenode.net, April 14, 2003)
- (Jose R. Valverde @ EMBnet/CNB) Update: on modern machines I can't make the above make trick work. Most often C sources are split into small modules, which can be read and compiled very fast (less than a second) with little memory (relative to current memory sizes). On a system with a 2GHz processor and 1GB memory there is little chance of migration unless one can actually load the system. My experience on these systems is that up to make -j4 will run all four processes simultaneously on the local CPU without migration. To exploit the cluster I must run more than that.
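The "CPUs + 1" rule quoted above can be written down as a small calculation. This is only an illustrative sketch: the function name make_jobs and the example cluster layout are made up for this page, and, as the last comment above notes, on fast machines the result may be better treated as a lower bound than as the best value.

```python
def make_jobs(cpus_per_node):
    """Return the -j value from the 'CPUs + 1' rule.

    cpus_per_node is a list with one entry per cluster node, giving the
    number of CPUs in that node (not the number of nodes).
    """
    total_cpus = sum(cpus_per_node)
    return total_cpus + 1


# Hypothetical cluster: two dual-CPU nodes and two single-CPU nodes.
cluster = [2, 2, 1, 1]
print("make -j%d" % make_jobs(cluster))
```

Note that counting nodes instead of CPUs would give a smaller number for the same cluster (4 nodes + 1 = 5 rather than 6 CPUs + 1 = 7).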
'Q.' Threads - One cannot help but wonder whether thread migration will get easier to implement with the new POSIX threads in Linux 2.5.x [1], or more difficult. Any ideas?
'Q.' Kernel patches - Is openMosix compatible with some important kernel patches, such as low latency, preemption, and others?
'A.'
- I've had success with the USAGI IPsec patches (they patch cleanly) and the LMSensors patches; however, pre-emption touches the same intimate parts as openMosix. More work would have to be done to merge the two, if it's even possible. (mshannon@themattpad.com)
- Pre-emptive kernels are good for desktop systems, but less good for non-interactive servers. So, in that respect, I don't think you lose a lot by not having pre-emptive scheduling on openMosix boxes. (Moshe Bar on openMosix-devel)
'Q.' But openMosix is also cool for desktop systems. I would really like to be able to run my CPU-hungry analysis stuff while still being able to play with low-latency audio apps at the same time. Wouldn't it be possible to run processes that set SCHED_FIFO only on the box itself, whereas other processes are allowed to run on other boxen?
'A.' A simple solution is to make the process non-migratable, or simply to run it on your home node, e.g. 'runhome -z command' or just 'mosrun command', which is equivalent to running the command locked on your home node.
'Q.' I'm using Debian and I've installed the openMosix debs. Why isn't it working with PlumpOS or why isn't the autodiscovery daemon working?
'A.' At this time, the openMosix userspace-tools debs found in the unstable repository contain a 'fake' omdiscd as well as showmap, so you'll need to grab the latest userspace-tools from openMosix's CVS repository and compile things by hand. Also, there are currently debs only for 2.4.19 kernels, so keep that in mind. But despair not: a workaround exists so that you can enjoy autodiscovery and all the latest openMosix advances on your Debian system: Getting openMosix Tools 0.3 for Debian
'Q.' Is there a way to set up openMosix so that, if all the cluster nodes reach a certain percentage of CPU and memory utilization, it will just queue up the processes that need to be run in the cluster?
'A.' Yes, use a batch queue to submit jobs. You can configure a batch queueing system to satisfy those requirements. Jobs can be submitted to any node, and they will be migrated by OpenMosix as needed. Of course, if you are to go this way, then maybe you can do without openMosix at all...
'Q.' Can DFSA be enabled on a CODA filesystem? Can using CODA bring some advantages?
'A.' Personally, I believe Coda is obsolete, since it has a limit of 3.3Gb total storage. Instead I use the OpenAFS system, which seems to integrate well.
'Q.' What issues stand in the way of implementing Migratable Sockets? (other than somebody having to do it :)
'Q.' What are the differences between all those kinds of clustering technologies? (What does "Beowulf" mean? What is an SSI cluster?)
'A.' A Beowulf cluster generally means a bunch of nodes which can access each other using passwordless rsh/ssh and which have an MPI implementation installed on them. Those clusters normally have a master server and slave nodes, and applications running on the cluster use e.g. the message-passing functions from MPI (e.g. MPICH, LAM) or PVM for communication between the application processes running on different (remote) systems. Only very special applications, linked with the MPI library in use, can run on those "good old Beowulfs".
HA clusters provide high availability for services and/or systems with some kind of "heartbeat monitoring", which notices errors on the active system, makes it passive and switches the passive system to active.
There are also the GRIDs, which use very "decentralized" nodes, consolidating computing power across wide area networks with the help of grid services running on the nodes. As on the Beowulfs, only very special applications can be run on a GRID.
In an SSI cluster (e.g. openMosix) you do not have to care about this, because the application's processes "see" the cluster as one huge multi-processor system. It is completely transparent and you do not have to change your application in any way.
... and of course there can be a mixture of all those kinds of clusters.
'Q.' What are the main differences between openMosix and OpenSSI?
'A.' See the paper "OpenSSI Linux Cluster Project" (http://www.openssi.org/ssi-intro.pdf) on the OpenSSI site (http://www.openssi.org/).
Some significant drawbacks are that nodes can not join/leave the OpenSSI cluster except by rebooting and that there is a single root filesystem for all the nodes.
The single root means that if the master node fails, then some other node must be directly attached to the root disk in order to be able to take over. If a node with no direct connection to the root disk loses communication with the root master it will become useless since it can not have a local copy of the root filesystem.
Now, if you want to harness hundreds of other machines, under *Mosix, you just run the start script and, if the machines are there, they will be used, if not your system works as usual. You can join/leave without even noticing.
Under OpenSSI, the cluster works as long as there is a node directly connected to the root disk, but if you have a user system and lose access to a root node, then your machine will not work at all. Unless you reboot on a local system, and then boot back into the cluster when the root becomes available again, you can't work.
OTOH, under *Mosix you must maintain a separate root system disk for each machine, multiplying your administrative costs by the number of systems, while under OpenSSI, having only one common root, you only have to do the administration tasks once, hence saving potentially lots of time.
OpenSSI gives a full SSI system. It includes IP address failover (but you can add it to *Mosix using HA-LINUX heartbeat tools, albeit at increased work) and much more transparent distributed system services. It attempts to leverage and pack together all or most relevant clustering approaches in a single solution (including *Mosix load leveling and process migration). They are also aiming for integration of their source in the main Linux tree.
IMHO (Jose R. Valverde @ EMBnet/CNB), it is a more comprehensive solution that may eventually become the distributed clustering method of choice, especially if they remove the limitation of direct attachment to the root disk. Meanwhile, if you have many nodes, widely spread, and want them to be able to transparently switch from standalone to cluster mode (e.g. survive a network failure or a traveling laptop), *Mosix is still the best solution around.
'Q.' Does the above mean that every piece of software can use the power of my cluster, or just that every piece of software runs without problems?
'A.' Neither. As yet, openMosix supports neither native threads nor shared memory; hence, any application that makes use of these won't be able to take advantage of your cluster, and it may even crash (unless you make sure it runs only on your local node).
However, all other applications will benefit from your cluster straight out of the box, with no modifications and nothing special to do on your part, it should all be transparent.
As for shared memory apps, you may want to add the migshm patch: then these applications will run too. There may still be problems with native threads, which won't be migratable, but that's all.
Shortly put: everything will run, but shared memory or native threads applications must be run only on the local node, and hence won't benefit from the cluster.
'Q.' What are the security implications of openMosix on the local network, and what are the recommended ways to secure traffic (IPsec, ssh, etc.)?
'Q.' What are the security implications on the local machine, and what's the overall semantic model of openMosix? For example, can a normal user on a node see all the processes running on the cluster with the equivalent of "top", or can they only see processes initiated from the current node?
'A.' The latter: only processes initiated in the local node. Process IDs are not unique across the cluster, each process gets the ID from its starting node and is visible only on its starting node.
'Q.' How are resources handled by openMosix (specifically with respect to disk and memory). How does the cluster handle the aggregated RAM of all the machines, and what implications does openMosix have for storage management on a given node? Does disk space have to be "set aside" for openMosix, etc.
'A.' Nodes are independent regarding resources other than computation. The memory is not pooled into a single address space, hence the maximum memory available is not the sum of all the memories. A process can only address the memory available on the node it is running on.
Same happens with disks: each node has its own disk space and manages it independently.
You don't need to set aside space for openMosix (except for the space required by the kernel and user-level programs).
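The point about memory not being pooled can be made concrete with a little arithmetic. The node names and sizes below are invented for illustration, and the function name is hypothetical; the sketch only restates the rule given above, that a process can address no more memory than the node it is running on provides.

```python
def largest_single_process_memory(node_memory_mb):
    """Upper bound (in MB) on the memory one process can address.

    node_memory_mb maps a node name to that node's memory in MB. Because
    memory is not pooled, the bound is the largest single node, not the sum.
    """
    return max(node_memory_mb.values())


# Hypothetical three-node cluster.
nodes = {"node1": 1024, "node2": 512, "node3": 2048}

total = sum(nodes.values())
bound = largest_single_process_memory(nodes)
print("sum of all node memories: %d MB" % total)
print("most a single process can address: %d MB" % bound)
```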
'Q.' A port for the Opteron platform was mentioned in the answer to the IA-64 question. What is the status of this port? Just a ballpark figure would be nice -- is it going to need several hacker-years more work, or is it just around the corner?
'Q.' Is there a way to change how long openMosix believes a migration takes?
'A.' Yes, there are several parameters that can be changed to fine-tune the cluster by hand. They are managed through /proc/hpc/admin/overheads:
    # cat /proc/hpc/admin/overheads
    81 150 39 38 0 11 0 18 0 18 0 12 8141 351
As you can see, this /proc entry is writable by root:
    # ls -al /proc/hpc/admin/overheads
    -rw-r--r-- 1 root root 0 Jan 20 18:40 /proc/hpc/admin/overheads
The explanation of the values can be retrieved with:
    # mosctl gettune
    OpenMosix Kernel Tuning Parameters (microseconds):
    Home-node overhead in processing a demand-page = 81
    Remote overhead in processing a demand-page = 150
    Home-node overhead in processing a system call = 39
    Remote overhead in processing a system call = 38
    Basic home-node overhead for reading data = 0
    Home-node overhead per 1KB read = 11
    Basic remote overhead for reading data = 0
    Remote overhead per 1KB read = 18
    Basic home-node overhead for writing data = 0
    Home-node overhead per 1KB written = 18
    Basic remote overhead for writing data = 0
    Remote overhead per 1KB written = 12
    Migration time of an empty process = 8141
    Extra migration time per dirty page = 351
The descriptions are in exactly the same order as the values in the /proc entry.
To change the time that openMosix believes a process migration takes from 8.141 ms to 10 ms (the values are in microseconds), for example, you could do:
    # echo "81 150 39 38 0 11 0 18 0 18 0 12 10000 351" > /proc/hpc/admin/overheads
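Since the values have to be written back as one line in the right order, it can help to pair each number with its description before editing. The following is a rough sketch, not an openMosix tool: the function names are invented, the labels are copied from the mosctl gettune output above, and the code only works on strings, so it never touches /proc itself.

```python
# Labels in the order reported by "mosctl gettune" (all values in microseconds).
OVERHEAD_LABELS = [
    "Home-node overhead in processing a demand-page",
    "Remote overhead in processing a demand-page",
    "Home-node overhead in processing a system call",
    "Remote overhead in processing a system call",
    "Basic home-node overhead for reading data",
    "Home-node overhead per 1KB read",
    "Basic remote overhead for reading data",
    "Remote overhead per 1KB read",
    "Basic home-node overhead for writing data",
    "Home-node overhead per 1KB written",
    "Basic remote overhead for writing data",
    "Remote overhead per 1KB written",
    "Migration time of an empty process",
    "Extra migration time per dirty page",
]
MIGRATION_INDEX = OVERHEAD_LABELS.index("Migration time of an empty process")


def parse_overheads(line):
    """Map each label to its value, given the text of the overheads entry."""
    values = [int(v) for v in line.split()]  # split() accepts spaces or TABs
    if len(values) != len(OVERHEAD_LABELS):
        raise ValueError("expected %d values, got %d"
                         % (len(OVERHEAD_LABELS), len(values)))
    return dict(zip(OVERHEAD_LABELS, values))


def with_migration_time(line, microseconds):
    """Return a new overheads line with only the migration time replaced."""
    values = line.split()
    if len(values) != len(OVERHEAD_LABELS):
        raise ValueError("unexpected number of values")
    values[MIGRATION_INDEX] = str(int(microseconds))
    return " ".join(values)


current = "81 150 39 38 0 11 0 18 0 18 0 12 8141 351"
print(parse_overheads(current)["Migration time of an empty process"])
print(with_migration_time(current, 10000))
```

The string returned by with_migration_time is what root would then echo into /proc/hpc/admin/overheads, as in the command above.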
'Note' You can use a single white space or a TAB to delimit the values when writing them to the /proc entry.
It is possible to tune the oMFS behavior too, through /proc/hpc/admin/mfscosts:
    # cat /proc/hpc/admin/mfscosts
    36 40 14 19 88 56
The explanation of this /proc entry can be found in /usr/src/linux/include/hpc/mfscosts.h:
    /* mfscosts.h -- MOSIX */
    /* cost of MFS operations (microseconds) */
    /* measuring date: Mon Apr 30 20:12:28 IDT 2001 */
    /* between two Pentium-III/1GHz over Ethernet-100 */
    #define MFSCOSTCONN_S 36
    #define MFSCOSTCONN_C 40
    #define MFSCOSTINKB_S 14
    #define MFSCOSTINKB_C 19
    #define MFSCOSTOUTKB_S 88
    #define MFSCOSTOUTKB_C 56
If I understood it correctly, S is server and C is client.
A dedicated tool to measure these times and automatically tune the cluster is under development by the openMosix developers.
Alessandro Soraruf.
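The six mfscosts values can be labelled in the same way. This sketch assumes that the /proc entry lists the values in the same order as the #define lines in mfscosts.h (the sample numbers above match that order), and it repeats the guess that the _S and _C suffixes mean server and client. The function name is hypothetical.

```python
# Names as they appear in the header excerpt above, in the same order.
MFS_COST_NAMES = [
    "MFSCOSTCONN_S", "MFSCOSTCONN_C",
    "MFSCOSTINKB_S", "MFSCOSTINKB_C",
    "MFSCOSTOUTKB_S", "MFSCOSTOUTKB_C",
]


def parse_mfscosts(line):
    """Map each header name to its value (microseconds), assuming header order."""
    values = [int(v) for v in line.split()]
    if len(values) != len(MFS_COST_NAMES):
        raise ValueError("expected %d values, got %d"
                         % (len(MFS_COST_NAMES), len(values)))
    return dict(zip(MFS_COST_NAMES, values))


costs = parse_mfscosts("36 40 14 19 88 56")
for name in MFS_COST_NAMES:
    # Assumption: the suffix _S is the server side and _C the client side.
    side = "server" if name.endswith("_S") else "client"
    print("%-15s %3d  (%s)" % (name, costs[name], side))
```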
'Q.' Is there any gain in having an openMosix cluster in a purely desktop environment (i.e. users are running Gnome, Mozilla and OpenOffice)? Would any of these processes migrate to other cluster members?
'A.' This can be helpful. Mainly, what you will get is the smaller or background programs moving off to make room for the heavy lifters in active use, making things feel faster for the user. One big caveat, though: if a lot of users are hammering their workstations and one poor guy is just checking email, he may very well see a big slowdown of his workstation for no apparent reason, causing calls to the helpdesk. The reason will be a lot of guest processes.
'Q.' What command should you use to merge the openmosix patch to the kernel? (no kidding, this really should be here.)
'A.' Enter your Linux kernel directory and type bzcat /path/to/patch.bz2 | patch -p1. Be sure to use the right kernel version with the right patch.
'Q.' When migrating Gnome processes (applets, panels, etc..) to non home node, processes die. Why?
'A.' They should not die. Check for other problems on the cluster. If you can not find the cause report it to the list so the developers and troubleshooters know about it and can see what is needed to fix the issue.
'Q.' I'm new to openMosix but have experimented with several other distributed computing packages. Some of the other packages on the market are attempting to tackle issues like security on an unsecured network or system crashes by migrating a single chunk of code to multiple remote systems and comparing the results when they come back. What would be involved in adding similar functionality to openMosix or is it even possible?
'A.' You should have a look at Chaos (http://www.purehacking.com/chaos/), which is a distribution that enhances security within openMosix. It adds an encryption layer on top, so you are sure that the results you get back are authentic. No need to multiply the calculation effort. BTW, could you point us to tools that do such things?
'Q.' What will happen if I run /sbin/init under mosix at startup? Will I run the whole system under mosix? and if I modify inittab to run xdm under mosix, like "mosrun xdm" will I parallelize the whole X system?
