FAQ

From OpenMosixWiki

openMosix FAQ


'What is openMosix?'

The openMosix system is a Linux kernel extension for single-image clustering. It extends the outstanding MOSIX project but, unlike MOSIX, is licensed under the GNU General Public License (GPL): http://www.gnu.org/copyleft/gpl.html#TOC1


'What does the term single-image clustering mean?'

There are many varieties of clusters, and a single-image cluster has multiple copies of a single operating system kernel.


'What is openMosix useful for?'

openMosix allows you to join together multiple computers running the Linux operating system, and have them appear to the user as one large multiple-processor computer. For example, suppose you had two computers, A and B joined in an openMosix cluster. Without openMosix, if you ran two programs on A they would only get 50% of the CPU time each. With openMosix, one of the programs could migrate 'automagically' to B, so both processes would run at 100% CPU. As far as the user is concerned, A now behaves like a two-CPU SMP computer with twice the CPU power available.


'What is openMosix not useful for?'

openMosix lets a cluster of computers behave like one big multi-processor computer. However, it doesn't automatically parallelise programs. Each individual process only runs on one computer at a time. For example, if your computer could convert a WAV to an MP3 in a minute, then buying another nine computers and joining them in a ten-node openMosix cluster would 'NOT' let you convert a WAV in six seconds. However, what it 'would' allow you to do is convert 10 WAVs simultaneously. Each one would take a minute, but since you can do lots in parallel you'd get through your CD collection much faster :).

If what you need to do is take a single process and parallelise it across multiple machines, then openMosix is probably not the technology you're looking for.
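The WAV-to-MP3 arithmetic above can be sketched in a few lines. This is only an idealised model: it assumes identical single-CPU nodes, perfect load balancing and no migration overhead, and the function name is made up for illustration.

```python
import math

def batch_time_minutes(num_jobs, num_nodes, minutes_per_job=1.0):
    """Wall-clock minutes to finish num_jobs independent single-process jobs."""
    # A process runs on one node at a time, so one job never gets faster.
    # Jobs complete in "waves" of at most num_nodes jobs each.
    waves = math.ceil(num_jobs / num_nodes)
    return waves * minutes_per_job

print(batch_time_minutes(1, 1))    # one WAV on one node
print(batch_time_minutes(1, 10))   # one WAV on ten nodes: no faster
print(batch_time_minutes(10, 10))  # ten WAVs on ten nodes: all finish together
print(batch_time_minutes(10, 1))   # ten WAVs on one node
```

The cluster improves throughput (jobs finished per minute), not the latency of any single job.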


'Does openMosix have a homepage?'

Yes. It is at [[1]]. The SourceForge project page is at www.openMosix.org.


'Are there any mailing lists for openMosix?'

Yes. There are three:

  1. For general discussion, use [[2]], whose general information page is at [3].
  2. For developers, use [[4]], whose general information page is at [5].
  3. [Italian Language openMosix Mailing List hosted by Democritos] (the INFM National Simulation Center in Trieste)

'Can I contribute to the openMosix project?'

Yes. The openMosix effort already has more than 10 contributors. Unlike the Linux kernel maintenance system, Moshe Bar appoints official maintainers and then gives these maintainers the commit bit to the openMosix CVS source tree, similarly to FreeBSD.

Right now we are looking for more experienced kernel hackers to work on new features like checkpoint/restart.

Write to moshe at qlusters.com if you would like to become an openMosix developer.


'Who is the copyright holder of openMosix?'

All MOSIX code is copyright by Professor Amnon Barak of Hebrew University of Jerusalem. All openMosix code is copyright by Moshe Bar, Tel Aviv. The openMosix system does not contain any non-GPL (i.e. MOSIX) code.


'Is openMosix a fork of MOSIX?'

Originally, openMosix was a fork of MOSIX, but it has evolved into an advanced clustering platform. The openMosix system no longer contains any non-GPL (i.e. MOSIX) code.

Compared to MOSIX, a number of features were added:

  • A port to the UML (User-mode Linux) architecture
  • New and cleaner migration code
  • A better load balancer
  • Much reduced kernel latencies
  • Support for Dolphin and IA64
  • A greatly simplified installation process that uses RPM packaging
  • A wealth of documentation

'Why did openMosix split from the MOSIX group?'

The principal issue was that MOSIX was not licensed with an [Open Source] license.

Getting, building, installing and running openMosix


'Where do I get openMosix?'

The RPMs and source for openMosix are available from our Downloads/Files Section (http://openmosix.sourceforge.net/#LatestRelease). Please read the release notes first.

Also, openMosix packages are available for Gentoo Linux (emerge sys-apps/openmosix-user; see http://howto.x-tend.be/openMosixWiki/index.php/Install%20openMosix%20on%20Gentoo%20Linux) and for Debian GNU/Linux (http://www.debian.org/distrib/packages).

Note that openMosix is currently only available for 2.4 series kernels: a 2.6 series openMosix kernel is currently in development but is not ready for production use.


'Can I mix MOSIX and openMosix nodes in the same cluster?'

No. As with the older MOSIX, you should not mix nodes, because the protocols are subject to unannounced changes from version to version. In addition, every new version has bug fixes which warrant updating to the new kernels.


'How do I build openMosix?'

  1. Start by unpacking both the Linux kernel sources and the corresponding openMosix distribution in a directory, say /usr/src:
     $ cd /usr/src
     $ tar xzf linux-2.x.xx.tar.gz
     $ gunzip openMosix2.x.xx-x.gz
  2. Apply the openMosix patches to the pristine Linux kernel sources from inside the kernel source directory:
     $ cd linux-2.x.xx
     $ patch -p1 <../openMosix2.x.xx-x

The directory /usr/src/linux-2.x.xx now contains the 2.x.xx kernel sources with the openMosix patches. Compile and install the resulting kernel as usual.


'Why do I get `causelinkerrorbyroutinethatdoesnotexist' when I am building a kernel from source?' I applied the openMosix-2.4.xx-x patch to a "vanilla" kernel source, but the build aborts with undefined references.

Check to be sure you are using gcc 2.95.3 or Red Hat gcc 2.96, and not gcc 3.x. The Red Hat gcc 2.96 compiler is gcc 2.95 plus Red Hat patches; if you use it, make sure it is gcc-2.96-74 or later. In addition, pay attention to compiler optimization: anything greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x or derivatives, be sure not to use -fstrict-aliasing (which, depending on your version of gcc 2.95.x, may necessitate passing -fno-strict-aliasing explicitly).


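'My RPM-installed nodes work great, but my compiled-from-source node cannot migrate processes.' When I try, I find "Migration request denied. Cannot mix DFSA and NON-DFSA kernels." in my /var/log/kernel.log.

The message means that the DFSA setting of your source-built kernel differs from that of the RPM kernels. "cat /proc/hpc/admin/version" will tell you if DFSA is enabled on a node. Rebuild the kernel so that the setting matches on all nodes.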


'Can openMosix use IEEE 1394 connections (aka FireWire) to improve its performance?'

Yes. See [Fire Wire Clustering].


'What are userland tools?'

Userland tools are a collection of administrative tools used to examine and control an openMosix node.


'/proc/mosix, /proc/hpc ... I'm getting confused.'

In openMosix 2.4.16 the /proc interface was /proc/mosix/; in openMosix 2.4.17 it changed to /proc/hpc/.


'What happens to a job if its node fails?'

If a node crashes, all jobs started from that node and all jobs currently running on it due to migration are lost, just as on any ordinary computer.

If you halt a node gracefully (halt, shutdown, or poweroff commands), remote jobs will be sent back to the node where they were initiated (they may then migrate to other running nodes), but obviously all locally initiated jobs should be ended before issuing the halt. (Disconnecting the power on a running machine is never a good idea.)


Kernel Questions


'What kernel versions does openMosix support?'

The latest Linux kernel supported is 2.4.26. Later versions of the 2.4 series will be supported, as will kernel versions in the 2.6 series.

The 2.6 series kernel is almost finished, but the user tools are still being worked on. Testers for this kernel are very welcome, but at this time the 2.6 series kernel is not ready for production use.


'I'm trying to compile an openMosix-patched kernel. What compiler version should I use?'

You should use gcc-2.95.3 as this is the recommended compiler for 2.4 kernels. This is a Linux kernel requirement, not just an openMosix requirement. However, nothing precludes you from having, on the same system, gcc-2.95.3 for kernel compiles and gcc-3.x for non-kernel compiles.

Additional notes: There are many kernel-related issues with gcc-3.x compilers. Inlining, optimization and page alignment do strange things to operating system kernels. The standard Linux kernel is only guaranteed to compile and work properly with gcc 2.95.3.

However, the Red Hat gcc 2.96 compiler is 2.95 + RH patches. In this case, you should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build the kernel correctly. In addition, please pay attention to compiler optimization. Anything greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x or derivatives, be sure not to use -fstrict-aliasing (which, depending on your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing).


'I've compiled the kernel from the sources. How do I add it to the bootloader (LILO, GRUB, other)?'

Treat an openMosix kernel just like any other kernel. The openMosix system is simply an extension to the kernel, and will be treated like a standard kernel by your bootloader.


'I installed a Linux distribution and it says that its kernel is x.x.x-x. The openMosix README says not to mix kernel versions. Does that mean that the openmosix-x.x.x-y RPM will not work on my machine?'

No. It means that if you install openMosix on your cluster, all your machines should have the openmosix-x.x.x-y kernel installed. You should not mix kernels which have different kernel versions, i.e. do not mix openmosix-x.x.z-x and openmosix-x.x.x-y, etc.


'What does the phrase "the same kernel on every machine" mean? Does it mean the same kernel version, or the same kernel image?'

It means the same kernel version. You can build different kernel images of the same source version to meet the hardware/software needs of a given node.
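A quick consistency check along these lines might look like the sketch below. It assumes you have already collected one version string per node (for example the output of `uname -r`); the node names and version strings are hypothetical, and comparing strings only approximates "same source version".

```python
from collections import Counter

def mismatched_nodes(kernel_by_node):
    """Return {node: version} for nodes that differ from the most common version."""
    if not kernel_by_node:
        return {}
    majority, _count = Counter(kernel_by_node.values()).most_common(1)[0]
    return {node: ver for node, ver in kernel_by_node.items() if ver != majority}

# Hypothetical version strings gathered from three nodes.
cluster = {
    "node1": "2.4.26-openmosix1",
    "node2": "2.4.26-openmosix1",
    "node3": "2.4.24-openmosix2",
}
print(mismatched_nodes(cluster))
```

Nodes reported by the function are the ones to rebuild or upgrade; different kernel images built from the same source version would report the same string and pass.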


'Which ports does openMosix use?'

According to Moshe, "openMosix uses only UDP protocol ports in the address range of 5000-5700."

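The source code, however, shows two TCP ports outside that range. The constants are written in network byte order, so on x86 the two bytes are swapped to get the real port number:

  • In linux/include/linux/mfs_socket.h: #define MFSMAINPORT 0xD302. Swapped, this is 0x02D3, decimal 723. Protocol used: TCP.
  • In linux/hpc/comm.c: #define MIGDAEMONPORT 0x3412. Swapped, this is 0x1234, decimal 4660. Protocol used: TCP.
  • In linux/hpc/comm.c: #define INFODAEMONPORT 0x3415. Swapped, this is 0x1534, decimal 5428. Protocol used: UDP.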
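The byte-order arithmetic for the port constants quoted above can be reproduced with a small sketch. It assumes a little-endian (x86) host, where a 16-bit constant already in network byte order has its two bytes swapped relative to the real port number; the helper name swap16 is made up for illustration.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

# Constants as quoted from the openMosix source in the text above.
PORT_CONSTANTS = {
    "MFSMAINPORT": 0xD302,
    "MIGDAEMONPORT": 0x3412,
    "INFODAEMONPORT": 0x3415,
}

for name, raw in PORT_CONSTANTS.items():
    port = swap16(raw)
    print(f"{name}: raw 0x{raw:04X} -> port 0x{port:04X} = {port}")
```

These decimal values are the ones to allow through a firewall between cluster nodes.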


'Should Hyperthreading be disabled?'

Disabling Hyperthreading (hyper threading, hyperthreading, HT) will improve openMosix performance in most cases with the current Linux scheduler. This can be done in the BIOS, or using the 'noht' option when booting the kernel.

Linux Terminal Server Project (LTSP) and openMosix


'Will openMosix slow things down for the LTSP client users in X Windows if all the machines are clustered?'

If it does, it won't be by much. Special care was put into making sure that overhead, network or otherwise, does not increase as you add nodes. In other words, you have the same overhead with 2 nodes or 2000 nodes. Even in the worst case, network traffic is limited to no more than 2% of bandwidth. Independent tests and benchmarks by research institutions confirm this.


'What kind of impact will Client A see if the LTSP server migrates a process that it is running for Client A to Client B, and Client B suddenly drops off the network?'

In the LTSP+openMosix How-To I maintain, the LTSP clients do not migrate their local processes (basically X Windows, which is all they really run). All processes originate from the server.

There is a basic difference between hardware failure and shutdown. If a computer fails, then obviously everything running on it goes down with it, but that is to be expected and is no different on non-openMosix machines. If client B is shut down, all foreign processes are migrated away again and things keep running normally.


'What benefits will I derive from implementing openMosix?'

With openMosix you will save the costs of buying an expensive SMP machine by distributing processes among available cluster nodes transparently to the user and the applications. Also, you can make all the PCs in your cluster, including the clients, work together like one single, giant computer. The user or system administrator need not intervene; the cluster auto-balances itself constantly in the background.


File Systems


'Can somebody explain to me the difference between MFS and DFSA, and why I would need DFSA?'

DFSA stands for Direct File System Access and is an optimization. It allows remote processes to perform some filesystem system calls locally rather than sending them to their home node. MFS stands for Mosix File System and allows all nodes access to all node filesystems. DFSA runs on top of a cluster filesystem, in this case MFS.


'What's oMFS, how do I use it, and where do I get it?'

NOTICE: oMFS is being removed from openMosix as of the 2.4.26 kernel patches.

The openMosix File System (oMFS) is the filesystem used by openMosix kernels. You get it by installing an openMosix kernel on the nodes of your cluster with oMFS enabled in the kernel-config. (It should be enabled in the openMosix RPMs by default.)

You should also enable Direct Filesystem Access (DFSA) which allows a migrated process to execute many syscalls on the remote node locally without the need to migrate it back to its home node.

The use and administration of oMFS is very similar to NFS, but unlike NFS, oMFS features:

  • Cache consistency
  • Timestamp consistency
  • Link consistency

The DFSA layer on top of oMFS makes sure to move the process to the data, instead of vice versa, whenever it makes sense.

The use of oMFS is now deprecated. It is going to be replaced with an implementation of GFS. (It is not stated here which GFS is meant: Red Hat Global File System, Google File System, or something else.)

Please read more about oMFS and how to use it in the [openMosix-HOWTO].


Programming openMosix


'Generally, how do I write an openMosix-aware program?'

Write your programs as you normally would. Any processes that you spawn are candidates for migration to another node.

Note that openMosix can't (as of yet) migrate threaded programs. If you want a single task to run on multiple machines simultaneously, you'll have to use fork() to create multiple processes. For example, to parallelise a raytracing animation with Povray, you can set up a script which renders frame 1 in one Pov instance, frame 2 in another and so on. The separate povray processes will migrate and load-balance automatically on your cluster.
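The fork-per-task pattern described above can be sketched as follows. This is an illustration only: render_frame is a hypothetical stand-in for real work (such as launching a renderer on one frame), the code needs a Unix-like system for os.fork(), and it shows only the process structure, not migration itself.

```python
import os

def render_frame(frame):
    # Hypothetical placeholder for CPU-bound work on one frame.
    return sum(i * i for i in range(10000 * frame))

def run_in_processes(frames):
    """Fork one child per frame and return {frame: exit status}."""
    children = {}
    for frame in frames:
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit without returning to the caller.
            status = 1
            try:
                render_frame(frame)
                status = 0
            finally:
                os._exit(status)
        children[pid] = frame  # parent remembers which child has which frame
    results = {}
    for pid, frame in children.items():
        _, wait_status = os.waitpid(pid, 0)
        if os.WIFEXITED(wait_status):
            results[frame] = os.WEXITSTATUS(wait_status)
        else:
            results[frame] = None  # child was killed by a signal
    return results

print(run_in_processes([1, 2, 3]))
```

Each child is an ordinary separate process, which is the unit the text says openMosix can migrate and load-balance.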

If your task is of such a nature that multiple forked processes aren't practical, you'll need to look into message-passing systems (MPI and so on). This is much more complicated: google for 'MPI' and 'beowulf cluster' to get started.


'Can I write openMosix programs in Perl?'

Yes. Use the Parallel::ForkManager available from CPAN or directly from [6].


'I'm having problems with applications compiled with the Intel C/Fortran compilers.'

There are three gotchas here. Firstly, some versions of the Intel compilers produce executables which memory map /dev/zero. openMosix can't migrate processes which memory map devices, so executables compiled on these versions cannot migrate (check /proc/<pid>/cantmove to see if this is the case). You'll need to upgrade and/or complain to Intel in this case.

Secondly, don't compile executables with the -ax compiler flags. These flags produce multiple versions of subroutines, each optimised for a specific set of processor features. Although the executable will work fine on a range of architectures, trouble occurs when the executable is migrated from a machine with one type of CPU (A) to another with a different CPU (B). If it's running a routine which uses CPU features that A has and B doesn't, it will crash. Instead of using -ax, use -x with the flags appropriate to your lowest-common-denominator machine in the cluster.

Lastly, even if you don't use the -ax compiler flags it appears that the Intel math library was compiled with these flags. As a result your program may randomly crash if you have a mixed-architecture cluster anyway: there's no (easy) way to get the Intel compiler not to link in libimf.

Starting each job on a node with the lowest-common-denominator hardware may work, as it looks like libimf does the processor-type-check only when a math library routine is called for the first time. This workaround isn't foolproof, however, as the process may be migrated before this happens. One way to lessen the likelihood is to put a spurious call to expf() (or similar) right at the beginning of your program, so that the processor-flags checking and initialisation happens as close to program startup as possible.

Newer versions of the oM kernel handle illegal instruction exceptions more gracefully (they bounce the process to its home node rather than killing it), so it might be worth checking this and upgrading if you're still having problems.

Contributed by [Mackey].

Resources


'Where can I find out technical details about openMosix?'


'What other resources are available?'


Miscellaneous

'I don't see all my nodes. What's happening?'

When you run 'mosmon', press 't' to see the total number of machines running. Does it warn you that openMosix is not running?

If it does, then make sure your machine's IP address is included in /etc/openmosix.map. Don't use 127.0.0.1. If you do, you will probably have problems with your DHCP server or your DNS nameserver.

If it does not warn you, then see which machines show up. Do you see only your own machine?

If yes, then your machine is most likely running a firewall and is not letting openMosix through.

If no (other machines appear, but some are missing), then the problem is most likely with the machines that don't show up.

One case remains open here: what if the hosts show up, but mosmon still complains?

Also: Do you have two NICs on a node? If so, you have to edit /etc/hosts to include a line in the following format:

     <non-cluster ip> <cluster-hostname>.<cluster-domain> <cluster-hostname>

You might also need to set up a routing table, which is an entirely different subject. See http://www.yolinux.com/TUTORIALS/LinuxTutorialNetworking.html#ADDNIC for more information.

Another possibility is that the kernels were built with different options on each machine. In particular, if you use the 'Support clusters with a complex network topology' option, take care to use the same value for the accompanying 'Maximum network-topology complexity support' option on each machine.


'What's the difference between /etc/mosix.map, /etc/hpc.map, and /etc/openmosix.map?'

They represent three stages of MOSIX/openMosix growth. The file /etc/mosix.map is the original MOSIX map name. The file /etc/hpc.map was an early openMosix map name (and 'hpc' is still used for the /proc files in openMosix). The current map name is /etc/openmosix.map.


'setpe: the supplied table is well-formatted, but my IP address (127.0.0.1) is not there'

You'll need to modify your /etc/hosts file. On Red Hat machines the /etc/hosts file usually includes a line like:

  127.0.0.1  hostname.domain.com localhost

With that line in place, even if hostname.domain.com really has the IP address 192.168.10.250, a lookup of hostname.domain.com may return 127.0.0.1.

However, if you put

  192.168.10.250  hostname.domain.com
  127.0.0.1       localhost

in your /etc/hosts, openMosix won't complain.
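The difference between the two /etc/hosts layouts above can be checked with a small sketch. It is a simplified model that only scans hosts-format text for the first matching line; real name resolution may also consult DNS and other sources, and the function names are made up for illustration.

```python
def lookup_in_hosts(hosts_text, name):
    """Return the first IP address that hosts-format text maps `name` to, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        fields = line.split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None

def resolves_to_loopback(hosts_text, name):
    """True if `name` is mapped to a 127.x.x.x address."""
    ip = lookup_in_hosts(hosts_text, name)
    return ip is not None and ip.startswith("127.")

# The two layouts from the text above.
BAD = "127.0.0.1  hostname.domain.com localhost\n"
GOOD = "192.168.10.250  hostname.domain.com\n127.0.0.1       localhost\n"

print(resolves_to_loopback(BAD, "hostname.domain.com"))
print(resolves_to_loopback(GOOD, "hostname.domain.com"))
```

To check a real machine, pass the contents of /etc/hosts and the machine's own hostname; a loopback result is the situation setpe complains about.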


'I want to install openMosix but I am afraid my machines are too weak for this'

A machine is never too weak: I have three P200s (64MB each) and two P166s (one with 48MB and one with 192MB). Two of them are on 10Base-T and the other three on 100Base-T. Even with these antiquated machines and a "heterogeneous" network, I get perfect load balancing when running simulation programs that I write in Perl. (See [ProgramToTestACluster].) Don't be held back by the fact that your machines are old. To me this is a nice feature of openMosix: you can add newer machines to an existing cluster as they become available, and you do not need to have all identical machines. That's fantastic. However, a 100Base-T network is recommended.

Contributed by [Charles Nadeau].

'In fact:'

I had six 486 computers (from 25 MHz to 66 MHz) sharing a coax 10 Mbit network. They all had 16MB of memory and no hard disk. That worked just fine. Processes migrated perfectly and everything was going as smooth as a baby's arse :)

Adding a P75 was no problem at all: it only took editing openmosix.map and making sure all the kernels were the same version.

Contributed by [Baardman].


'Under what conditions does VMware work with openMosix?'

Contrary to what was previously stated here, it doesn't.


'What architectures besides x86 (e.g. SPARC, AXP, PPC...) are supported by openMosix?'

Only IA-32 is currently supported. The port of openMosix to the Intel(r) Itanium(tm) IA-64 Processor Family is complete. Project plans for openMosix' second year include porting to the 64-bit AMD Opteron(tm) processor.


'Is there a parallel make tool for openMosix such as MPmake?'

You can use regular make. Use 'make -j#', where the # represents how many child processes to spawn.


'Where is openMosix currently deployed? Any organisation/university names?'

There are many; search the Web or the Archives. (Try a Google search for openMosix +edu.)


'Any realistic chance of seeing openMosix on OS X or *BSD?'

Not likely soon. The next port in the plan is to Opteron. Could a new sponsor bring a new port? Do note that changes are being made to make openMosix easier to port to other platforms.

[Xgrid] is something that's kind of similar.
