Folding @ Home FAQ

Select a question and click it for the answer, or simply scroll down the page.

What is Folding at Home (F@H) and why should I care?
Will F@H succeed?
How do I participate? What is a "client"?
How does distributed computing actually work? What kinds of problems are suitable for distributed computing?
How do I get started?
I just joined and I'm not on the KWSN team list at Stanford yet. Why?
What is a "WU"? What is a work unit?
What are "points"? What do they mean? How are they earned?
What kind of work units are there?
Can I pick one kind of project over the other?
Are there problems with the Stanford servers? What is an "outage"?
What's going on? I can download new work, but my finished WU aren't uploading. When do I get my points?
Who keeps track of my points?
Are there any scoring problems? Do I really get the same points per day no matter what?
Is it possible to "game" the system and take advantage of the faster work units?
What is work unit starvation and how do I avoid it?
What is caching? What is queuing? Why doesn't F@H do that?
What about dialup machines? What if a unit ends and the modem is off-line because I am at work?
Is my machine efficient?
Are Pentium IVs an advantage?
How do I get the most out of my Pentium IV (or III)?
What about AMD?
So, what is this -advmethods switch, really?
Since work units have a deadline, how slow can my machine be?
I'm running the Windows console (CLI) and nothing's happening. How come?
Is my machine reliable?
Is the client reliable?
Has anyone "hacked" the Folding at home server? Anyone tried to send bad results back?
What is a "farm"? Should I buy machines just to run Folding at home?
Are there other machines than Intel and AMD on this project?
What is a "deadline"? What is the lifetime of a work unit and why does it matter?
I have an SMP. Why doesn't the Folding client do multi-threading so I can use both CPUs?
Are there add-on products like SETIQueue?

---------------------------------------------------------------------------------------------------------------------------------------------------------

What is Folding at Home (F@H) and why should I care?

The short answer is that the problem of how exactly proteins fold holds the key to many medical mysteries. Computers can simulate this folding, but only with massive, inexpensive horsepower. Donated computer time (the idle time your computer spends doing nothing) is ideal for this purpose. Folding@Home is a distributed computing project run under the auspices of the smart guys at the Stanford University Chemistry Department. They rely on the contributions of millions of hours of spare computing time by people like you and me. This processing time is used to solve the hardest, latest and most pressing problems in protein biochemistry.

The understanding gained by the Folding@Home team speeds up the search for therapies and cures for a number of important degenerative conditions of aging. Currently, Alzheimer's is at the head of the list. 

Will F@H succeed?

Very likely. Science doesn't move in 24 hour news cycles. Certainly, papers are being produced and good work is being done. You can go to the site and see animations of proteins folding. That is hardly nothing! The hard part is predicting when something major will happen with all this work. That is to say, news big enough to be easily understood by non-specialists. That may require some patience.

How do I participate? What is a "client"?

To make this kind of project work, there is a central repository (called a "server") and several satellite computers (called "clients"). The client/server relationship is a venerable bit of computer jargon and applies to many relationships on the Internet.

Stanford supplies the server and the software to run on the client. We usually refer to the "software to run on the client" as simply "the client." We run "the client" on our own machines.

By a process similar to web browsing (your browser is also a "client," albeit a different kind), work is downloaded to your PC and then the client calculates an answer. We call this "shrubbing" or "crunching," which is also a venerable bit of computer slang.

When the crunching is done, a result is uploaded to the server. To get the code, one simply downloads it from the Stanford site.

How does distributed computing actually work? What kinds of problems are suitable for distributed computing?

Certain projects are suitable for distribution over the Internet. They have a large set of things to do, with each individual item independent from the others. If these can be broken up into work units, downloaded individually to clients who then work on them for quite a while with no need to contact another computer, you have a project that can be performed with distributed computing.

Folding at home suits this model of computing.

How do I get started?

Pick a client type (described below) and download it. Installation is straightforward: simply follow the instructions. In Linux, you will have to make it executable with the command: chmod +x F@H3Console-Linux.exe. When configuring, make sure you select team number 117 for The Knights Who Say Ni.

The "console" or CLI has the most options, including the ability to run "invisibly" as a "service" on Windows NT, Windows 2000, and Windows XP using a program called FireDaemon. Experienced users will gravitate towards it. Since it has no pretty pictures to draw, all its efforts are spent actually creating answers.

The graphical (GUI) client is nice for machines you control, especially if you never log off. It draws pretty pictures and makes it easy to see progress. After one or two work units, it is likely that someone wanting to make a stronger contribution will move away from it. The rate of drawing can be reduced, which will make it work harder but keep the pictures coming. One not-so-obvious feature is that it has a slider bar that enables it to consume less than one hundred per cent of the CPU. As will be seen, F@H is often best run with more than one client and this feature is useful for that.

The screen saver is the classical way to run clients. It is also suitable mainly to more casual users. If you have friends at work who are willing to install it for you, this is by far the simplest way to deploy it. It won't be the most productive, but it will be easy and comprehensible if your friends aren't very computer literate. Running the "console" version as a service (for the NT type machines) is also a great option, but there will be cases where that just won't work out (e.g. Grandma's machine in Alaska).

I just joined and I'm not on the KWSN team list at Stanford yet. Why?

You'll show up when your first unit is completed and uploaded. As long as your client.cfg file has team=117 in it, you've joined up correctly.
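
If you want to confirm the setting from a shell, here is a minimal sketch. It assumes client.cfg is a plain text file holding a line that reads team=117, as described above. The function name check_team is made up for this illustration and is not part of the client.

```shell
#!/bin/sh
# check_team FILE TEAM: succeed if FILE contains a line reading "team=TEAM".
# Hypothetical helper; the exact layout of client.cfg is assumed, not documented here.
check_team() {
    grep -q "^team=$2[[:space:]]*\$" "$1"
}

# Example use: run this from the directory that holds client.cfg.
if [ -f client.cfg ]; then
    if check_team client.cfg 117; then
        echo "client.cfg is set to team 117"
    else
        echo "client.cfg is NOT set to team 117"
    fi
fi
```

The pattern is anchored at both ends so that a team number such as 1170 does not pass by accident.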

What is a "WU"? What is a work unit?

In Folding@home, a work unit is simply something you download and work upon. 

What are "points"? What do they mean? How are they earned?

Once you download a work unit, you'll quickly find they can take anywhere from half a day to a week to complete.

Stanford has learned that participants are highly motivated by their score. Accordingly, each work unit is awarded (by Stanford) a "point" value. The idea is that if you contribute three days of computation, you should get about the same number of points as someone else using the same machine for three days.

Likewise, if your machine is twice as powerful as mine, you should get twice as many points for each day's contribution.

Points are awarded when the unit is completed and returned to Stanford.

What kind of work units are there?

There are two main kinds: Protein Folding and Genome Sequencing. Within both of these broad categories, there are many individual problems at any given time (dictated largely by the interests of particular researchers).

Can I pick one kind of project over the other?

Sort of. You can configure the client to run one kind of problem or another, but if the server runs short of that kind of work, it may occasionally give you the "wrong" problem. This is not critical.

Dealing with Stanford

Are there problems with the Stanford servers? What is an "outage"?

This is a real world project. Data flows between each client and the central server. Moreover, the nature of Folding in particular is that it is important to return answers as quickly as possible.

It is possible that the Stanford servers may become unavailable for any number of reasons for shorter or longer times. This we call an "outage."

See the question on Avoiding Work Unit Starvation for more on how to handle this problem.

What's going on? I can download new work, but my finished WU aren't uploading. When do I get my points?

Soon. One form that outages take seems to be an ability to download new work, but not upload any answers. This is not particularly bad -- typically, they upload in a day or two and you get credit then. In the meantime, you crunch on.

 

Statistics

Who keeps track of my points?

Stanford keeps track of your points and the team for which you crunched them. They are published on the Stanford site (easily found by navigating the site).

Also, various other sites, such as Extreme Overclocking, parse the official stats in various ways and keep score. They may show trends that the official site's statistics do not make evident. Most useful, at the team level, is tracking points per day or, better, points per week. This gives an idea not just of what the lifetime production is, but of what current production is.
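
As a rough illustration of turning lifetime totals into current production, the sketch below takes two cumulative point totals and the number of days between them. The function name and the sample numbers are invented for the example; they are not taken from any stats site.

```shell
#!/bin/sh
# points_per_week OLD_TOTAL NEW_TOTAL DAYS_BETWEEN
# Hypothetical helper: converts two cumulative totals into a weekly rate.
points_per_week() {
    awk -v a="$1" -v b="$2" -v d="$3" \
        'BEGIN { printf "%.0f\n", (b - a) / d * 7 }'
}

# Made-up totals read three days apart: 900 points in 3 days is 300 per day.
points_per_week 12000 12900 3
```

A weekly figure smooths out the lumpiness of individual work units, which can take anywhere from half a day to a week to finish.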

Are there any scoring problems? Do I really get the same points per day no matter what?

Getting the same number of points per day is the project designers' goal, but it isn't that clear-cut. As per the FAQ, they use an actual Celeron 500 MHz machine to create their expectations of how much the unit will cost. This isn't a bad scheme -- a "reference" processor is a well-tried technique.

The problem is that the more modern chips (those over 1 GHz in clock rate) may perform substantially better on particular proteins than the Celeron at 500 MHz. This is especially true if the newer floating point instructions (SSE) can be used. SSE is not available on the 500 MHz Celeron. SSE causes variances of nearly 2-to-1 in points per day between the most and the least advantaged proteins. 3dNow! can also be an advantage for some units.

Is it possible to "game" the system and take advantage of the faster work units?

It is just imaginable that those with smaller numbers of machines might manage to obtain an advantage. This would involve having two work units per machine and favoring one unit type over another. The -advmethods switch (see the question on -advmethods below) may also be a source of advantage for some machines, though you should see the forum to confirm this.

 

Avoiding Work Unit Starvation

What is work unit starvation and how do I avoid it?

It is possible to simply run out of work. That is, your machine is "starved" for work. This can happen because Stanford is unavailable for a while or because your machine finishes a work unit while you are away from the machine (e.g. at work). For instance, the dialup connection may not be active when the work completes.

What is caching? What is queuing? Why doesn't F@H do that?

In many projects, it is possible to download more than one work unit at a time. This is especially nice on a dialup network. We call this "caching" but it is also called "queuing." It is the classical way to avoid work unit starvation.

There is infrastructure in place that would appear to allow multiple work units to download, but it hasn't been activated in this project.

There's a good reason for that. It turns out that if results are returned as soon as possible it will speed up the overall project. That is, some results are so favorable to the project that it is possible to simply have the server notice this and cancel work units right at the server. Thus, some work is simply avoided. Avoided, that is, if results come back as soon as possible. So, caching is not enabled.

The downside is that since one can't download more than one work unit at a time, if one runs out of work, then the machine is "starved" and doesn't really do much besides wake up once every twenty minutes or so and try to return results.

What about dialup machines? What if a unit ends and the modem is off-line because I am at work?

If you have a dialup machine, the end of a work unit can mean idle cycles. The real reason no upload happens may not be that Stanford has an outage; it may be that you do, because your dialup connection is not active at the crucial moment. Thus, for a dialup machine, an outage is possible on either your end or Stanford's.

OK, so if caching isn't available and either Stanford can't download or I can't upload, how do I cope?

There are several methods, all depending on your interests.

You can run two different projects. The Folding client has some ability to adjust the priority of the program. If you adjust it right, you should end up with about 50 per cent on Folding and 50 per cent on the other project. By preferring or perhaps penalizing Folding you can spend about 20 per cent of the CPU on the unpreferred project instead of 50/50. The two project method works if you want to make progress on two different projects. Dividing two machines so that each gets half the CPU may work better than dedicating each of them, one to a project.
You can run two copies of the Folding at Home client per CPU. This stretches out the time between uploads. Also, if one unit terminates, the other takes over and runs 100 per cent. You probably have ample time to get the finished work uploaded and a new, second unit downloaded.
With the Windows GUI client, it is easy to fine-tune the ratio of Folding to the other project (even a second Folding client). You select the higher priority and then move the slider bar as you wish (it is unmarked, but under "configure" and "advanced" where the CPU percentage bar is, each vertical line is about 10 per cent of performance). Note that you can make the Windows GUI have most of the CPU or not so much of it, but you always give it higher priority. The slider bar sets the ratio very nicely. Note that a second Folding client must be the console ("CLI").
In Linux, it is possible to monitor the machine and, when the CPU goes idle long enough, launch a second project. This would be some other suitable project. The GIMPS (Great Internet Mersenne Prime Search) has work units suitable for this (double checking is particularly nice). ECC2 or possibly Distributed Folding could also work. The second project should be able to check out work units for long periods of time (measured in weeks or months). The more successful you are at keeping F@H running, the more a long checkout time matters.

The Perl script for the Linux "launch a second project" technique looks like this:

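#!/usr/bin/perl
use strict;
use warnings;

# Read the first line of /proc/loadavg from standard input.
my $line = <STDIN>;

# The first three fields are the 1-, 5-, and 15-minute load averages.
my ($one, $five, $fifteen) = split ' ', $line;

# Launch the backup project (first argument) only if the machine has been mostly idle.
if ($fifteen < 0.7) { system($ARGV[0]); }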


and it is activated as follows (if the above is made into runbackup.pl):

cat /proc/loadavg | ./runbackup.pl "./mprime"


The "if" statement compares against 0.7, which is about right. You don't want to pick a number too low or else your CPU will sit idle longer than it needs to after a folding unit finishes, because the load average takes longer to fall below the threshold. Pick a comparison value too high and you launch the backup unnecessarily (e.g. while the finished work unit is uploading normally).

A cron table entry can be used to launch that string every fifteen minutes or so:

1,16,31,46 * * * * cd /home/me/gimps ; cat /proc/loadavg \
| ./runbackup.pl "./mprime -d"


Note that the backslash is just for aesthetics on this page. Remove it and join the next line to it for a single, one line entry.

A second instance of Folding is not suitable for this technique because work units need to finish and be returned. Run two copies of Folding if you want Folding to be your "backup."

This technique should cause your CPU to be idle less than one per cent of the time and yet give most of the time to Folding.

 

Performance and General Operation

Is my machine efficient?

Probably the most important performance item, in the end, is avoiding work unit starvation. Having one's machine idle will add up quickly and probably matters more than tweaking things.

Folding itself varies quite a bit, but many work units have the same "signature" (e.g. 400 frames) and will use the same "core" (you can see which is being used with Linux "top" or the Windows Task Manager). Once you get the hang of this, you'll be able to get some idea by looking at the output (or the log file) as to whether the time to do a frame is "normal" or not.

In terms of starting out with a new box, there are a couple of ideas:

Run the SETI@home standard work unit. On a modern, fast PC, this benchmark will probably run under seven hours. In fact, it is probably under four. 
Run the GIMPS client in "torture test mode" but select benchmarking. That is, download the client and when you invoke it type "no" to the "do you want to join GIMPS" question. Here, instead of actually running the torture test, though, you run the built-in benchmark instead, which includes directions on where to find results for other machines like yours.

Generally speaking, if your results are within 15 per cent of the published numbers, you're in good shape. You're looking for bigger problems, 2-to-1 deficits and the like, which can come from running something you forgot, or perhaps because you set some BIOS parameter wrong.
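
The rule of thumb above can be written down as a small check. This is only a sketch: it assumes both numbers are times in the same unit (so smaller is faster), and the function name and the middle "slow" category are inventions for this example. The 15 per cent and 2-to-1 thresholds come from the paragraph above.

```shell
#!/bin/sh
# bench_verdict MEASURED PUBLISHED
# Both arguments are benchmark times in the same unit; smaller means faster.
# Hypothetical helper, not part of any benchmark program.
bench_verdict() {
    awk -v m="$1" -v p="$2" 'BEGIN {
        ratio = m / p
        if (ratio <= 1.15)     print "ok"           # within 15 per cent (or faster)
        else if (ratio >= 2.0) print "investigate"  # 2-to-1 deficit: look for a cause
        else                   print "slow"         # in between: worth a second look
    }'
}

# Made-up numbers: your box took 4.4 hours, the published figure is 4.0 hours.
bench_verdict 4.4 4.0
```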

Are Pentium IVs an advantage?

Any unit that profits from SSE will profit from Pentium IV. However, P III should also have a similar advantage.

How do I get the most out of my Pentium IV (or III)?

When you configure the client, select Folding as your preference. Then, attempt to get work units that exploit the SSE instruction set. SSE is a set of instructions recently added to the Intel architecture that older machines don't have. At present writing, ensuring SSE is used is difficult. The -advmethods switch currently can accomplish this, but check with the team newsgroup to see if that has continued to be true (-advmethods may change over time, see a few questions down for more).

If you didn't select Folding at the initial install, you can still use the -advmethods switch at any time, assuming it works.

What about AMD?

Later AMD models (basically, most of those over 1 GHz) should work better with Folding and -advmethods set.

So, what is this -advmethods switch, really?

It is a kind of prototyping technique built into the client. New algorithms, levels of code, or new proteins can all be tried out. At this writing, it tends to favor proteins which use SSE-based calculations.

One possible downside: While it isn't clear, it may be that the client is less reliable with -advmethods. We've seen occasional failures in the client and are trying to understand when this happens. The best guess is that it is more likely when -advmethods is set, as this represents the "beta" part of the project.

Overall, it appears anecdotally that the client will be up at least 98 per cent of the time, though it is possible to lose several days' work in the worst case. If you are concerned about this (especially if you are on dialup or otherwise away from the machine for longer periods of time), perhaps you should avoid -advmethods, at least at this writing.

Consult the forum for the latest news on -advmethods.

Since work units have a deadline, how slow can my machine be?

You should find that any Pentium II class machine of 300 MHz or more is usable, provided it can run 24 hours a day, 7 days a week. These machines should only run one work unit, so they probably need a continuous connection. For home use, you can probably save money by replacing them (electricity costs matter if you run long enough). Thus, we're mostly talking work machines or maybe a few firewall boxes at home. If you're on dialup, 500 MHz is probably the minimum so that two units per machine is somewhat rational. For dialup, Pentium IIIs, with 700 MHz or more, will be more comfortable.

K6 machines might also work, though it is not clear what the cut-off is. 3dNow! might help, but some of them were slower than Pentiums at the same MHz rating in that era. 450-500 MHz is probably "safe."

Strategically, all these slower machines (up through about 700 MHz at minimum) are best run with -advmethods off. Whatever -advmethods means will have little or no advantage for these machines. You gain stability and, for your SSE-capable boxes, more points.

I'm running the Windows console (CLI) and nothing's happening. How come?

Did you simply press return when it asked you what CPU percentage to use? The default (oddly enough) is zero. Reconfigure and put in 100 per cent and all will be well.

Is my machine reliable?

Too often, especially when overclocking, one can lose sight of the fact that a correct answer beats a quicker but wrong one. There are a variety of tests available. Many members swear by the test that ships as part of the GIMPS client. Any machine that can run that for a day is very robust. You can have confidence your machine is good (with or without overclock) if it passes that test.

To get the GIMPS code, download the GIMPS client from www.mersenne.org. When you start that client, the first question is "do you want to join GIMPS?" Answer "no" and you don't have to join the GIMPS project to do this test. Think of it as a coincidence that GIMPS is another distributed project. In fact, if you search around the net, you'll see this "torture test" is popular amongst gamers and others who don't even do distributed computing.

Is the client reliable?

As noted above, the -advmethods switch may cause some occasional failures. The worst ones have required a partially finished work unit to be tossed away and restarted with a new download. But, that is beta code, so some failure is expected.

Not as clear is how reliable the regular client is. Advice: Run the GIMPS torture test a while to make sure your hardware is sound and then (if you worry about this) avoid using -advmethods.



Hacking, Farming, and Other Advanced Issues

Has anyone "hacked" the Folding at home server? Anyone tried to send bad results back?

This isn't clear. Other projects have needed to fend off various "hack" attacks. SETI@home has been plagued with this, although it seems only to result in extra "credit" without harming the project's science. The RC5-64 project also had problems.

The regular F@H FAQ does not discuss this problem. There may be safeguards that aren't obvious (we know that security items are not always advertised by distributed computing project designers). In particular, it is always possible for the server to send out a problem with a well-known answer from time to time. This will catch most hacks and cheats who must, in the end, simulate users.

In the end, the best assurance comes from the project surviving an actual attack. None have been publicized.

What is a "farm"? Should I buy machines just to run Folding at home?

Buying machines just for distributed computing (a "farm" when it is more than one) is a very personal decision. Some people get the 'bug' very badly and do buy their own machines (often "stripped down" in various ways so that they really are just for Folding at home).

This is a volunteer project. The original intent was to simply use idle cycles on existing machines bought for another reason. The project owes you nothing in particular except a "thank you." It did not ask you to spend money on it.

If you buy a machine just for this project, you must be prepared to see arbitrary changes made to the client software or the project for reasons that are not today obvious. This may make particular hardware purchases obsolete overnight in the sense that their advantage over some other bit of hardware vanishes. The psychic reward of getting more points may vanish. Stanford has done this in the past. You have been warned.

That said, there's a lot to be learned about building machines on the cheap, doing overclocking, building home networks, and installing Linux.

Are there other machines than Intel and AMD on this project?

You bet! There is a Mac OS X console client and a Mac OS X GUI. The Mac can even manage SMPs (see also the SMP question). To manage SMPs, simply make sure the console version is available and executes with the -local flag. This may take making a script or two by hand. OS X has similar facilities to Linux, so most of the advice given for Linux can be used or adapted. Use this script for toggling client console execution:

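#!/bin/tcsh
# Toggle the console client: stop it if it is running, start it otherwise.
cd /Applications/Folding@Home\ CLI/
# Look for a running client started with -local (quoted so the output stays one word).
setenv VAR "`ps aux | grep local | grep OSX-3.12`"
if ( "$VAR" != "" ) then
./ENDFAH.command
else
./FAH.command
endif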

Note that the name, OSX-3.12, will probably change over time. Note also the "back quote" character before the "ps" and at the end. Like Linux, any script of this sort needs to have the preparatory command:

chmod +x name.of.script

before it is executable. The above invokes two other scripts. The FAH.command is:

cd /Applications/Folding@Home\ CLI/
./OSX-3.12 -local

and the ENDFAH.command is:

killall OSX-3.12

What is a "deadline"? What is the lifetime of a work unit and why does it matter?

As noted earlier, it is important to return results as quickly as possible. In some cases, this rapid return will allow some future units to be canceled. So, when the points value is set, Folding at home also establishes a "deadline" based on expected participation. If you don't return a result until after the deadline passes, your work will already have been handed out again. You will get credit, but it is likely you won't have really contributed to progressing the science.

If your machine is faster than a Celeron 500 MHz and you run the machine 24 hours a day, you should never have to worry about deadlines, even if you run two copies of the program per CPU. The author recently ran with F@H at 25 per cent of a 2.4 GHz P IV and still was three times faster than the deadline.
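
To see how much room you have, a back-of-the-envelope calculation helps. The sketch below is an illustration only: the function name is invented and the sample numbers are made up (they are not the author's actual work unit). It divides the deadline by the time the unit needs once the client's CPU share is taken into account.

```shell
#!/bin/sh
# deadline_margin FULL_SPEED_DAYS CPU_SHARE_PERCENT DEADLINE_DAYS
# Prints how many times faster than the deadline the unit will finish.
# Hypothetical helper; a value below 1.0 means the deadline would be missed.
deadline_margin() {
    awk -v t="$1" -v s="$2" -v d="$3" \
        'BEGIN { printf "%.1f\n", d / (t * 100 / s) }'
}

# Made-up unit: half a day of work at full speed, client limited to 25 per cent
# of the CPU (so two days in practice), six-day deadline.
deadline_margin 0.5 25 6
```

Remember that running two clients per CPU halves each client's share, so plug in 50 rather than 100 in that case.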

I have an SMP. Why doesn't the Folding client do multi-threading so I can use both CPUs?

It doesn't have to. If you have two CPUs, simply start up the console client in a second directory and (on Windows or Mac) specify -local when invoking the client code. This is a teeny bit of added work, but distributed computing is so ideal, this is just as efficient as a multi-threaded client. Multi-threaded clients are more work for the project's authors and provide no performance advantages at all. So, they tend not to be provided.

You should not simply copy the first directory to start up the second copy. It should be installed by copying the code and answering the questions over again at the first invocation.
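
On a Unix-like system (Linux or Mac OS X), the second directory might be prepared along the following lines. This is a sketch under assumptions: the function name and the directory paths are hypothetical, and the only client details used are the ones mentioned in this FAQ (the OSX-3.12 program name and the -local flag).

```shell
#!/bin/sh
# setup_second_client CLIENT_PROGRAM NEW_DIR
# Copies only the client program into NEW_DIR and makes it executable.
# It deliberately leaves client.cfg and work files behind, so the new
# copy asks its setup questions again at first invocation.
setup_second_client() {
    mkdir -p "$2" &&
    cp "$1" "$2/" &&
    chmod +x "$2/$(basename "$1")"
}

# Example (hypothetical paths):
#   setup_second_client "$HOME/fah1/OSX-3.12" "$HOME/fah2"
#   cd "$HOME/fah2" && ./OSX-3.12 -local
```

Copying just the program, rather than the whole directory, is what keeps the two clients from sharing configuration and work files.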

Are there add-on products like SETIQueue?

Here are links to a few items:

Electron Microscope III

kdfold

hideit for Windows (makes any program a service)

 

For information on setup of Folding @ Home on your computer please see: The Knights Who Say Ni! Getting Started Guide
For information on the Do's and Don'ts of Folding @ Home see The Knights Who Say Ni Do's and Don'ts Guide