I've been looking at this during the past few months. I will sketch out a few of my findings below. I can follow up with some details and actual data if necessary.

On Mon, 08 Dec 2003 05:52:25 -0800, William Lee Irwin III wrote:

> Explicit load control is in order. 2.4 appears to work better in these
> instances because it victimizes one process at a time. It vaguely
> resembles load control with a random demotion policy (mmlist order is

Everybody I talked to seemed to assume that 2.4 does better due to the way mapped pages are freed (i.e. swap_out in 2.4). While it is true that the new VM as merged in 2.5.27 didn't exactly help with thrashing performance, the main factors slowing 2.6 down were merged much later. Have a look at the graph attached to this message to get an idea of what I am talking about (x axis is kernel releases after 2.5.0, y axis is time to complete each benchmark).

It is important to note that different workloads show different thrashing behavior. Some changes in 2.5 improved one thrashing benchmark and made another worse. However, 2.4 seems to do better than 2.6 across the board, which suggests that some of its mechanisms are in fact better for all types of thrashing.

> Other important aspects of load control beyond the demotion policy are
> explicit suspension of the execution contexts of the process address
> spaces chosen as its victims, complete eviction of the process address

I implemented suspension during memory shortage for 2.6, and I had some code for complete eviction as well. It definitely helped for some benchmarks. There is one problem, though: latency. If a machine is thrashing, a sysadmin won't appreciate that her shell is suspended when she tries to log in to correct the problem. I have some simple criteria for selecting a process to suspend, but it's hard to get it right every time (kind of like the OOM killer, although with smaller damage for bad decisions).
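To make the selection problem concrete, here is a toy user-space sketch of the kind of criteria I mean: spare freshly started and interactive-looking tasks (so the admin's login shell survives), and among the rest suspend the task whose eviction frees the most memory. All names, fields, and thresholds here are invented for illustration; this is not the actual patch.

```python
# Toy sketch of a suspension victim selector -- all names and
# thresholds are hypothetical, nothing like the real kernel code.
from dataclasses import dataclass

@dataclass
class Task:
    pid: int
    rss_pages: int     # resident set size in pages
    sleep_avg: float   # 0.0 = pure CPU hog, 1.0 = mostly sleeping (interactive-ish)
    age_seconds: float # time since the task started

def pick_victim(tasks, min_age=5.0, interactive_threshold=0.8):
    """Prefer large, old, non-interactive tasks as suspension victims."""
    candidates = [
        t for t in tasks
        if t.age_seconds >= min_age              # spare a freshly started shell
        and t.sleep_avg < interactive_threshold  # spare interactive tasks
    ]
    if not candidates:
        return None
    # Suspending the largest RSS frees the most memory per suspension.
    return max(candidates, key=lambda t: t.rss_pages)

tasks = [
    Task(pid=100, rss_pages=50_000, sleep_avg=0.1, age_seconds=3600),  # batch hog
    Task(pid=200, rss_pages=80_000, sleep_avg=0.9, age_seconds=3600),  # interactive
    Task(pid=300, rss_pages=90_000, sleep_avg=0.2, age_seconds=1.0),   # just logged in
]
print(pick_victim(tasks).pid)  # -> 100, the long-running batch hog
```

Even this simple filter shows the tension: every heuristic that protects the admin's shell can be gamed by a memory hog that happens to look interactive, which is why I say it's hard to get right every time.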
For workstations and most servers, latency is so important compared to throughput that I began to wonder whether implementing suspension was actually worth it. After benchmarking 2.4 vs 2.6, though, I suspected that there must be plenty of room for improvement _before_ such drastic measures become necessary. It makes little sense to add suspension to 2.6 if performance can be improved _without_ hurting latency. That's why I shelved my work on suspension to find out and document when exactly performance went down during 2.5.

> 2.4 does not do any of this.
>
> The effect of not suspending the execution contexts of the demoted
> process address spaces is that the victimized execution contexts thrash
> while trying to reload the memory they need to execute. The effect of
> incomplete demotion is essentially livelock under sufficient stress.
> Its memory scheduling, to what extent it has it, is RR and hence fair,
> but the various caveats above justify "does not do any of this",
> particularly incomplete demotion.

One thing you can observe with 2.4 is that one process may force another process out. Say you have several instances of the same program which all have the same working set size (i.e. requirements, not RSS) and a constant rate of memory references in the code. If their current RSSs differ, then some take more major faults and spend more time blocked than others. In a thrashing situation, you can see the small RSSs shrink to virtually zero while the largest RSS grows even further -- the thrashing processes are stealing each other's pages while the one which hardly ever faults keeps its complete working set in RAM. Bad for fairness, but it can help throughput quite a bit. This effect is harder to trigger in 2.6.

> So I predict that a true load control mechanism and policy would be
> both an improvement over 2.4 and would correct 2.6 regressions vs. 2.4
> on underprovisioned machines. For now, we lack an implementation.
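The feedback loop above can be demonstrated with a crude toy model (my invention, not a trace of either kernel): three tasks with the same working set, one of which starts with its full working set resident. A blocked task stops referencing its pages, so under a rough global-LRU stand-in its pages look stale and get stolen, while the task that never faults keeps refreshing its pages and is never victimized. All parameters are made up.

```python
# Toy model of RSS stealing under thrashing -- not a simulation of the
# actual 2.4 VM, just the feedback loop described in the text.
W = 100            # working set of every process, in pages
FAULT_COST = 10    # ticks a major fault keeps a process blocked
procs = [
    {"name": "big",    "rss": 100, "last_ref": 0, "blocked_until": 0},
    {"name": "small1", "rss": 60,  "last_ref": 0, "blocked_until": 0},
    {"name": "small2", "rss": 60,  "last_ref": 0, "blocked_until": 0},
]

for tick in range(1, 5001):
    for p in procs:
        if tick < p["blocked_until"]:
            continue                  # still blocked on disk I/O
        p["last_ref"] = tick          # the task runs and touches its pages
        if p["rss"] < W:
            # Major fault: bring one page in, evict the page of whichever
            # other task looks least recently referenced (ties broken
            # toward the smaller RSS, i.e. the task thrashing hardest).
            victim = min((q for q in procs if q is not p and q["rss"] > 0),
                         key=lambda q: (q["last_ref"], q["rss"]))
            victim["rss"] -= 1
            p["rss"] += 1
            p["blocked_until"] = tick + FAULT_COST

for p in procs:
    print(p["name"], p["rss"])
# -> big keeps all 100 pages; small1 and small2 just trade pages (60 each)
```

The small tasks spend most of their time blocked, so their pages always look like the best eviction candidates to each other, while "big" hardly ever faults and sails through untouched: exactly the unfair-but-throughput-friendly behavior described above.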
I doubt that you can get performance anywhere near 2.4 just by adding load control to 2.6, unless you measure throughput and nothing else -- otherwise latency will kill you. I am convinced the key is not in _adding_ stuff, but in _fixing_ what we have.

IMO the question is: how much do we care? Machines with tight memory are not necessarily very concerned about paging (e.g. PDAs), and serious servers rarely operate under such conditions: admins tend to add RAM when the paging load is significant.

If you don't care _that_ much about thrashing in Linux, just tell people to buy more RAM. Computers are cheap, RAM even more so, 64 bit is becoming affordable, and heavy paging sucks no matter how good a paging mechanism is.

If you care enough to spend resources on the problem, look at the major regressions in 2.5 and find out where they were a consequence of a deliberate trade-off and where they were an oversight which can be fixed or mitigated without sacrificing what was gained through the respective changes in 2.5. Obviously, performing regular testing with thrashing benchmarks would make lasting major regressions like those in the 2.5 development series much less likely in the future.

Additional load control mechanisms create new problems (latency, increased complexity), so I think they should be a last resort, not a method to paper over deficiencies elsewhere in the kernel.

Roger