J M Cerqueira Esteves wrote:
> Andrew Morton wrote:
>>We have a candidate fix at
>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/x86_64-mm-blk-bounce.patch.
>> [...] The patch is against 2.6.16-rc5.
>
> Testing that kernel now, with good news: the machine has been apparently
> stable, running Gaussian processes for the last 20 hours, with no
> oom-killer messages.

... and still using that 2.6.16-rc5 with the suggested patch: during the
last 11 days, always doing a lot of number-crunching with Gaussian and
other programs, we had no more oom-killings or other noticeable
instabilities.

I did take the opportunity to configure the kernel with CONFIG_EDAC,
CONFIG_EDAC_MM_EDAC and CONFIG_EDAC_E752X, and during this period
(11 days) got about 20 messages like these:

Mar 7 15:25:08 localhost kernel: [182069.699544] Non-Fatal Error DRAM Controler
Mar 7 15:25:08 localhost kernel: [182069.699559] EDAC MC0: CE page 0x9c334, offset 0x0, grain 4096, syndrome 0x2510, row 2, channel 1, label "": e752x CE

always with the same values for page, offset, grain, syndrome, row and
channel. A defective DIMM?

Best regards
                         J Esteves
-- 
+351 939838775  Skype:jmcerqueira  http://del.icio.us/jmce