Hi Christoph,

thanks again for your time.

Christoph Lameter wrote:
> On Tue, 15 Feb 2011, Peter Kruse wrote:
>
>> > > we have set vm.min_free_kbytes = 2097152 but the problem
>> > > obviously did not go away.
>> >
>> > 2GB of reserves? How much memory does your system have?
>>
>> 48GB
>
> Ok then you just may potentially clog up the DMA zones. Maybe set the
> reserves to a reasonable level like 10M or so?

ok, that's what we had before the first incident, and then
increased it to this value to see if it makes a difference.

>
> How many buffers are configured at the various levels for the device that
> is receiving messages? I guess that may be a bit on the high side?

hm, I'm not sure I know what you mean or want me to do.

>
>> > Could you post the entire messages from the kernel log? We need the OOM
>> > info to figure out more about the problem.
>> >
>>
>> I attach one of the call traces, or would it be better if I send the
>> kern.log (about 6MB)?
>
> The call traces are sufficient but the traces vanished when I hit reply.
> Include them inline next time. It would be good to have the log starting
> at the last system boot. There is some information cut off that I would
> like to see.

Ok, I attach the gzipped kern.log.

>
> An atomic order 1 allocation failed and led to the OOM but it seems that
> there is still ample memory available. Slab is in "fallback_alloc" so
> something went wrong with the regular allocation attempt. Any use of
> cpusets or cgroups?

not that I know of, no.

>
> A significant amount of memory has been allocated to reclaimable slabs.
> I guess these are the socket buffers?
>
> Feb 10 11:59:49 beosrv1-t kernel: [1968911.211777] Node 0 Normal
> free:965164kB min:917952kB low:1147440kB high:1376928kB
> active_anon:2742680kB inactive_anon:293184kB active_file:4801512kB
> inactive_file:11129708kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:21719040kB mlocked:0kB dirty:600kB
> writeback:0kB mapped:26356kB shmem:4896kB slab_reclaimable:1780208kB
> <-----!!
> slab_unreclaimable:199576kB kernel_stack:1576kB pagetables:22956kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
>
> Could you try to reduce the number of network buffers?

which parameter?

thanks,

  Peter
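
P.S. To make the numbers in the quoted "Node 0 Normal" line easier to compare, here is a rough Python sketch. It is only an illustration: the helper name parse_zone_kb is made up for this mail, the field values are copied from the log line above (a subset of the fields), and it only does arithmetic on them. It is not a diagnosis of the OOM.

```python
import re

# Zone summary copied from the quoted kern.log entry (subset of fields, in kB).
ZONE_LINE = (
    "Node 0 Normal free:965164kB min:917952kB low:1147440kB high:1376928kB "
    "active_anon:2742680kB inactive_anon:293184kB active_file:4801512kB "
    "inactive_file:11129708kB present:21719040kB "
    "slab_reclaimable:1780208kB slab_unreclaimable:199576kB"
)


def parse_zone_kb(line):
    """Hypothetical helper: return {field: kB} for each 'name:<n>kB' pair."""
    pairs = re.findall(r"([\w()]+):(\d+)kB", line)
    return {name: int(value) for name, value in pairs}


zone = parse_zone_kb(ZONE_LINE)

# How far the zone's free memory sits above its 'min' watermark.
headroom_kb = zone["free"] - zone["min"]

# Fraction of the zone's present memory held in reclaimable slab.
slab_share = zone["slab_reclaimable"] / zone["present"]

print(f"free above min watermark: {headroom_kb} kB")
print(f"reclaimable slab share of zone: {slab_share:.1%}")
```

With the values from the log this gives about 46MB of free memory above the min watermark and roughly 8% of the zone in reclaimable slab.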