From: Manfred Spraul
Date: Fri, 27 Sep 2002 19:24:56 +0200
To: Ed Tomlinson
Cc: Andrew Morton, linux-kernel@vger.kernel.org
Subject: Re: [patch 3/4] slab reclaim balancing

Ed Tomlinson wrote:
>
> There is no dispute that in some cases it will be slower from a slab
> perspective. As Andrew and you have discussed there are things that
> can be done to speed things up. Is not the question really, "Are the
> vm and slab faster together when slab pages are freed asap?"
>

Some caches are quite bursty - what about the 2 kB generic cache that
is used for the MTU-sized socket buffers? With interrupt mitigation
enabled, I'd expect that a GigE NIC could allocate a few dozen 2 kB
objects in every interrupt, and I don't think it's the right approach
to effectively disable the cache in slab.c for such loads.

I do not have many data points, but in a netbench run on a 4-way Xeon,
kmem_cache_free is called 5 million times/minute, plus an additional
4 million calls to kfree. I agree that _reap right now is bad, but
IMHO it's questionable whether the fix should sit inside the hot path
of the allocator.

What about this approach (a rough sketch is appended after the sig):

* Enable batching even on UP, with a LIFO array in front of the lists.
* After flushing a batch back into the lists, calculate the number of
  free objects in the lists. If freeable pages exist and that number
  exceeds a target, return the freeable pages above the target to the
  page buddy.
* Increase the target of freeable pages in kmem_cache_grow - if we had
  to get another page from gfp, then our own cache was too small.

Since the test for the number of freeable objects only happens after
batching, i.e. in the worst case once for every 30 kmem_cache_free
calls, it doesn't matter if it's a bit expensive.

Open problems:

* What about caches with large objects (>PAGE_SIZE, e.g. the bio
  MAX_PAGES object, or the 16 kB socket buffers used over loopback)?
  Right now they are not cached in the per-cpu arrays, to reduce
  memory pressure. If the list processing becomes slower, we would
  slow down these slab users. But OTOH, if you memcpy 16 kB, a few
  cycles in kmalloc probably won't matter much.
* Where to flush the per-cpu caches? On a 16-way system they can
  contain up to 4000 objects per cache. Right now that happens in
  kmem_cache_reap(). One flush per second would be enough, just to
  avoid objects on lightly loaded slabs sitting forever in the per-cpu
  arrays and preventing pages from becoming freeable.
* Where is the freeable-pages target decreased?

--
    Manfred
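To make the proposal concrete, here is a minimal sketch in plain
user-space C. All names (batch_array, cache_free, free_target, the
helper stubs) are illustrative stand-ins, not the actual 2.5 slab
internals; it compiles stand-alone but only models the bookkeeping:

#include <stdio.h>

#define BATCH_SIZE 30

struct batch_array {
	unsigned int avail;		/* objects in the LIFO array */
	void *entries[BATCH_SIZE];
};

struct cache {
	struct batch_array array;	/* front-end, enabled on UP too */
	unsigned int free_objects;	/* free objects on the slab lists */
	unsigned int objs_per_page;
	unsigned int free_target;	/* freeable objects we may keep */
};

/* Stub: move objects from the array back onto the slab lists. */
static void flush_batch_to_lists(struct cache *c, unsigned int count)
{
	c->array.avail -= count;
	c->free_objects += count;
}

/* Stub: hand fully free pages back to the page buddy allocator. */
static void return_pages_to_buddy(struct cache *c, unsigned int pages)
{
	c->free_objects -= pages * c->objs_per_page;
	printf("returned %u page(s) to buddy\n", pages);
}

void cache_free(struct cache *c, void *obj)
{
	struct batch_array *a = &c->array;

	if (a->avail < BATCH_SIZE) {
		/* Hot path: push onto the LIFO array, nothing else. */
		a->entries[a->avail++] = obj;
		return;
	}

	/*
	 * Array full: flush half a batch back to the lists. Only here,
	 * at worst once per BATCH_SIZE/2 frees, do we pay for the
	 * freeable-pages check.
	 */
	flush_batch_to_lists(c, BATCH_SIZE / 2);
	a->entries[a->avail++] = obj;

	if (c->free_objects > c->free_target) {
		unsigned int excess = c->free_objects - c->free_target;

		if (excess >= c->objs_per_page)
			return_pages_to_buddy(c, excess / c->objs_per_page);
	}
}

/* Growing means our reserve was too small: raise the target. */
void cache_grow(struct cache *c)
{
	c->free_target += c->objs_per_page;
}

int main(void)
{
	struct cache c = { .objs_per_page = 2, .free_target = 4 };
	char dummy[100];
	int i;

	for (i = 0; i < 100; i++)
		cache_free(&c, &dummy[i]);
	printf("in array: %u, on lists: %u\n",
		c.array.avail, c.free_objects);
	return 0;
}

The shape is the point: the common case touches only the array push,
and the freeable-pages check is paid at most once per half-batch
flush, so it can afford to be a bit expensive.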