Date: Wed, 13 Jul 2011 21:34:50 +1000
From: Dave Chinner
To: Chris Wilson
Cc: KOSAKI Motohiro, keithp@keithp.com, linux-kernel@vger.kernel.org,
	airlied@linux.ie, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] i915: slab shrinker have to return -1 if it cant shrink any objects
Message-ID: <20110713113450.GT23038@dastard>
References: <4E0444CA.3080407@jp.fujitsu.com> <1309424153_44559@CP5-2952>
	<4E1C15B2.9020800@jp.fujitsu.com> <4E1CE48C.2070402@jp.fujitsu.com>
	<4E1D550A.80301@jp.fujitsu.com>
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)

On Wed, Jul 13, 2011 at 09:40:31AM +0100, Chris Wilson wrote:
> On Wed, 13 Jul 2011 17:19:22 +0900, KOSAKI Motohiro wrote:
> > (2011/07/13 16:41), Chris Wilson wrote:
> > > On Wed, 13 Jul 2011 09:19:24 +0900, KOSAKI Motohiro wrote:
> > >> (2011/07/12 19:06), Chris Wilson wrote:
> > >>> On Tue, 12 Jul 2011 18:36:50 +0900, KOSAKI Motohiro wrote:
> > >>>> Hi,
> > >>>>
> > >>>> sorry for the delay.
> > >>>>
> > >>>>> On Wed, 29 Jun 2011 20:53:54 -0700, Keith Packard wrote:
> > >>>>>> On Fri, 24 Jun 2011 17:03:22 +0900, KOSAKI Motohiro wrote:
> > >> The matter is not in contention. The problem is happen if the mutex
> > >> is taken by shrink_slab calling thread. i915_gem_inactive_shrink()
> > >> have no way to shink objects. How do you detect such case?
> > >
> > > In the primary allocator for the backing pages whilst the mutex is
> > > held we do __NORETRY and a manual shrinkage of our buffers before
> > > failing. That's the largest allocator, all the others are tiny and
> > > short-lived by comparison and left to fail.
> >
> > __NORETRY perhaps might help to avoid false positive oom. But,
> > __NORETRY still makes full page reclaim and may drop a lot of innocent
> > page cache, and then system may become slow down.
>
> But in this context, that is memory the user has requested to be used
> with the GPU, so the page cache is sacrificed to meet the allocation, if
> possible.
>
> > Of course, you don't meet such worst case scenario so easy. But you
> > may need to think worst case if you touch memory management code.
>
> Actually we'd much rather you took us into account when designing the mm.

Heh. Now where have I heard that before? :/

> > If you are thinking the shrinker protocol is too complicated, doc
> > update patch is really welcome.
>
> What I don't understand is the disconnect between objects to shrink and
> the number of pages released. We may have tens of thousands of single
> page objects that are expensive to free in comparison to a few 10-100MiB
> objects that are just sitting idle. Would it be better to report the
> estimated number of shrinkable pages instead?

Maybe. Then again, maybe not.
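For reference, the callback contract being argued over looks roughly
like this. It's only a sketch against the current shrink_control
interface; the cache, lock and helper names (my_cache_*) are made up
for illustration rather than taken from the real i915 code:

#include <linux/mm.h>		/* struct shrinker, struct shrink_control */
#include <linux/mutex.h>
#include <linux/atomic.h>

static DEFINE_MUTEX(my_cache_mutex);
static atomic_t my_cache_object_count = ATOMIC_INIT(0);

/* assumed helper: frees up to @nr objects, updates my_cache_object_count */
static unsigned long my_cache_free_some(unsigned long nr);

static int my_cache_shrink(struct shrinker *shrinker,
			   struct shrink_control *sc)
{
	if (sc->nr_to_scan) {
		/*
		 * If the lock is already held (possibly by the thread
		 * that entered reclaim), we can't free anything, so
		 * report -1 so shrink_slab() stops calling us. That is
		 * what this patch is about.
		 */
		if (!mutex_trylock(&my_cache_mutex))
			return -1;
		my_cache_free_some(sc->nr_to_scan);
		mutex_unlock(&my_cache_mutex);
	}

	/* otherwise the return value is "reclaimable objects remaining" */
	return atomic_read(&my_cache_object_count);
}

static struct shrinker my_cache_shrinker = {
	.shrink	= my_cache_shrink,
	.seeks	= DEFAULT_SEEKS,
};
/* register_shrinker(&my_cache_shrinker) at init, unregister_shrinker() on exit */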
The shrinker API is designed for slab caches which have a fixed object
size, not a variable amount of memory per object. If you report the
number of shrinkable pages instead, you can then make whatever decision
you want about which object(s) to free to release that many pages.

However, that means that if you have 1000 reclaimable pages, and the
dentry cache has 1000 reclaimable dentries, the same shrinker calls will
free 1000 pages from your cache but maybe none from the dentry cache due
to slab fragmentation. Hence your cache could end up being blown to
pieces under light memory pressure simply because you told the shrinker
how many shrinkable pages you have cached. In that case, you want to
report a much smaller number so the cache is harder to reclaim under
light memory pressure, or not reclaim as much as the shrinker asks you
to.

This is one of the issues I faced when converting the XFS buffer cache
to use an internal LRU and a shrinker to reclaim buffers that hold one
or more pages. We used to cache the metadata in the page cache and let
the VM reclaim from there, but that was a crap-shoot because page
reclaim kept trashing the working set of metadata pages and it was
simply not fixable. Hence I changed the lifecycle of buffers to include
a priority-based LRU for reclaiming buffer objects and moved away from
using the page cache for holding cached metadata.

I let the shrinker know how many reclaimable buffers there are, but it
has no idea how much memory each buffer pins. I don't even keep track of
that, because from a performance perspective it is irrelevant; what
matters is maintaining a minimal working set of metadata buffers under
memory pressure. In most cases the buffers hold only one or two pages,
but because of the reclaim reference count it can take up to 7 attempts
to free a buffer before it is finally reclaimed. Hence the buffer cache
tends to hold onto its critical working set quite well under different
levels of memory pressure, because buffers that are likely to be reused
are much harder to reclaim than those that are likely to be used only
once. As a result, the LRU resists aggressive reclaim well enough to
maintain the necessary working set of buffers. The working set gets
smaller as memory pressure goes up, but the shrinker is not able to
completely trash the cache like the previous page-cache based version
did. It's a very specific solution to the problem of tuning a shrinker
for good system behaviour, but it's the only way I found that works....

Oh, and using a mutex to single-thread cache reclaim rather than
spinlocks is usually a good idea from a scalability point of view,
because your shrinker can be called simultaneously on every CPU.
Spinlocks really, really hurt when that happens, and performance will
plummet because you burn CPU on locks rather than reclaiming objects.
Single-threaded object reclaim is still the fastest way to do reclaim if
you have global lists and locks.

What I'm trying to say is that how you solve the shrinker balance
problem for your subsystem will be specific to how you need to hold
pages under memory pressure to maintain performance. Sorry I can't give
you a better answer than that, but that's what my experience with caches
and tuning shrinker behaviour indicates.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
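PS: the "reclaim reference count" trick above boils down to something
like the following. Again, just a sketch with invented names (my_buf,
my_lru_scan and friends), not the real xfs_buf code:

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/atomic.h>

struct my_buf {
	struct list_head	lru;
	atomic_t		lru_ref;	/* set higher for hotter buffers */
	/* pages, state, etc. */
};

static LIST_HEAD(my_lru);
static DEFINE_MUTEX(my_lru_mutex);		/* single-threads reclaim */

static void my_buf_free(struct my_buf *bp);	/* assumed helper */

/* Walk the LRU, counting buffers down; only free those that hit zero. */
static unsigned long my_lru_scan(unsigned long nr_to_scan)
{
	struct my_buf *bp, *n;
	unsigned long freed = 0;
	LIST_HEAD(dispose);

	if (!mutex_trylock(&my_lru_mutex))
		return 0;		/* someone else is already reclaiming */

	list_for_each_entry_safe(bp, n, &my_lru, lru) {
		if (!nr_to_scan--)
			break;
		/*
		 * A hot buffer is visited several times before its
		 * lru_ref counts down to zero, so light memory
		 * pressure only skims the cold end of the LRU.
		 */
		if (atomic_add_unless(&bp->lru_ref, -1, 0))
			continue;
		list_move(&bp->lru, &dispose);
		freed++;
	}
	mutex_unlock(&my_lru_mutex);

	list_for_each_entry_safe(bp, n, &dispose, lru) {
		list_del_init(&bp->lru);
		my_buf_free(bp);
	}
	return freed;
}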