From: Glauber Costa
To: Dave Chinner
Cc: Glauber Costa, linux-mm@kvack.org, cgroups@vger.kernel.org,
    Andrew Morton, Greg Thelen, kamezawa.hiroyu@jp.fujitsu.com,
    Michal Hocko, Johannes Weiner, linux-fsdevel@vger.kernel.org,
    Dave Chinner
Subject: Re: [PATCH v6 12/31] fs: convert inode and dentry shrinking to be node aware
Date: Sat, 18 May 2013 02:54:23 +0400
Message-ID: <5196B51F.5030508@parallels.com>
In-Reply-To: <51964381.8010406@parallels.com>
References: <1368382432-25462-1-git-send-email-glommer@openvz.org>
 <1368382432-25462-13-git-send-email-glommer@openvz.org>
 <20130514095200.GI29466@dastard>
 <5193A95E.70205@parallels.com>
 <20130516000216.GC24635@dastard>
 <5195302A.2090406@parallels.com>
 <20130517005134.GK24635@dastard>
 <5195DC59.8000205@parallels.com>
 <51964381.8010406@parallels.com>

On 05/17/2013 06:49 PM, Glauber Costa wrote:
> On 05/17/2013 11:29 AM, Glauber Costa wrote:
>> Except that shrink_slab_node would also defer work, right?
>>
>>>> The only thing I don't like about this is the extra nodemask needed,
>>>> which, like the scan control, would have to sit on the stack.
>>>> Suggestions for avoiding that problem are welcome.. :)
>>>>
>> I will try to come up with a patch to do all this, and then we can
>> concretely discuss.
>> You are also of course welcome to do so as well =)
>
> All right.
>
> I played a bit today with variations of this patch that keep the
> deferred count per node. I will rebase the whole series on top of it (the
> changes can get quite disruptive) and post. I want to believe that
> after this, all our regression problems will be gone (famous last words).
>
> As I have told you, I wasn't seeing problems like you are, and
> speculated that this was due to the disk speeds. While this is true,
> the patch I came up with makes my workload actually a lot better.
> While my caches weren't being emptied, they were being slightly depleted
> and then slowly filled again. With my new patch, it is almost
> a straight line throughout the whole find run. There is a dent here and
> there eventually, but it recovers quickly. It also takes some time
> for steady state to be reached, but once it is, we have all variables
> in the equation (dentries, inodes, etc) basically flat. So I guess it
> works, and I am confident that it will make your workload better.
>
> My strategy is to modify the shrinker structure like this:
>
> struct shrinker {
> 	int (*shrink)(struct shrinker *, struct shrink_control *sc);
> 	long (*count_objects)(struct shrinker *, struct shrink_control *sc);
> 	long (*scan_objects)(struct shrinker *, struct shrink_control *sc);
>
> 	int seeks;	/* seeks to recreate an obj */
> 	long batch;	/* reclaim batch size, 0 = default */
> 	unsigned long flags;
>
> 	/* These are for internal use */
> 	struct list_head list;
> 	atomic_long_t *nr_deferred;	/* objs pending delete, per node */
>
> 	/* nodes being currently shrunk, only makes sense for NUMA shrinkers */
> 	nodemask_t *nodes_shrinking;
> };
>
> We need memory allocation now for nr_deferred and nodes_shrinking, but
> OTOH we use no stack, and the size can be adjusted dynamically depending
> on whether or not the shrinker is NUMA aware.
>
> Guess that is it. Expect news soon.

Except, of course, that struct shrinker is shared between concurrent
reclaim runs, so this won't cut it.
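The per-node nr_deferred part is not the issue, by the way: the deferred
counts are supposed to survive from one run to the next, so they can
simply be allocated when the shrinker is registered. Something along
these lines (completely untested sketch; the NUMA flag name is just a
placeholder):

	int register_shrinker(struct shrinker *shrinker)
	{
		size_t size = sizeof(*shrinker->nr_deferred);

		/* one deferred counter per node for NUMA aware shrinkers */
		if (shrinker->flags & SHRINKER_NUMA_AWARE)
			size *= nr_node_ids;

		shrinker->nr_deferred = kzalloc(size, GFP_KERNEL);
		if (!shrinker->nr_deferred)
			return -ENOMEM;

		down_write(&shrinker_rwsem);
		list_add_tail(&shrinker->list, &shrinker_list);
		up_write(&shrinker_rwsem);
		return 0;
	}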
Right now I am inclined to really just put the nodemask on the stack. The
alternative, if that becomes a problem, is to extend the LRU APIs so we
can shrink a single node at a time; that way we only need one extra word
on the stack.
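To be concrete, the single-node alternative would look roughly like this
(again an untested sketch; the nid field and list_lru_count_node() are
illustrative names):

	struct shrink_control {
		gfp_t gfp_mask;
		unsigned long nr_to_scan;
		/* the one extra word: which node we are currently shrinking */
		int nid;
	};

	static long super_cache_count(struct shrinker *shrink,
				      struct shrink_control *sc)
	{
		struct super_block *sb;
		long total_objects = 0;

		sb = container_of(shrink, struct super_block, s_shrink);

		/* count only objects sitting on the node under reclaim */
		total_objects += list_lru_count_node(&sb->s_dentry_lru, sc->nid);
		total_objects += list_lru_count_node(&sb->s_inode_lru, sc->nid);

		return total_objects;
	}

shrink_slab() would then loop over the nodes in the reclaim nodemask and
call each NUMA aware shrinker once per node, so the nodemask itself never
has to live in struct shrinker.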