From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mel Gorman Subject: Re: [PATCH v5 08/31] list: add a new LRU list type Date: Fri, 10 May 2013 10:21:05 +0100 Message-ID: <20130510092105.GK11497@suse.de> References: <1368079608-5611-1-git-send-email-glommer@openvz.org> <1368079608-5611-9-git-send-email-glommer@openvz.org> <20130509133742.GW11497@suse.de> <518C0ECF.8010302@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Cc: Glauber Costa , linux-mm@kvack.org, Andrew Morton , cgroups@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com, Johannes Weiner , Michal Hocko , hughd@google.com, Greg Thelen , linux-fsdevel@vger.kernel.org, Dave Chinner To: Glauber Costa Return-path: Content-Disposition: inline In-Reply-To: <518C0ECF.8010302@parallels.com> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Fri, May 10, 2013 at 01:02:07AM +0400, Glauber Costa wrote: > On 05/09/2013 05:37 PM, Mel Gorman wrote: > > On Thu, May 09, 2013 at 10:06:25AM +0400, Glauber Costa wrote: > >> From: Dave Chinner > >> > >> Several subsystems use the same construct for LRU lists - a list > >> head, a spin lock and and item count. They also use exactly the same > >> code for adding and removing items from the LRU. Create a generic > >> type for these LRU lists. > >> > >> This is the beginning of generic, node aware LRUs for shrinkers to > >> work with. > >> > >> [ glommer: enum defined constants for lru. Suggested by gthelen, > >> don't relock over retry ] > >> Signed-off-by: Dave Chinner > >> Signed-off-by: Glauber Costa > >> Reviewed-by: Greg Thelen > >>> > >>> > >>> > >> + > >> +unsigned long > >> +list_lru_walk( > >> + struct list_lru *lru, > >> + list_lru_walk_cb isolate, > >> + void *cb_arg, > >> + long nr_to_walk) > >> +{ > >> + struct list_head *item, *n; > >> + unsigned long removed = 0; > >> + > >> + spin_lock(&lru->lock); > >> +restart: > >> + list_for_each_safe(item, n, &lru->list) { > >> + enum lru_status ret; > >> + > >> + if (nr_to_walk-- < 0) > >> + break; > >> + > >> + ret = isolate(item, &lru->lock, cb_arg); > >> + switch (ret) { > >> + case LRU_REMOVED: > >> + lru->nr_items--; > >> + removed++; > >> + break; > >> + case LRU_ROTATE: > >> + list_move_tail(item, &lru->list); > >> + break; > >> + case LRU_SKIP: > >> + break; > >> + case LRU_RETRY: > >> + goto restart; > >> + default: > >> + BUG(); > >> + } > >> + } > > > > What happened your suggestion to only retry once for each object to > > avoid any possibility of infinite looping or stalling for prolonged > > periods of time waiting on XFS to do something? > > > > Sorry. It wasn't clear for me if you were just trying to make sure we > had a way out in case it proves to be a problem, or actually wanted a > change. > Either. If you are sure there is a way out for XFS using LRU_RETRY without prolonged stalls then it's fine. If it is not certain then I would be much more comfortable with a retry-once and then moving onto the next LRU node. > In any case, I cannot claim to be as knowledgeable as Dave in the > subtleties of such things in the final behavior of the shrinker. Dave, > can you give us your input here? > > I also have another recent observation on this: > > The main difference between LRU_SKIP and LRU_RETRY is that LRU_RETRY > will go back to the beginning of the list, and start scanning it again. > Only sortof true. Lets say we had a list of 8 LRU nodes. Nodes 1-3 get isolated. Node 4 returns LRU_RETRY so we goto restart. The first item on the list is now potentially LRU_RETRY which it must handle before reaching Nodes 5-8 LRU_SKIP is different. If Node 4 returned LRU_SKIP then Node 5-8 are ignored entirely. Actually..... why is that? LRU_SKIP is documented to "item cannot be locked, skip" but what it actually does it "item cannot be locked, abort the walk". It's documented behaviour LRU_SKIP implies continue, not break. case LRU_SKIP: continue; > This is *not* the same behavior we had before, where we used to read: > > for (nr_scanned = nr_to_scan; nr_scanned >= 0; nr_scanned--) { > struct inode *inode; > [ ... ] > > if (inode_has_buffers(inode) || inode->i_data.nrpages) { > __iget(inode); > [ ... ] > iput(inode); > spin_lock(&sb->s_inode_lru_lock); > > if (inode != list_entry(sb->s_inode_lru.next, > struct inode, i_lru)) > continue; <===== > /* avoid lock inversions with trylock */ > if (!spin_trylock(&inode->i_lock)) > continue; <===== > if (!can_unuse(inode)) { > spin_unlock(&inode->i_lock); > continue; <===== > } > } > > It is my interpretation that we in here, we won't really reset the > search, but just skip this inode. > > Another problem is that by restarting the search the way we are doing > now, we actually decrement nr_to_walk twice in case of a retry. By doing > a retry-once test, we can actually move nr_to_walk to the end of the > switch statement, which has the good side effect of getting rid of the > reason we had to allow it to go negative. > > How about we fold the following attached patch to this one? (I would > still have to give it a round of testing) > > diff --git a/lib/list_lru.c b/lib/list_lru.c > index da9b837..4aa069b 100644 > --- a/lib/list_lru.c > +++ b/lib/list_lru.c > @@ -195,12 +195,10 @@ list_lru_walk_node( > unsigned long isolated = 0; > > spin_lock(&nlru->lock); > -restart: > list_for_each_safe(item, n, &nlru->list) { > + bool first_pass = true; > enum lru_status ret; > - > - if ((*nr_to_walk)-- < 0) > - break; > +restart: > > ret = isolate(item, &nlru->lock, cb_arg); > switch (ret) { > @@ -217,10 +215,17 @@ restart: > case LRU_SKIP: > break; > case LRU_RETRY: > + if (!first_pass) > + break; > + first_pass = true; > goto restart; I think this is generally much safer and less likely to report bugs about occasional long stalls during slab shrink. Similar to LRU_SKIP comment above, should this be continue though to actually skip the LRU node instead of aborting the LRU walk? > default: > BUG(); > } > + > + if ((*nr_to_walk)-- == 0) > + break; > + > } > spin_unlock(&nlru->lock); > return isolated; -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org