linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: clameter@sgi.com, linux-kernel@vger.kernel.org,
	kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org
Subject: Re: [PATCH 5/6] Have zonelist contains structs with both a zone pointer and zone_idx
Date: Wed, 21 Nov 2007 18:48:02 +0000	[thread overview]
Message-ID: <20071121184801.GA31674@csn.ul.ie> (raw)
In-Reply-To: <1195670327.5294.24.camel@localhost>

On (21/11/07 13:38), Lee Schermerhorn didst pronounce:
> On Wed, 2007-11-21 at 13:18 -0500, Lee Schermerhorn wrote:
> > On Wed, 2007-11-21 at 13:12 -0500, Lee Schermerhorn wrote:
> > > Mel:
> > > 
> > > Are the comparisons noted below correct--i.e., '>' rather than '<'?  I'm
> > > trying to understand how this matches the comments and code.  Doesn't
> > > look right to me, but I could be missing something. 
> > > 
> > > Lee
> > 
> > Never mind...   I get it now.  I was confusing zone ids with node ids.  

Cool. I'll admit that the iterator is a little mind-bending but it was
hard to make it obvious in a manner that did not suck performance-wise.

> However...  [possibly another stupid question]
> 
> Don't the loops in {first|next}_zones_zonelist() depend on the zonelists
> being in "zone order" as opposed to "node order".  If the nodelists are
> in node order, the lower zones can be interspersed with the upper zones,
> I think. 

That sounds right.

> Looks like this would terminate the loops as soon as we hit a
> lower zone, even tho' we might not have visited all of the nodes
> represented in the zonelist.

Well, the first|next loops will terminate and return a known useful
zone. The allocator sees if it can be used to satisfy the request and if
not, next() is called again and it moves to the next useful zone.

If in node order, it will try the low zones on the closet node and still
should continue to the next node.

for_each_zone_zonelist_nodemask() will not terminate until the zone
returned by next_zones_zonelist() is NULL at which point there are no
zones left, whether it be node or zone ordering.

> Again, I could be missing some fine point of zonerefs that handles this
> condition, but I can't see it.
> 

I think it is fine but I'm going to run a boot-test for node-order just
in case.

> Lee
> 
> 
> > Lee
> > > 
> > > On Wed, 2007-11-21 at 00:40 +0000, Mel Gorman wrote:
> > > > Filtering zonelists requires very frequent use of zone_idx(). This is costly
> > > > as it involves a lookup of another structure and a substraction operation. As
> > > > the zone_idx is often required, it should be quickly accessible.  The node
> > > > idx could also be stored here if it was found that accessing zone->node is
> > > > significant which may be the case on workloads where nodemasks are heavily
> > > > used.
> > > > 
> > > > This patch introduces a struct zoneref to store a zone pointer and a zone
> > > > index.  The zonelist then consists of an array of these struct zonerefs which
> > > > are looked up as necessary. Helpers are given for accessing the zone index
> > > > as well as the node index.
> > > > 
> > > > [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
> > > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > > Acked-by: Christoph Lameter <clameter@sgi.com>
> > > > Acked-by: David Rientjes <rientjes@google.com>
> > > > ---
> > > > 
> > > >  arch/parisc/mm/init.c  |    2 -
> > > >  fs/buffer.c            |    6 ++--
> > > >  include/linux/mmzone.h |   64 ++++++++++++++++++++++++++++++++++++------
> > > >  include/linux/oom.h    |    4 +-
> > > >  kernel/cpuset.c        |    4 +-
> > > >  mm/hugetlb.c           |    3 +-
> > > >  mm/mempolicy.c         |   36 ++++++++++++++----------
> > > >  mm/oom_kill.c          |   45 ++++++++++++++----------------
> > > >  mm/page_alloc.c        |   66 ++++++++++++++++++++++++--------------------
> > > >  mm/slab.c              |    2 -
> > > >  mm/slub.c              |    2 -
> > > >  mm/vmscan.c            |    6 ++--
> > > >  12 files changed, 149 insertions(+), 91 deletions(-)
> > > > 
> > > <snip>
> > > > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc2-mm1-010_use_two_zonelists/include/linux/mmzone.h linux-2.6.24-rc2-mm1-020_zoneid_zonelist/include/linux/mmzone.h
> > > > --- linux-2.6.24-rc2-mm1-010_use_two_zonelists/include/linux/mmzone.h	2007-11-20 23:27:04.000000000 +0000
> > > > +++ linux-2.6.24-rc2-mm1-020_zoneid_zonelist/include/linux/mmzone.h	2007-11-20 23:27:34.000000000 +0000
> > > > @@ -469,6 +469,15 @@ struct zonelist_cache;
> > > >  #endif
> > > >  
> > > >  /*
> > > > + * This struct contains information about a zone in a zonelist. It is stored
> > > > + * here to avoid dereferences into large structures and lookups of tables
> > > > + */
> > > > +struct zoneref {
> > > > +	struct zone *zone;	/* Pointer to actual zone */
> > > > +	int zone_idx;		/* zone_idx(zoneref->zone) */
> > > > +};
> > > > +
> > > > +/*
> > > >   * One allocation request operates on a zonelist. A zonelist
> > > >   * is a list of zones, the first one is the 'goal' of the
> > > >   * allocation, the other zones are fallback zones, in decreasing
> > > > @@ -476,11 +485,18 @@ struct zonelist_cache;
> > > >   *
> > > >   * If zlcache_ptr is not NULL, then it is just the address of zlcache,
> > > >   * as explained above.  If zlcache_ptr is NULL, there is no zlcache.
> > > > + * *
> > > > + * To speed the reading of the zonelist, the zonerefs contain the zone index
> > > > + * of the entry being read. Helper functions to access information given
> > > > + * a struct zoneref are
> > > > + *
> > > > + * zonelist_zone()	- Return the struct zone * for an entry in _zonerefs
> > > > + * zonelist_zone_idx()	- Return the index of the zone for an entry
> > > > + * zonelist_node_idx()	- Return the index of the node for an entry
> > > >   */
> > > > -
> > > >  struct zonelist {
> > > >  	struct zonelist_cache *zlcache_ptr;		     // NULL or &zlcache
> > > > -	struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];      // NULL delimited
> > > > +	struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
> > > >  #ifdef CONFIG_NUMA
> > > >  	struct zonelist_cache zlcache;			     // optional ...
> > > >  #endif
> > > > @@ -713,26 +729,52 @@ extern struct zone *next_zone(struct zon
> > > >  	     zone;					\
> > > >  	     zone = next_zone(zone))
> > > >  
> > > > +static inline struct zone *zonelist_zone(struct zoneref *zoneref)
> > > > +{
> > > > +	return zoneref->zone;
> > > > +}
> > > > +
> > > > +static inline int zonelist_zone_idx(struct zoneref *zoneref)
> > > > +{
> > > > +	return zoneref->zone_idx;
> > > > +}
> > > > +
> > > > +static inline int zonelist_node_idx(struct zoneref *zoneref)
> > > > +{
> > > > +#ifdef CONFIG_NUMA
> > > > +	/* zone_to_nid not available in this context */
> > > > +	return zoneref->zone->node;
> > > > +#else
> > > > +	return 0;
> > > > +#endif /* CONFIG_NUMA */
> > > > +}
> > > > +
> > > > +static inline void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
> > > > +{
> > > > +	zoneref->zone = zone;
> > > > +	zoneref->zone_idx = zone_idx(zone);
> > > > +}
> > > > +
> > > >  /* Returns the first zone at or below highest_zoneidx in a zonelist */
> > > > -static inline struct zone **first_zones_zonelist(struct zonelist *zonelist,
> > > > +static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
> > > >  					enum zone_type highest_zoneidx)
> > > >  {
> > > > -	struct zone **z;
> > > > +	struct zoneref *z;
> > > >  
> > > >  	/* Find the first suitable zone to use for the allocation */
> > > > -	z = zonelist->zones;
> > > > -	while (*z && zone_idx(*z) > highest_zoneidx)
> > > > +	z = zonelist->_zonerefs;
> > > > +	while (zonelist_zone_idx(z) > highest_zoneidx)
> > > !!! HERE:                             ^
> > > >  		z++;
> > > >  
> > > >  	return z;
> > > >  }
> > > >  
> > > >  /* Returns the next zone at or below highest_zoneidx in a zonelist */
> > > > -static inline struct zone **next_zones_zonelist(struct zone **z,
> > > > +static inline struct zoneref *next_zones_zonelist(struct zoneref *z,
> > > >  					enum zone_type highest_zoneidx)
> > > >  {
> > > >  	/* Find the next suitable zone to use for the allocation */
> > > > -	while (*z && zone_idx(*z) > highest_zoneidx)
> > > > +	while (zonelist_zone_idx(z) > highest_zoneidx)
> > > !!! and HERE:                         ^
> > > >  		z++;
> > > >  
> > > >  	return z;
> > > > @@ -748,9 +790,11 @@ static inline struct zone **next_zones_z
> > > >   * This iterator iterates though all zones at or below a given zone index.
> > > >   */
> > > >  #define for_each_zone_zonelist(zone, z, zlist, highidx) \
> > > > -	for (z = first_zones_zonelist(zlist, highidx), zone = *z++;	\
> > > > +	for (z = first_zones_zonelist(zlist, highidx),			\
> > > > +					zone = zonelist_zone(z++);	\
> > > >  		zone;							\
> > > > -		z = next_zones_zonelist(z, highidx), zone = *z++)
> > > > +		z = next_zones_zonelist(z, highidx),			\
> > > > +					zone = zonelist_zone(z++))
> > > >  
> > > >  #ifdef CONFIG_SPARSEMEM
> > > >  #include <asm/sparsemem.h>
> > > <snip>
> 

-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

  reply	other threads:[~2007-11-21 18:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-11-21  0:38 [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v10 Mel Gorman
2007-11-21  0:39 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-11-21  0:39 ` [PATCH 2/6] Introduce node_zonelist() for accessing the zonelist for a GFP mask Mel Gorman
2007-11-21  0:39 ` [PATCH 3/6] Remember what the preferred zone is for zone_statistics Mel Gorman
2007-11-21  0:40 ` [PATCH 4/6] Use two zonelist that are filtered by GFP mask Mel Gorman
2007-11-21  2:37   ` 小崎資広
2007-11-21  0:40 ` [PATCH 5/6] Have zonelist contains structs with both a zone pointer and zone_idx Mel Gorman
2007-11-21 18:12   ` Lee Schermerhorn
2007-11-21 18:18     ` Lee Schermerhorn
2007-11-21 18:38       ` Lee Schermerhorn
2007-11-21 18:48         ` Mel Gorman [this message]
2007-11-21  0:40 ` [PATCH 6/6] Filter based on a nodemask as well as a gfp_mask Mel Gorman
2007-12-11 20:21 [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2 Mel Gorman
2007-12-11 20:23 ` [PATCH 5/6] Have zonelist contains structs with both a zone pointer and zone_idx Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071121184801.GA31674@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=clameter@sgi.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).