* [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Johannes Weiner @ 2009-02-10 16:51 UTC
  To: Andrew Morton; +Cc: linux-mm, linux-kernel

shrink_all_memory() and __zone_reclaim() currently don't initialize
the .order field of their scan control.

Both of them call into functions which use that field and make certain
decisions based on a random value.

The functions depending on the .order field are marked with a star,
the faulty entry points are marked with a percentage sign:

* shrink_page_list()
  * shrink_inactive_list()
  * shrink_active_list()
    shrink_list()
      shrink_all_zones()
        % shrink_all_memory()
      shrink_zone()
        % __zone_reclaim()

Initialize .order to zero in shrink_all_memory().  Initialize .order
to the order parameter in __zone_reclaim().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4422301..9ce85ea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 		.may_unmap = 0,
 		.swap_cluster_max = nr_pages,
 		.may_writepage = 1,
+		.order = 0,
 		.isolate_pages = isolate_pages_global,
 	};
 
@@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 					SWAP_CLUSTER_MAX),
 		.gfp_mask = gfp_mask,
 		.swappiness = vm_swappiness,
+		.order = order,
 		.isolate_pages = isolate_pages_global,
 	};
 	unsigned long slab_reclaimable;
-- 
1.6.0.3


* Re: [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Andrew Morton @ 2009-02-11  0:29 UTC
  To: Johannes Weiner; +Cc: linux-mm, linux-kernel

On Tue, 10 Feb 2009 17:51:35 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> shrink_all_memory() and __zone_reclaim() currently don't initialize
> the .order field of their scan control.
> 
> Both of them call into functions which use that field and make certain
> decisions based on a random value.
> 
> The functions depending on the .order field are marked with a star,
> the faulty entry points are marked with a percentage sign:
> 
> * shrink_page_list()
>   * shrink_inactive_list()
>   * shrink_active_list()
>     shrink_list()
>       shrink_all_zones()
>         % shrink_all_memory()
>       shrink_zone()
>         % __zone_reclaim()
> 
> Initialize .order to zero in shrink_all_memory().  Initialize .order
> to the order parameter in __zone_reclaim().
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4422301..9ce85ea 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  		.may_unmap = 0,
>  		.swap_cluster_max = nr_pages,
>  		.may_writepage = 1,
> +		.order = 0,
>  		.isolate_pages = isolate_pages_global,
>  	};
>  
> @@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  					SWAP_CLUSTER_MAX),
>  		.gfp_mask = gfp_mask,
>  		.swappiness = vm_swappiness,
> +		.order = order,
>  		.isolate_pages = isolate_pages_global,
>  	};
>  	unsigned long slab_reclaimable;

The second hunk might fix something, but it would need a more correct
changelog, and some thought about what its runtime effects are likely
to be, please.
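
[A minimal userspace sketch of the C detail behind this remark: fields
omitted from a designated initializer are zero-filled (C99 6.7.8), so
the first hunk only spells out a value that was already zero, while the
second hunk actually changes one.  The struct below is a cut-down
stand-in, not the kernel's real scan_control.]

#include <stdio.h>

/* Cut-down stand-in for the kernel's struct scan_control. */
struct scan_control {
	int may_writepage;
	int order;
};

int main(void)
{
	/* .order is not named in the initializer, so it is implicitly
	 * zero -- the ".order = 0" hunk documents this rather than
	 * changing behaviour. */
	struct scan_control sc = { .may_writepage = 1 };

	printf("sc.order = %d\n", sc.order);	/* prints 0 */
	return 0;
}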


* Re: [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Johannes Weiner @ 2009-02-11  1:52 UTC
  To: Andrew Morton; +Cc: Mel Gorman, linux-mm, linux-kernel

[added Mel to CC]

On Tue, Feb 10, 2009 at 04:29:48PM -0800, Andrew Morton wrote:
> On Tue, 10 Feb 2009 17:51:35 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > shrink_all_memory() and __zone_reclaim() currently don't initialize
> > the .order field of their scan control.
> > 
> > Both of them call into functions which use that field and make certain
> > decisions based on a random value.
> > 
> > The functions depending on the .order field are marked with a star,
> > the faulty entry points are marked with a percentage sign:
> > 
> > * shrink_page_list()
> >   * shrink_inactive_list()
> >   * shrink_active_list()
> >     shrink_list()
> >       shrink_all_zones()
> >         % shrink_all_memory()
> >       shrink_zone()
> >         % __zone_reclaim()
> > 
> > Initialize .order to zero in shrink_all_memory().  Initialize .order
> > to the order parameter in __zone_reclaim().
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/vmscan.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 4422301..9ce85ea 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> >  		.may_unmap = 0,
> >  		.swap_cluster_max = nr_pages,
> >  		.may_writepage = 1,
> > +		.order = 0,
> >  		.isolate_pages = isolate_pages_global,
> >  	};
> >  
> > @@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> >  					SWAP_CLUSTER_MAX),
> >  		.gfp_mask = gfp_mask,
> >  		.swappiness = vm_swappiness,
> > +		.order = order,
> >  		.isolate_pages = isolate_pages_global,
> >  	};
> >  	unsigned long slab_reclaimable;
> 
> The second hunk might fix something, but it would need a more correct
> changelog, and some thought about what its runtime effects are likely
> to be, please.

zone_reclaim() is used by the watermark rebalancing of the buddy
allocator right before trying to do an allocation.  Even though it
tries to reclaim at least 1 << order pages, it doesn't raise sc.order
to increase clustering efforts.
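
[A condensed, runnable sketch of the path described above; the types
and helpers are hypothetical stand-ins, not the 2.6.29 signatures from
mm/page_alloc.c.]

#include <stdbool.h>
#include <stdio.h>

struct zone { long free_pages; long watermark; };
struct page { char unused; };

static bool watermark_ok(struct zone *z, unsigned int order)
{
	/* The real zone_watermark_ok() is order-aware; see the excerpt
	 * later in the thread.  Crudely simplified here. */
	return z->free_pages > z->watermark + (1L << order);
}

static bool zone_reclaim_sketch(struct zone *z, unsigned int order)
{
	/* Stands in for zone_reclaim()/__zone_reclaim(): the early,
	 * zone-local reclaim pass.  The scan_control under discussion
	 * is built here, with sc.order left at zero. */
	z->free_pages += 1L << order;	/* pretend reclaim freed pages */
	return watermark_ok(z, order);
}

static struct page *alloc_from_zone(struct zone *z, unsigned int order)
{
	static struct page page;

	if (!watermark_ok(z, order) && !zone_reclaim_sketch(z, order))
		return NULL;	/* caller tries the next zone, and
				 * eventually full direct reclaim */
	return &page;
}

int main(void)
{
	struct zone z = { .free_pages = 8, .watermark = 32 };

	printf("order-2 allocation: %s\n",
	       alloc_from_zone(&z, 2) ? "ok" : "failed");
	return 0;
}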

I think this happens with the assumption that the upcoming allocation
can still succeed and in that case we don't want to lump too
aggressively to refill the zone.  The allocation might succeed on
another zone and now we have evicted precious pages due to clustering
while we are still not sure it's even needed.

If the allocation fails from all zones, we still will use lumpy
reclaim for higher orders in kswapd and try_to_free_pages().

So I think that __zone_reclaim() leaves sc.order = 0 intentionally.

Mel?

	Hannes

* Re: [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Mel Gorman @ 2009-02-16 14:53 UTC
  To: Johannes Weiner; +Cc: Andrew Morton, linux-mm, linux-kernel

On Wed, Feb 11, 2009 at 02:52:27AM +0100, Johannes Weiner wrote:
> [added Mel to CC]
> 
> On Tue, Feb 10, 2009 at 04:29:48PM -0800, Andrew Morton wrote:
> > On Tue, 10 Feb 2009 17:51:35 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > > shrink_all_memory() and __zone_reclaim() currently don't initialize
> > > the .order field of their scan control.
> > > 
> > > Both of them call into functions which use that field and make certain
> > > decisions based on a random value.
> > > 
> > > The functions depending on the .order field are marked with a star,
> > > the faulty entry points are marked with a percentage sign:
> > > 
> > > * shrink_page_list()
> > >   * shrink_inactive_list()
> > >   * shrink_active_list()
> > >     shrink_list()
> > >       shrink_all_zones()
> > >         % shrink_all_memory()
> > >       shrink_zone()
> > >         % __zone_reclaim()
> > > 
> > > Initialize .order to zero in shrink_all_memory().  Initialize .order
> > > to the order parameter in __zone_reclaim().
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > ---
> > >  mm/vmscan.c |    2 ++
> > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 4422301..9ce85ea 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> > >  		.may_unmap = 0,
> > >  		.swap_cluster_max = nr_pages,
> > >  		.may_writepage = 1,
> > > +		.order = 0,
> > >  		.isolate_pages = isolate_pages_global,
> > >  	};
> > >  
> > > @@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > >  					SWAP_CLUSTER_MAX),
> > >  		.gfp_mask = gfp_mask,
> > >  		.swappiness = vm_swappiness,
> > > +		.order = order,
> > >  		.isolate_pages = isolate_pages_global,
> > >  	};
> > >  	unsigned long slab_reclaimable;
> > 
> > The second hunk might fix something, but it would need a more correct
> > changelog, and some thought about what its runtime effects are likely
> > to be, please.
> 
> zone_reclaim() is used by the watermark rebalancing of the buddy
> allocator right before trying to do an allocation.  Even though it
> tries to reclaim at least 1 << order pages, it doesn't raise sc.order
> to increase clustering efforts.
> 

This affects lumpy reclaim. Direct reclaim via try_to_free_pages() and
kswapd() still works, but the earlier reclaim attempt via zone_reclaim()
on unmapped file and slab pages is ignoring the order. While it'd be tricky
to measure any difference, it does make sense that __zone_reclaim() initialise
the order with what the caller requested.
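
[For illustration, a loose sketch of what a nonzero sc.order buys lumpy
reclaim: isolation rounds out to the order-aligned buddy block around a
target page, so contiguous ranges are evicted together.  Names are
illustrative, not the isolate_lru_pages() internals.]

#include <stdio.h>

static void isolate_block_sketch(unsigned long pfn, unsigned int order)
{
	/* Round down to the start of the order-aligned block, then
	 * take every page in it, mirroring the cursor-page scan that
	 * lumpy reclaim performs when sc.order > 0. */
	unsigned long start = pfn & ~((1UL << order) - 1);
	unsigned long end = start + (1UL << order);
	unsigned long p;

	for (p = start; p < end; p++)
		printf("isolate pfn %lu\n", p);
}

int main(void)
{
	isolate_block_sketch(13, 0);	/* order 0: just pfn 13 */
	isolate_block_sketch(13, 2);	/* order 2: pfns 12..15 */
	return 0;
}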

> I think this happens with the assumption that the upcoming allocation
> can still succeed and in that case we don't want to lump too
> aggressively to refill the zone. 

I don't get what you mean here. The caller requested the higher order so
the work has been requested.

> The allocation might succeed on
> another zone and now we have evicted precious pages due to clustering
> while we are still not sure it's even needed.
> 

Also not sure what you are getting at here. zone_reclaim() is called for the
preferred zones in order. Attempts are made to free within the preferred zone
and then allocate from it. Granted, it might evict pages and the clustering
was ineffective, but this is the cost of high-order reclaim.

> If the allocation fails from all zones, we still will use lumpy
> reclaim for higher orders in kswapd and try_to_free_pages().
> 
> So I think that __zone_reclaim() leaves sc.order = 0 intentionally.
> 
> Mel?
> 

I don't believe it's intentional. The caller called zone_reclaim() with
a given order. It should be used as part of the reclaim decision.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Johannes Weiner @ 2009-02-16 22:03 UTC
  To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel

On Mon, Feb 16, 2009 at 02:53:49PM +0000, Mel Gorman wrote:
> On Wed, Feb 11, 2009 at 02:52:27AM +0100, Johannes Weiner wrote:
> > [added Mel to CC]
> > 
> > On Tue, Feb 10, 2009 at 04:29:48PM -0800, Andrew Morton wrote:
> > > On Tue, 10 Feb 2009 17:51:35 +0100
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > 
> > > > shrink_all_memory() and __zone_reclaim() currently don't initialize
> > > > the .order field of their scan control.
> > > > 
> > > > Both of them call into functions which use that field and make certain
> > > > decisions based on a random value.
> > > > 
> > > > The functions depending on the .order field are marked with a star,
> > > > the faulty entry points are marked with a percentage sign:
> > > > 
> > > > * shrink_page_list()
> > > >   * shrink_inactive_list()
> > > >   * shrink_active_list()
> > > >     shrink_list()
> > > >       shrink_all_zones()
> > > >         % shrink_all_memory()
> > > >       shrink_zone()
> > > >         % __zone_reclaim()
> > > > 
> > > > Initialize .order to zero in shrink_all_memory().  Initialize .order
> > > > to the order parameter in __zone_reclaim().
> > > > 
> > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > > ---
> > > >  mm/vmscan.c |    2 ++
> > > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 4422301..9ce85ea 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> > > >  		.may_unmap = 0,
> > > >  		.swap_cluster_max = nr_pages,
> > > >  		.may_writepage = 1,
> > > > +		.order = 0,
> > > >  		.isolate_pages = isolate_pages_global,
> > > >  	};
> > > >  
> > > > @@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > > >  					SWAP_CLUSTER_MAX),
> > > >  		.gfp_mask = gfp_mask,
> > > >  		.swappiness = vm_swappiness,
> > > > +		.order = order,
> > > >  		.isolate_pages = isolate_pages_global,
> > > >  	};
> > > >  	unsigned long slab_reclaimable;
> > > 
> > > The second hunk might fix something, but it would need a more correct
> > > changelog, and some thought about what its runtime effects are likely
> > > to be, please.
> > 
> > zone_reclaim() is used by the watermark rebalancing of the buddy
> > allocator right before trying to do an allocation.  Even though it
> > tries to reclaim at least 1 << order pages, it doesn't raise sc.order
> > to increase clustering efforts.
> > 
> 
> This affects lumpy reclaim. Direct reclaim via try_to_free_pages() and
> kswapd() still works, but the earlier reclaim attempt via zone_reclaim()
> on unmapped file and slab pages is ignoring the order. While it'd be tricky
> to measure any difference, it does make sense that __zone_reclaim() initialise
> the order with what the caller requested.
> 
> > I think this happens with the assumption that the upcoming allocation
> > can still succeed and in that case we don't want to lump too
> > aggressively to refill the zone. 
> 
> I don't get what you mean here. The caller requested the higher order so
> the work has been requested.

I meant that buffered_rmqueue() might still succeed even without lumpy
reclaim in the case where the low watermark has been reached.  And if
it does, we reclaimed 'in aggressive mode without reason'.  If it does
NOT, we still drop into direct reclaim with lumpy reclaim.  Well, this
is at least what I had in mind when writing the above.

> > The allocation might succeed on
> > another zone and now we have evicted precious pages due to clustering
> > while we are still not sure it's even needed.
> > 
> 
> Also not sure what you are getting at here. zone_reclaim() is called for the
> preferred zones in order. Attempts are made to free within the preferred zone
> and then allocate from it. Granted, it might evict pages and the clustering
> was ineffective, but this is the cost of high-order reclaim.

Sure, agreed.  I was just wondering whether higher-order reclaim was
needed up-front when the low watermarks are reached or if it was
enough when direct reclaim is lumpy in case the allocation fails.

	Hannes


* Re: [patch] vmscan: initialize sc.order in indirect shrink_list() users
From: Mel Gorman @ 2009-02-17  9:57 UTC
  To: Johannes Weiner; +Cc: Andrew Morton, linux-mm, linux-kernel

On Mon, Feb 16, 2009 at 11:03:02PM +0100, Johannes Weiner wrote:
> On Mon, Feb 16, 2009 at 02:53:49PM +0000, Mel Gorman wrote:
> > On Wed, Feb 11, 2009 at 02:52:27AM +0100, Johannes Weiner wrote:
> > > [added Mel to CC]
> > > 
> > > On Tue, Feb 10, 2009 at 04:29:48PM -0800, Andrew Morton wrote:
> > > > On Tue, 10 Feb 2009 17:51:35 +0100
> > > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > 
> > > > > shrink_all_memory() and __zone_reclaim() currently don't initialize
> > > > > the .order field of their scan control.
> > > > > 
> > > > > Both of them call into functions which use that field and make certain
> > > > > decisions based on a random value.
> > > > > 
> > > > > The functions depending on the .order field are marked with a star,
> > > > > the faulty entry points are marked with a percentage sign:
> > > > > 
> > > > > * shrink_page_list()
> > > > >   * shrink_inactive_list()
> > > > >   * shrink_active_list()
> > > > >     shrink_list()
> > > > >       shrink_all_zones()
> > > > >         % shrink_all_memory()
> > > > >       shrink_zone()
> > > > >         % __zone_reclaim()
> > > > > 
> > > > > Initialize .order to zero in shrink_all_memory().  Initialize .order
> > > > > to the order parameter in __zone_reclaim().
> > > > > 
> > > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > > > ---
> > > > >  mm/vmscan.c |    2 ++
> > > > >  1 files changed, 2 insertions(+), 0 deletions(-)
> > > > > 
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 4422301..9ce85ea 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -2112,6 +2112,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> > > > >  		.may_unmap = 0,
> > > > >  		.swap_cluster_max = nr_pages,
> > > > >  		.may_writepage = 1,
> > > > > +		.order = 0,
> > > > >  		.isolate_pages = isolate_pages_global,
> > > > >  	};
> > > > >  
> > > > > @@ -2294,6 +2295,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> > > > >  					SWAP_CLUSTER_MAX),
> > > > >  		.gfp_mask = gfp_mask,
> > > > >  		.swappiness = vm_swappiness,
> > > > > +		.order = order,
> > > > >  		.isolate_pages = isolate_pages_global,
> > > > >  	};
> > > > >  	unsigned long slab_reclaimable;
> > > > 
> > > > The second hunk might fix something, but it would need a more correct
> > > > changelog, and some thought about what its runtime effects are likely
> > > > to be, please.
> > > 
> > > zone_reclaim() is used by the watermark rebalancing of the buddy
> > > allocator right before trying to do an allocation.  Even though it
> > > tries to reclaim at least 1 << order pages, it doesn't raise sc.order
> > > to increase clustering efforts.
> > > 
> > 
> > This affects lumpy reclaim. Direct reclaim via try_to_free_pages() and
> > kswapd() still works, but the earlier reclaim attempt via zone_reclaim()
> > on unmapped file and slab pages is ignoring the order. While it'd be tricky
> > to measure any difference, it does make sense that __zone_reclaim() initialise
> > the order with what the caller requested.
> > 
> > > I think this happens with the assumption that the upcoming allocation
> > > can still succeed and in that case we don't want to lump too
> > > aggressively to refill the zone. 
> > 
> > I don't get what you mean here. The caller requested the higher order so
> > the work has been requested.
> 
> I meant that buffered_rmqueue() might still succeed even without lumpy
> reclaim in the case where the low watermark has been reached.

I think I get you now. You are saying that reclaiming order-0 pages increases
the free count so we might get over the watermarks and the allocation would
succeed. Thing is, the watermark check in zone_watermark_ok() takes order
as a parameter and contains this:

        for (o = 0; o < order; o++) {
                /* At the next order, this order's pages become unavailable */
                free_pages -= z->free_area[o].nr_free << o;

So, to reach the low watermarks for a high-order allocation, we still
need lumpy reclaim.
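
[A self-contained paraphrase of the check quoted above; the names are
illustrative, and the real zone_watermark_ok() in mm/page_alloc.c also
folds in lowmem reserves and allocation flags.]

#include <stdio.h>

#define MAX_ORDER_SKETCH 11

struct free_area_sketch { unsigned long nr_free; };

static int watermark_ok_sketch(long free_pages, long mark,
			       unsigned int order,
			       const struct free_area_sketch *area)
{
	unsigned int o;

	if (free_pages <= mark)
		return 0;
	for (o = 0; o < order; o++) {
		/* At the next order, this order's pages become
		 * unavailable. */
		free_pages -= area[o].nr_free << o;
		/* Require fewer higher-order pages to be free. */
		mark >>= 1;
		if (free_pages <= mark)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* 256 free pages in total, all of them order-0 fragments:
	 * enough for an order-0 watermark, hopeless for order-3,
	 * no matter how many more order-0 pages get reclaimed. */
	struct free_area_sketch area[MAX_ORDER_SKETCH] = { { 256 } };

	printf("order 0: %d\n", watermark_ok_sketch(256, 32, 0, area));
	printf("order 3: %d\n", watermark_ok_sketch(256, 32, 3, area));
	return 0;
}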

> And if it does, we
> reclaimed 'in aggressive mode without reason'.  If it does NOT, we
> still drop into direct reclaim with lumpy reclaim.  Well, this is at
> least what I had in mind when writing the above.
> 
> > > The allocation might succeed on
> > > another zone and now we have evicted precious pages due to clustering
> > > while we are still not sure it's even needed.
> > > 
> > 
> > Also not sure what you are getting at here. zone_reclaim() is called for the
> > preferred zones in order. Attempts are made to free within the preferred zone
> > and then allocate from it. Granted, it might evict pages and the clustering
> > was ineffective, but this is the cost of high-order reclaim.
> 
> Sure, agreed.  I was just wondering whether higher-order reclaim was
> needed up-front when the low watermarks are reached or if it was
> enough when direct reclaim is lumpy in case the allocation fails.
> 

It's needed up front because order is also taken into account for the
watermark calculations.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


end of thread, other threads:[~2009-02-17  9:57 UTC | newest]

Thread overview: 6 messages
2009-02-10 16:51 [patch] vmscan: initialize sc.order in indirect shrink_list() users Johannes Weiner
2009-02-11  0:29 ` Andrew Morton
2009-02-11  1:52   ` Johannes Weiner
2009-02-16 14:53     ` Mel Gorman
2009-02-16 22:03       ` Johannes Weiner
2009-02-17  9:57         ` Mel Gorman
