[2/2] mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo

Message ID 20191025072610.18526-3-mhocko@kernel.org
State In Next
Commit c1ec24201fa90a9c1c11aea11526efc0d85ac470
Series
  • mm: reduce /proc/pagetypeinfo overhead

Commit Message

Michal Hocko Oct. 25, 2019, 7:26 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

pagetypeinfo_showfree_print is called with zone->lock held and IRQs
disabled. This is not really nice because it blocks both interrupts on
that CPU and the page allocator. On large machines this might even
trigger the hard lockup detector.

Considering that pagetypeinfo is a debugging tool, we do not really need
exact numbers here. The primary reason to look at the output is to see
how pageblocks are spread among the different migratetypes, and a low
number of pages is much more interesting. Putting a bound on the number
of pages counted on each free_list therefore sounds like a reasonable
tradeoff.

The new output will simply tell
[...]
Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648

instead of
Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648

The limit has been chosen arbitrarily and is subject to future change
should there be a need for that.

While we are at it, also drop the zone lock after each free_list
iteration. This helps IRQ and page allocator responsiveness even
further, as the time the lock is held with IRQs disabled is always
bounded by those 100k pages.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
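
To make the capped count and the new column format concrete, here is a
minimal userspace sketch. It is not the kernel code: struct node,
link_ring(), capped_count() and format_column() are hypothetical names
standing in for struct list_head, list setup, the list_for_each() loop
and the seq_printf() call. Only the 100000 limit and the "%s%6lu "
format string are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's circular struct list_head. */
struct node {
	struct node *next;
};

#define FREECOUNT_CAP 100000UL	/* same arbitrary limit as the patch */

/* Link the first n entries of nodes[] into a ring anchored at head. */
static void link_ring(struct node *head, struct node *nodes, size_t n)
{
	struct node *prev = head;

	for (size_t i = 0; i < n; i++) {
		prev->next = &nodes[i];
		prev = &nodes[i];
	}
	prev->next = head;
}

/*
 * Count the entries after head, giving up once the cap is reached.
 * Sets *overflow when the walk was cut short.
 */
static unsigned long capped_count(const struct node *head, bool *overflow)
{
	unsigned long count = 0;
	const struct node *curr;

	*overflow = false;
	for (curr = head->next; curr != head; curr = curr->next) {
		if (++count >= FREECOUNT_CAP) {
			*overflow = true;
			break;
		}
	}
	return count;
}

/* Format one column with the format string used by the patch. */
static int format_column(char *buf, size_t len, unsigned long count,
			 bool overflow)
{
	return snprintf(buf, len, "%s%6lu ", overflow ? ">" : "", count);
}
```

A capped column is one character wider than an uncapped one because the
">" is printed in front of the six-digit field, which matches the sample
output in the commit message.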

Comments

Vlastimil Babka Oct. 25, 2019, 7:35 a.m. UTC | #1
On 10/25/19 9:26 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> This is not really nice because it blocks both any interrupts on that
> cpu and the page allocator. On large machines this might even trigger
> the hard lockup detector.
> 
> Considering the pagetypeinfo is a debugging tool we do not really need
> exact numbers here. The primary reason to look at the output is to see
> how pageblocks are spread among different migratetypes and low number of
> pages is much more interesting therefore putting a bound on the number
> of pages on the free_list sounds like a reasonable tradeoff.
> 
> The new output will simply tell
> [...]
> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
> 
> instead of
> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
> 
> The limit has been chosen arbitrarily and it is a subject of a future
> change should there be a need for that.
> 
> While we are at it, also drop the zone lock after each free_list
> iteration which will help with the IRQ and page allocator responsiveness
> even further as the IRQ lock held time is always bound to those 100k
> pages.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/vmstat.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e885ecd44d1..ddb89f4e0486 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1383,12 +1383,29 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  			unsigned long freecount = 0;
>  			struct free_area *area;
>  			struct list_head *curr;
> +			bool overflow = false;
>  
>  			area = &(zone->free_area[order]);
>  
> -			list_for_each(curr, &area->free_list[mtype])
> -				freecount++;
> -			seq_printf(m, "%6lu ", freecount);
> +			list_for_each(curr, &area->free_list[mtype]) {
> +				/*
> +				 * Cap the free_list iteration because it might
> +				 * be really large and we are under a spinlock
> +				 * so a long time spent here could trigger a
> +				 * hard lockup detector. Anyway this is a
> +				 * debugging tool so knowing there is a handful
> +				 * of pages in this order should be more than
> +				 * sufficient
> +				 */
> +				if (++freecount >= 100000) {
> +					overflow = true;
> +					break;
> +				}
> +			}
> +			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			spin_unlock_irq(&zone->lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lock);
>  		}
>  		seq_putc(m, '\n');
>  	}
>

David Hildenbrand Oct. 25, 2019, 8:21 a.m. UTC | #2
On 25.10.19 09:26, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> This is not really nice because it blocks both any interrupts on that
> cpu and the page allocator. On large machines this might even trigger
> the hard lockup detector.
> 
> Considering the pagetypeinfo is a debugging tool we do not really need
> exact numbers here. The primary reason to look at the output is to see
> how pageblocks are spread among different migratetypes and low number of
> pages is much more interesting therefore putting a bound on the number
> of pages on the free_list sounds like a reasonable tradeoff.
> 
> The new output will simply tell
> [...]
> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
> 
> instead of
> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
> 
> The limit has been chosen arbitrarily and it is a subject of a future
> change should there be a need for that.
> 
> While we are at it, also drop the zone lock after each free_list
> iteration which will help with the IRQ and page allocator responsiveness
> even further as the IRQ lock held time is always bound to those 100k
> pages.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>   mm/vmstat.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e885ecd44d1..ddb89f4e0486 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1383,12 +1383,29 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>   			unsigned long freecount = 0;
>   			struct free_area *area;
>   			struct list_head *curr;
> +			bool overflow = false;
>   
>   			area = &(zone->free_area[order]);
>   
> -			list_for_each(curr, &area->free_list[mtype])
> -				freecount++;
> -			seq_printf(m, "%6lu ", freecount);
> +			list_for_each(curr, &area->free_list[mtype]) {
> +				/*
> +				 * Cap the free_list iteration because it might
> +				 * be really large and we are under a spinlock
> +				 * so a long time spent here could trigger a
> +				 * hard lockup detector. Anyway this is a
> +				 * debugging tool so knowing there is a handful
> +				 * of pages in this order should be more than

"of this order" ?

> +				 * sufficient

s/sufficient/sufficient./ ?

> +				 */
> +				if (++freecount >= 100000) {
> +					overflow = true;
> +					break;
> +				}
> +			}
> +			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			spin_unlock_irq(&zone->lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lock);
>   		}
>   		seq_putc(m, '\n');
>   	}
> 

Acked-by: David Hildenbrand <david@redhat.com>

Rafael Aquini Oct. 25, 2019, 12:52 p.m. UTC | #3
On Fri, Oct 25, 2019 at 09:26:10AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> This is not really nice because it blocks both any interrupts on that
> cpu and the page allocator. On large machines this might even trigger
> the hard lockup detector.
> 
> Considering the pagetypeinfo is a debugging tool we do not really need
> exact numbers here. The primary reason to look at the output is to see
> how pageblocks are spread among different migratetypes and low number of
> pages is much more interesting therefore putting a bound on the number
> of pages on the free_list sounds like a reasonable tradeoff.
> 
> The new output will simply tell
> [...]
> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
> 
> instead of
> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
> 
> The limit has been chosen arbitrarily and it is a subject of a future
> change should there be a need for that.
> 
> While we are at it, also drop the zone lock after each free_list
> iteration which will help with the IRQ and page allocator responsiveness
> even further as the IRQ lock held time is always bound to those 100k
> pages.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmstat.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e885ecd44d1..ddb89f4e0486 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1383,12 +1383,29 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  			unsigned long freecount = 0;
>  			struct free_area *area;
>  			struct list_head *curr;
> +			bool overflow = false;
>  
>  			area = &(zone->free_area[order]);
>  
> -			list_for_each(curr, &area->free_list[mtype])
> -				freecount++;
> -			seq_printf(m, "%6lu ", freecount);
> +			list_for_each(curr, &area->free_list[mtype]) {
> +				/*
> +				 * Cap the free_list iteration because it might
> +				 * be really large and we are under a spinlock
> +				 * so a long time spent here could trigger a
> +				 * hard lockup detector. Anyway this is a
> +				 * debugging tool so knowing there is a handful
> +				 * of pages in this order should be more than
> +				 * sufficient
> +				 */
> +				if (++freecount >= 100000) {
> +					overflow = true;
> +					break;
> +				}
> +			}
> +			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			spin_unlock_irq(&zone->lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lock);
>  		}
>  		seq_putc(m, '\n');
>  	}
> -- 
> 2.20.1
> 
Acked-by: Rafael Aquini <aquini@redhat.com>

David Rientjes Oct. 25, 2019, 9:08 p.m. UTC | #4
On Fri, 25 Oct 2019, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> pagetypeinfo_showfree_print is called with zone->lock held in irq mode.
> This is not really nice because it blocks both any interrupts on that
> cpu and the page allocator. On large machines this might even trigger
> the hard lockup detector.
> 
> Considering the pagetypeinfo is a debugging tool we do not really need
> exact numbers here. The primary reason to look at the output is to see
> how pageblocks are spread among different migratetypes and low number of
> pages is much more interesting therefore putting a bound on the number
> of pages on the free_list sounds like a reasonable tradeoff.
> 
> The new output will simply tell
> [...]
> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
> 
> instead of
> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
> 
> The limit has been chosen arbitrarily and it is a subject of a future
> change should there be a need for that.
> 
> While we are at it, also drop the zone lock after each free_list
> iteration which will help with the IRQ and page allocator responsiveness
> even further as the IRQ lock held time is always bound to those 100k
> pages.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

I think 100k is a very reasonable threshold.

Acked-by: David Rientjes <rientjes@google.com>

> ---
>  mm/vmstat.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4e885ecd44d1..ddb89f4e0486 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1383,12 +1383,29 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>  			unsigned long freecount = 0;
>  			struct free_area *area;
>  			struct list_head *curr;
> +			bool overflow = false;
>  
>  			area = &(zone->free_area[order]);
>  
> -			list_for_each(curr, &area->free_list[mtype])
> -				freecount++;
> -			seq_printf(m, "%6lu ", freecount);
> +			list_for_each(curr, &area->free_list[mtype]) {
> +				/*
> +				 * Cap the free_list iteration because it might
> +				 * be really large and we are under a spinlock
> +				 * so a long time spent here could trigger a
> +				 * hard lockup detector. Anyway this is a
> +				 * debugging tool so knowing there is a handful
> +				 * of pages in this order should be more than
> +				 * sufficient
> +				 */
> +				if (++freecount >= 100000) {

I suppose it's most precise to check freecount > 100000 to print >100000,
but I doubt anybody cares :)

> +					overflow = true;
> +					break;
> +				}
> +			}
> +			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> +			spin_unlock_irq(&zone->lock);
> +			cond_resched();
> +			spin_lock_irq(&zone->lock);
>  		}
>  		seq_putc(m, '\n');
>  	}
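
The precision point raised in the inline comment above can be seen in a
small userspace model. This is only a sketch: struct capped,
count_inclusive() and count_strict() are hypothetical names, and the
list walk is modelled as a plain loop over an entry count rather than a
real free_list. count_inclusive() mirrors the patch's
"++freecount >= cap" check; count_strict() is one possible way to flag
overflow only when an entry beyond the cap actually exists.

```c
#include <stdbool.h>

/* Result of a capped walk: the reported count and the overflow flag. */
struct capped {
	unsigned long count;
	bool overflow;
};

/* Mirrors the patch: overflow is flagged as soon as the cap is reached. */
static struct capped count_inclusive(unsigned long entries, unsigned long cap)
{
	struct capped r = { 0, false };

	for (unsigned long i = 0; i < entries; i++) {
		if (++r.count >= cap) {
			r.overflow = true;
			break;
		}
	}
	return r;
}

/*
 * Stricter variant (an assumption, not part of the patch): overflow is
 * flagged only when one more entry exists past the cap, at the cost of
 * visiting that one extra entry.
 */
static struct capped count_strict(unsigned long entries, unsigned long cap)
{
	struct capped r = { 0, false };

	for (unsigned long i = 0; i < entries; i++) {
		if (r.count == cap) {
			r.overflow = true;
			break;
		}
		r.count++;
	}
	return r;
}
```

With exactly 100000 entries the inclusive check reports ">100000" while
the strict one reports "100000"; for any other list length the two
agree. For a debugging interface the difference is arguably cosmetic,
which is consistent with the comment above.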

Patch

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4e885ecd44d1..ddb89f4e0486 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1383,12 +1383,29 @@  static void pagetypeinfo_showfree_print(struct seq_file *m,
 			unsigned long freecount = 0;
 			struct free_area *area;
 			struct list_head *curr;
+			bool overflow = false;
 
 			area = &(zone->free_area[order]);
 
-			list_for_each(curr, &area->free_list[mtype])
-				freecount++;
-			seq_printf(m, "%6lu ", freecount);
+			list_for_each(curr, &area->free_list[mtype]) {
+				/*
+				 * Cap the free_list iteration because it might
+				 * be really large and we are under a spinlock
+				 * so a long time spent here could trigger a
+				 * hard lockup detector. Anyway this is a
+				 * debugging tool so knowing there is a handful
+				 * of pages in this order should be more than
+				 * sufficient
+				 */
+				if (++freecount >= 100000) {
+					overflow = true;
+					break;
+				}
+			}
+			seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
+			spin_unlock_irq(&zone->lock);
+			cond_resched();
+			spin_lock_irq(&zone->lock);
 		}
 		seq_putc(m, '\n');
 	}
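
As a rough illustration of why dropping the lock after each free_list
bounds the hold time, the following userspace model counts how many list
entries are visited during a single hold. It is a sketch under stated
assumptions: struct model_lock and its helpers are hypothetical
instrumentation, not kernel APIs, and each list is represented only by
its length. The release/acquire pair stands in for the patch's
spin_unlock_irq(), cond_resched(), spin_lock_irq() sequence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumented lock that records the longest hold. */
struct model_lock {
	bool held;
	unsigned long steps;		/* entries visited in the current hold */
	unsigned long max_steps;	/* longest hold observed so far */
};

static void model_lock_acquire(struct model_lock *l)
{
	l->held = true;
	l->steps = 0;
}

static void model_lock_release(struct model_lock *l)
{
	if (l->steps > l->max_steps)
		l->max_steps = l->steps;
	l->held = false;
}

/*
 * Count each of the nr lists (described by lengths[]) with a cap,
 * dropping and retaking the lock after every list. As in the patch, the
 * lock is expected to be held on entry and is held again on return.
 */
static void walk_lists(struct model_lock *l, const unsigned long *lengths,
		       size_t nr, unsigned long cap, unsigned long *counts)
{
	for (size_t i = 0; i < nr; i++) {
		unsigned long count = 0;

		for (unsigned long e = 0; e < lengths[i]; e++) {
			l->steps++;
			if (++count >= cap)
				break;
		}
		counts[i] = count;

		/* Let interrupts and other lock users in between lists. */
		model_lock_release(l);
		model_lock_acquire(l);
	}
}
```

Because each list is walked from start to finish within one hold, the
list changing while the lock is dropped does not invalidate an
in-progress walk; it only means the columns of one row are sampled at
slightly different times.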