* [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
@ 2017-12-07 17:03 Lucas Stach
  2017-12-07 19:51 ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Lucas Stach @ 2017-12-07 17:03 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman
  Cc: Michal Hocko, Vlastimil Babka, linux-mm, kernel, patchwork-lst

Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when freeing
a list of pages) we see excessive IRQ disabled times of up to 250ms on an
embedded ARM system (tracing overhead included).

This is due to graphics buffers being freed back to the system via
release_pages(). Graphics buffers can be huge, so it's not hard to hit
cases where the list of pages to free has 2048 entries. Disabling IRQs
while freeing all those pages is clearly not a good idea.

Introduce a batch limit, which allows IRQ servicing once every few pages.
The batch count is the same as used in other parts of the MM subsystem
when dealing with IRQ disabled regions.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 mm/page_alloc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 73f5d4556b3d..7e5e775e97f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2684,6 +2684,7 @@ void free_unref_page_list(struct list_head *list)
 {
 	struct page *page, *next;
 	unsigned long flags, pfn;
+	int batch_count = 0;
 
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
@@ -2700,6 +2701,16 @@ void free_unref_page_list(struct list_head *list)
 		set_page_private(page, 0);
 		trace_mm_page_free_batched(page);
 		free_unref_page_commit(page, pfn);
+
+		/*
+		 * Guard against excessive IRQ disabled times when we get
+		 * a large list of pages to free.
+		 */
+		if (++batch_count == SWAP_CLUSTER_MAX) {
+			local_irq_restore(flags);
+			batch_count = 0;
+			local_irq_save(flags);
+		}
 	}
 	local_irq_restore(flags);
 }
-- 
2.11.0
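
[Note: SWAP_CLUSTER_MAX, the batch size chosen above, is the MM
subsystem's customary unit of batched page work:

	/* include/linux/swap.h */
	#define SWAP_CLUSTER_MAX 32UL

so with this patch IRQs are serviced at least once per 32 pages
freed.]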

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-07 17:03 [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list Lucas Stach
@ 2017-12-07 19:51 ` Mel Gorman
  2017-12-07 23:20   ` Andrew Morton
  2017-12-08 10:03   ` Lucas Stach
  0 siblings, 2 replies; 8+ messages in thread
From: Mel Gorman @ 2017-12-07 19:51 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when freeing
> a list of pages) we see excessive IRQ disabled times of up to 250ms on an
> embedded ARM system (tracing overhead included).
> 
> This is due to graphics buffers being freed back to the system via
> release_pages(). Graphics buffers can be huge, so it's not hard to hit
> cases where the list of pages to free has 2048 entries. Disabling IRQs
> while freeing all those pages is clearly not a good idea.
> 

250ms to free 2048 entries? That seems excessive but I guess the
embedded ARM system is not that fast.

> Introduce a batch limit, which allows IRQ servicing once every few pages.
> The batch count is the same as used in other parts of the MM subsystem
> when dealing with IRQ disabled regions.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Thanks.

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-07 19:51 ` Mel Gorman
@ 2017-12-07 23:20   ` Andrew Morton
  2017-12-08  0:25     ` Mel Gorman
  2017-12-08 10:03   ` Lucas Stach
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2017-12-07 23:20 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Lucas Stach, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Thu, 7 Dec 2017 19:51:03 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:

> On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> > Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when freeing
> > a list of pages) we see excessive IRQ disabled times of up to 250ms on an
> > embedded ARM system (tracing overhead included).
> > 
> > This is due to graphics buffers being freed back to the system via
> > release_pages(). Graphics buffers can be huge, so it's not hard to hit
> > cases where the list of pages to free has 2048 entries. Disabling IRQs
> > while freeing all those pages is clearly not a good idea.
> > 
> 
> 250ms to free 2048 entries? That seems excessive but I guess the
> embedded ARM system is not that fast.

I wonder how common such lengthy lists are.

If "significantly" then there may be additional benefit in rearranging
free_hot_cold_page_list() so it only walks a small number of list
entries at a time.  So the data from the first loop is still in cache
during execution of the second loop.  And that way this
long-irq-off-time problem gets fixed automagically.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-07 23:20   ` Andrew Morton
@ 2017-12-08  0:25     ` Mel Gorman
  2017-12-08  0:53       ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2017-12-08  0:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lucas Stach, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Thu, Dec 07, 2017 at 03:20:59PM -0800, Andrew Morton wrote:
> On Thu, 7 Dec 2017 19:51:03 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> > > Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when freeing
> > > a list of pages) we see excessive IRQ disabled times of up to 250ms on an
> > > embedded ARM system (tracing overhead included).
> > > 
> > > This is due to graphics buffers being freed back to the system via
> > > release_pages(). Graphics buffers can be huge, so it's not hard to hit
> > > cases where the list of pages to free has 2048 entries. Disabling IRQs
> > > while freeing all those pages is clearly not a good idea.
> > > 
> > 
> > 250ms to free 2048 entries? That seems excessive but I guess the
> > embedded ARM system is not that fast.
> 
> I wonder how common such lengthy lists are.
> 

Well, it's release_pages. From core VM and the block layer, not very long
but for drivers and filesystems, it can be arbitrarily long. Even from the
VM, the function can be called a lot, but as it's called from pagevec
context it's naturally broken into small pieces anyway.
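
[For context: a pagevec holds at most PAGEVEC_SIZE (14) pages, which
is why pagevec-driven callers hand release_pages() only small pieces.
The usual calling pattern looks roughly like the sketch below, where
for_each_page_to_drop() is a hypothetical stand-in for whatever
iteration a real call site does:

	struct pagevec pvec;

	pagevec_init(&pvec);
	for_each_page_to_drop(page) {
		/* pagevec_add() returns the slots left; 0 means full */
		if (!pagevec_add(&pvec, page))
			pagevec_release(&pvec);	/* release_pages() on <= 14 pages */
	}
	pagevec_release(&pvec);			/* flush the remainder */

Driver paths like the graphics buffer free above bypass pagevecs and
hand release_pages() the whole buffer at once.]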

> If "significantly" then there may be additional benefit in rearranging
> free_hot_cold_page_list() so it only walks a small number of list
> entries at a time.  So the data from the first loop is still in cache
> during execution of the second loop.  And that way this
> long-irq-off-time problem gets fixed automagically.
> 

I'm not sure it's worthwhile. In too many cases, the list of pages being
released are either cache cold or are so long that the cache data is
being thrashed anyway. Once the core page allocator is involved, then
there will be further cache thrashing due to buddy page merging accessing
data that is potentially very close. I think it's unlikely there would be
much value in using alternative schemes unless we were willing to have
very large per-cpu lists -- something I prototyped for fast networking
but never heard back whether it's worthwhile or not.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-08  0:25     ` Mel Gorman
@ 2017-12-08  0:53       ` Andrew Morton
  2017-12-08 10:21         ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2017-12-08  0:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Lucas Stach, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Fri, 8 Dec 2017 00:25:37 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:

> Well, it's release_pages. From core VM and the block layer, not very long
> but for drivers and filesystems, it can be arbitrarily long. Even from the
> VM, the function can be called a lot, but as it's called from pagevec
> context it's naturally broken into small pieces anyway.

OK.

> > If "significantly" then there may be additional benefit in rearranging
> > free_hot_cold_page_list() so it only walks a small number of list
> > entries at a time.  So the data from the first loop is still in cache
> > during execution of the second loop.  And that way this
> > long-irq-off-time problem gets fixed automagically.
> > 
> 
> I'm not sure it's worthwhile. In too many cases, the list of pages being
> released are either cache cold or are so long that the cache data is
> being thrashed anyway.

Well, whether the incoming data is cache-cold or very long, doing that
double pass in small bites would reduce thrashing.

> Once the core page allocator is involved, then
> there will be further cache thrashing due to buddy page merging accessing
> data that is potentially very close. I think it's unlikely there would be
> much value in using alternative schemes unless we were willing to have
> very large per-cpu lists -- something I prototyped for fast networking
> but never heard back whether it's worthwhile or not.

I mean something like this....

(strangely indented for clarity)

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -2685,12 +2685,17 @@ void free_unref_page_list(struct list_he
 	struct page *page, *next;
 	unsigned long flags, pfn;
 
+while (!list_empty(list)) {
+	unsigned batch = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (batch++ == SWAP_CLUSTER_MAX)
+			break;
 	}
 
 	local_irq_save(flags);
@@ -2699,8 +2704,10 @@ void free_unref_page_list(struct list_he
 
 		set_page_private(page, 0);
 		trace_mm_page_free_batched(page);
+		list_del(&page->lru);	/* now needed, I think? */
 		free_unref_page_commit(page, pfn);
 	}
+}
 	local_irq_restore(flags);
 }
 

But I agree that freeing of a lengthy list is likely to be rare.
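
[Spelled out with conventional indentation, the "small bites" idea
might look like the sketch below. This is an untested illustration of
the approach, not a finished patch: for the IRQ-off time to actually
stay bounded, each batch has to be moved to a private list so the
commit pass stops at the batch boundary, and local_irq_restore() has
to run once per batch rather than once at the end:

	void free_unref_page_list(struct list_head *list)
	{
		struct page *page, *next;
		unsigned long flags, pfn;

		while (!list_empty(list)) {
			LIST_HEAD(batch);
			unsigned int count = 0;

			/* Carve off one batch and prepare it for freeing */
			list_for_each_entry_safe(page, next, list, lru) {
				list_move(&page->lru, &batch);
				pfn = page_to_pfn(page);
				if (!free_unref_page_prepare(page, pfn))
					list_del(&page->lru);
				else
					set_page_private(page, pfn);
				if (++count == SWAP_CLUSTER_MAX)
					break;
			}

			/* Commit the batch with IRQs off, then let IRQs in */
			local_irq_save(flags);
			list_for_each_entry_safe(page, next, &batch, lru) {
				pfn = page_private(page);
				set_page_private(page, 0);
				trace_mm_page_free_batched(page);
				list_del(&page->lru);
				free_unref_page_commit(page, pfn);
			}
			local_irq_restore(flags);
		}
	}
]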

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-07 19:51 ` Mel Gorman
  2017-12-07 23:20   ` Andrew Morton
@ 2017-12-08 10:03   ` Lucas Stach
  2017-12-08 11:38     ` Mel Gorman
  1 sibling, 1 reply; 8+ messages in thread
From: Lucas Stach @ 2017-12-08 10:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Thursday, Dec 07, 2017 at 19:51 +0000, Mel Gorman wrote:
> On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> > Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when
> > freeing a list of pages) we see excessive IRQ disabled times of up
> > to 250ms on an embedded ARM system (tracing overhead included).
> > 
> > This is due to graphics buffers being freed back to the system via
> > release_pages(). Graphics buffers can be huge, so it's not hard to
> > hit cases where the list of pages to free has 2048 entries.
> > Disabling IRQs while freeing all those pages is clearly not a good
> > idea.
> > 
> 
> 250ms to free 2048 entries? That seems excessive but I guess the
> embedded ARM system is not that fast.

Urgh, yes, I've messed up the order of magnitude in the commit log. It
really is on the order of 25ms. Which is still prohibitively long for
an IRQs off section.

Regards,
Lucas

> > Introduce a batch limit, which allows IRQ servicing once every few
> > pages. The batch count is the same as used in other parts of the MM
> > subsystem when dealing with IRQ disabled regions.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> 
> Thanks.
> 
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-08  0:53       ` Andrew Morton
@ 2017-12-08 10:21         ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2017-12-08 10:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lucas Stach, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Thu, Dec 07, 2017 at 04:53:17PM -0800, Andrew Morton wrote:
> On Fri, 8 Dec 2017 00:25:37 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > Well, it's release_pages. From core VM and the block layer, not very long
> > but for drivers and filesystems, it can be arbitrarily long. Even from the
> > VM, the function can be called a lot but as it's from pagevec context so
> > it's naturally broken into small pieces anyway.
> 
> OK.
> 
> > > If "significantly" then there may be additional benefit in rearranging
> > > free_hot_cold_page_list() so it only walks a small number of list
> > > entries at a time.  So the data from the first loop is still in cache
> > > during execution of the second loop.  And that way this
> > > long-irq-off-time problem gets fixed automagically.
> > > 
> > 
> > I'm not sure it's worthwhile. In too many cases, the list of pages being
> > released are either cache cold or are so long that the cache data is
> > being thrashed anyway.
> 
> Well, whether the incoming data is cache-cold or very-long, doing that
> double pass in small bites would reduce thrashing.
> 
> > Once the core page allocator is involved, then
> > there will be further cache thrashing due to buddy page merging accessing
> > data that is potentially very close. I think it's unlikely there would be
> > much value in using alternative schemes unless we were willing to have
> > very large per-cpu lists -- something I prototyped for fast networking
> > but never heard back whether it's worthwhile or not.
> 
> I mean something like this....
> 

Ok yes, I see. That is a viable alternative to Lucas's patch that should
achieve the same result with the bonus of some of the entries still being
cache hot. Lucas, care to give it a spin and see if it also addresses
your problem?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list
  2017-12-08 10:03   ` Lucas Stach
@ 2017-12-08 11:38     ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2017-12-08 11:38 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, linux-mm, kernel,
	patchwork-lst

On Fri, Dec 08, 2017 at 11:03:23AM +0100, Lucas Stach wrote:
> On Thursday, Dec 07, 2017 at 19:51 +0000, Mel Gorman wrote:
> > On Thu, Dec 07, 2017 at 06:03:14PM +0100, Lucas Stach wrote:
> > > Since 9cca35d42eb6 (mm, page_alloc: enable/disable IRQs once when
> > > freeing a list of pages) we see excessive IRQ disabled times of
> > > up to 250ms on an embedded ARM system (tracing overhead included).
> > > 
> > > This is due to graphics buffers being freed back to the system
> > > via release_pages(). Graphics buffers can be huge, so it's not
> > > hard to hit cases where the list of pages to free has 2048
> > > entries. Disabling IRQs while freeing all those pages is clearly
> > > not a good idea.
> > > 
> > 
> > 250ms to free 2048 entries? That seems excessive but I guess the
> > embedded ARM system is not that fast.
> 
> Urgh, yes, I've messed up the order of magnitude in the commit log. It
> really is on the order of 25ms. Which is still prohibitively long for
> an IRQs off section.
> 

Ok, 25ms is more plausible but I agree that it's still an excessive
amount of time to have IRQs disabled. The problem still needs fixing but
I'd like to see Andrew's approach at least attempted as it should
achieve the same goal while being slightly nicer from a cache hotness
perspective.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

Thread overview: 8 messages
2017-12-07 17:03 [PATCH] mm: page_alloc: avoid excessive IRQ disabled times in free_unref_page_list Lucas Stach
2017-12-07 19:51 ` Mel Gorman
2017-12-07 23:20   ` Andrew Morton
2017-12-08  0:25     ` Mel Gorman
2017-12-08  0:53       ` Andrew Morton
2017-12-08 10:21         ` Mel Gorman
2017-12-08 10:03   ` Lucas Stach
2017-12-08 11:38     ` Mel Gorman
