Linux-m68k Archive on lore.kernel.org
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-m68k <linux-m68k@lists.linux-m68k.org>
Subject: [PATCH] mm, page_alloc: Reset zonelist iterator after resetting fair zone allocation policy
Date: Tue, 31 May 2016 11:08:48 +0100
Message-ID: <20160531100848.GR2527@techsingularity.net> (raw)

Geert Uytterhoeven reported the following problem on m68k/ARAnyM, which
he bisected to commit c33d6c06f60f ("mm, page_alloc: avoid looking up
the first zone in a zonelist twice"):

    BUG: scheduling while atomic: cron/668/0x10c9a0c0
    Modules linked in:
    CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
    Stack from 10c9a074:
            10c9a074 003763ca 0003d7d0 00361a58 00bcf834 0000029c 10c9a0c0 10c9a0c0
            002f0f42 00bcf5e0 00000000 00000082 0048e018 00000000 00000000 002f0c30
            000410de 00000000 00000000 10c9a0e0 002f112c 00000000 7fffffff 10c9a180
            003b1490 00bcf60c 10c9a1f0 10c9a118 002f2d30 00000000 10c9a174 10c9a180
            0003ef56 003b1490 00bcf60c 003b1490 00bcf60c 0003eff6 003b1490 00bcf60c
            003b1490 10c9a128 002f118e 7fffffff 00000082 002f1612 002f1624 7fffffff
    Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54
     [<002f0f42>] __schedule+0x312/0x388
     [<002f0c30>] __schedule+0x0/0x388
     [<000410de>] prepare_to_wait+0x0/0x52
     [<002f112c>] schedule+0x64/0x82
     [<002f2d30>] schedule_timeout+0xda/0x104
     [<0003ef56>] set_next_entity+0x18/0x40
     [<0003eff6>] pick_next_task_fair+0x78/0xda
     [<002f118e>] io_schedule_timeout+0x36/0x4a
     [<002f1612>] bit_wait_io+0x0/0x40
     [<002f1624>] bit_wait_io+0x12/0x40
     [<002f12c4>] __wait_on_bit+0x46/0x76
     [<0006a06a>] wait_on_page_bit_killable+0x64/0x6c
     [<002f1612>] bit_wait_io+0x0/0x40
     [<000411fe>] wake_bit_function+0x0/0x4e
     [<0006a1b8>] __lock_page_or_retry+0xde/0x124
     [<00217000>] do_scan_async+0x114/0x17c
     [<00098856>] lookup_swap_cache+0x24/0x4e
     [<0008b7c8>] handle_mm_fault+0x626/0x7de
     [<0008ef46>] find_vma+0x0/0x66
     [<002f2612>] down_read+0x0/0xe
     [<0006a001>] wait_on_page_bit_killable_timeout+0x77/0x7c
     [<0008ef5c>] find_vma+0x16/0x66
     [<00006b44>] do_page_fault+0xe6/0x23a
     [<0000c350>] res_func+0xa3c/0x141a
     [<00005bb8>] buserr_c+0x190/0x6d4
     [<0000c350>] res_func+0xa3c/0x141a
     [<000028ec>] buserr+0x20/0x28
     [<0000c350>] res_func+0xa3c/0x141a
     [<000028ec>] buserr+0x20/0x28

The relationship is not obvious, but the bug is a failure to rescan the
full zonelist after the fair zone allocation policy exhausts its batch
count. Besides being a functional problem, this is also a performance
issue. A page allocator microbenchmark showed the following:

                                 4.7.0-rc1                  4.7.0-rc1
                                   vanilla                 reset-v1r2
Min      alloc-odr0-1     327.00 (  0.00%)           326.00 (  0.31%)
Min      alloc-odr0-2     235.00 (  0.00%)           235.00 (  0.00%)
Min      alloc-odr0-4     198.00 (  0.00%)           198.00 (  0.00%)
Min      alloc-odr0-8     170.00 (  0.00%)           170.00 (  0.00%)
Min      alloc-odr0-16    156.00 (  0.00%)           156.00 (  0.00%)
Min      alloc-odr0-32    150.00 (  0.00%)           150.00 (  0.00%)
Min      alloc-odr0-64    146.00 (  0.00%)           146.00 (  0.00%)
Min      alloc-odr0-128   145.00 (  0.00%)           145.00 (  0.00%)
Min      alloc-odr0-256   155.00 (  0.00%)           155.00 (  0.00%)
Min      alloc-odr0-512   168.00 (  0.00%)           165.00 (  1.79%)
Min      alloc-odr0-1024  175.00 (  0.00%)           174.00 (  0.57%)
Min      alloc-odr0-2048  180.00 (  0.00%)           180.00 (  0.00%)
Min      alloc-odr0-4096  187.00 (  0.00%)           186.00 (  0.53%)
Min      alloc-odr0-8192  190.00 (  0.00%)           190.00 (  0.00%)
Min      alloc-odr0-16384 191.00 (  0.00%)           191.00 (  0.00%)
Min      alloc-odr1-1     736.00 (  0.00%)           445.00 ( 39.54%)
Min      alloc-odr1-2     343.00 (  0.00%)           335.00 (  2.33%)
Min      alloc-odr1-4     277.00 (  0.00%)           270.00 (  2.53%)
Min      alloc-odr1-8     238.00 (  0.00%)           233.00 (  2.10%)
Min      alloc-odr1-16    224.00 (  0.00%)           218.00 (  2.68%)
Min      alloc-odr1-32    210.00 (  0.00%)           208.00 (  0.95%)
Min      alloc-odr1-64    207.00 (  0.00%)           203.00 (  1.93%)
Min      alloc-odr1-128   276.00 (  0.00%)           202.00 ( 26.81%)
Min      alloc-odr1-256   206.00 (  0.00%)           202.00 (  1.94%)
Min      alloc-odr1-512   207.00 (  0.00%)           202.00 (  2.42%)
Min      alloc-odr1-1024  208.00 (  0.00%)           205.00 (  1.44%)
Min      alloc-odr1-2048  213.00 (  0.00%)           212.00 (  0.47%)
Min      alloc-odr1-4096  218.00 (  0.00%)           216.00 (  0.92%)
Min      alloc-odr1-8192  341.00 (  0.00%)           219.00 ( 35.78%)
Note that order-0 allocations are unaffected, but higher orders get a
small boost from this patch, and overall system CPU usage drops
noticeably, as can be seen here:

           4.7.0-rc1   4.7.0-rc1
             vanilla  reset-v1r2
User           85.32       86.31
System       2221.39     2053.36
Elapsed      2368.89     2202.47

Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb320cde4d6d..557549c81083 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		apply_fair = false;
 		fair_skipped = false;
 		reset_alloc_batches(ac->preferred_zoneref->zone);
+		z = ac->preferred_zoneref;
 		goto zonelist_scan;
 	}

