In-Reply-To: <20120206090841.GF5938@suse.de>
References: <20120206090841.GF5938@suse.de>
Date: Thu, 9 Feb 2012 20:11:47 +0800
Subject: Re: mm: compaction: Check for overlapping nodes during isolation for migration
From: Hillf Danton
To: Mel Gorman
Cc: Andrew Morton, Linux-MM, LKML

On Mon, Feb 6, 2012 at 5:08 PM, Mel Gorman wrote:
> When isolating pages for migration, migration starts at the start of a
> zone while the free scanner starts at the end of the zone. Migration
> avoids entering a new zone by never going beyond the free scanner.
> Unfortunately, in very rare cases nodes can overlap. When this happens,
> migration isolates pages without the LRU lock held, corrupting lists
> which will trigger errors in reclaim or during page free such as in the
> following oops
>
> [ 8739.994311] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 8739.994331] IP: [] free_pcppages_bulk+0xcc/0x450
> [ 8739.994344] PGD 1dda554067 PUD 1e1cb58067 PMD 0
> [ 8739.994350] Oops: 0000 [#1] SMP
> [ 8739.994357] CPU 37
> [ 8739.994359] Modules linked in: veth(X)
> [ 8739.994457] Supported: Yes
> [ 8739.994461]
> [ 8739.994465] Pid: 17088, comm: memcg_process_s Tainted: G            X
> [ 8739.994477] RIP: 0010:[]  [] free_pcppages_bulk+0xcc/0x450
> [ 8739.994483] RSP: 0000:ffff881c2926f7a8  EFLAGS: 00010082
> [ 8739.994488] RAX: 0000000000000010 RBX: 0000000000000000 RCX: ffff881e7f4546c8
> [ 8739.994491] RDX: ffff881e7f4546b0 RSI: 0000000000000000 RDI: 0000000000000167
> [ 8739.994498] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 8739.994502] R10: 0000000000000166 R11: ffffea0060ea0e50 R12: ffffffffffffffd8
> [ 8739.994506] R13: 0000000000000001 R14: ffff881c7ffd9e00 R15: 0000000000000000
> [ 8739.994511] FS:  00007f5072690700(0000) GS:ffff881e7f440000(0000) knlGS:0000000000000000
> [ 8739.994517] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 8739.994522] CR2: 0000000000000008 CR3: 0000001e1f1f9000 CR4: 00000000000006e0
> [ 8739.994525] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 8739.994530] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 8739.994535] Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
> [ 8739.994539] Stack:
> [ 8739.994541]  0000000000000000 ffff881e7f4546c8 0000000000000010 ffff881c7ffd9e60
> [ 8739.994557]  ffff881e7f4546b0 0000001f814498ee 0000000000000000 0000001d81245255
> [ 8739.994565]  ffff881e7f4546c0 ffffea005ecd2f40 ffff881e7f4546b0 0020000000200010
> [ 8739.994573] Call Trace:
> [ 8739.994590]  [] free_hot_cold_page+0x17e/0x1f0
> [ 8739.994600]  [] __pagevec_free+0x90/0xb0
> [ 8739.994610]  [] release_pages+0x22a/0x260
> [ 8739.994617]  [] pagevec_lru_move_fn+0xf3/0x110
> [ 8739.994627]  [] putback_lru_page+0x66/0xe0
> [ 8739.994639]  [] unmap_and_move+0x156/0x180
> [ 8739.994647]  [] migrate_pages+0x9e/0x1b0
> [ 8739.994656]  [] compact_zone+0x1f3/0x2f0
> [ 8739.994665]  [] compact_zone_order+0xa2/0xe0
> [ 8739.994672]  [] try_to_compact_pages+0xdf/0x110
> [ 8739.994678]  [] __alloc_pages_direct_compact+0xee/0x1c0
> [ 8739.994686]  [] __alloc_pages_slowpath+0x370/0x830
> [ 8739.994694]  [] __alloc_pages_nodemask+0x1b1/0x1c0
> [ 8739.994701]  [] alloc_pages_vma+0x9b/0x160
> [ 8739.994712]  [] do_huge_pmd_anonymous_page+0x160/0x270
> [ 8739.994725]  [] do_page_fault+0x207/0x4c0
> [ 8739.994735]  [] page_fault+0x25/0x30
> [ 8739.994748]  [<0000000000400997>] 0x400996
>
> The "X" in the taint flag means that external modules were loaded, but
> this is unrelated to the bug triggering. The real problem is that
> the PFN layout looks like this
>
> [    0.000000] Zone PFN ranges:
> [    0.000000]   DMA      0x00000010 -> 0x00001000
> [    0.000000]   DMA32    0x00001000 -> 0x00100000
> [    0.000000]   Normal   0x00100000 -> 0x01e80000
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[14] active PFN ranges
> [    0.000000]     0: 0x00000010 -> 0x0000009b
> [    0.000000]     0: 0x00000100 -> 0x0007a1ec
> [    0.000000]     0: 0x0007a354 -> 0x0007a379
> [    0.000000]     0: 0x0007f7ff -> 0x0007f800
> [    0.000000]     0: 0x00100000 -> 0x00680000
> [    0.000000]     1: 0x00680000 -> 0x00e80000
> [    0.000000]     0: 0x00e80000 -> 0x01080000
> [    0.000000]     1: 0x01080000 -> 0x01280000
> [    0.000000]     0: 0x01280000 -> 0x01480000
> [    0.000000]     1: 0x01480000 -> 0x01680000
> [    0.000000]     0: 0x01680000 -> 0x01880000
> [    0.000000]     1: 0x01880000 -> 0x01a80000
> [    0.000000]     0: 0x01a80000 -> 0x01c80000
> [    0.000000]     1: 0x01c80000 -> 0x01e80000
>
> The fix is straightforward. isolate_migratepages() has to make a
> similar check to isolate_freepages() to ensure that it never isolates
> pages from a zone it does not hold the LRU lock for.
>
> This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
> and current mainline.
>
> Signed-off-by: Mel Gorman
> Cc:
Acked-by: Hillf Danton
> ---
>  mm/compaction.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index bd6e739..6042644 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -330,8 +330,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>                        continue;
>                nr_scanned++;
>
> -               /* Get the page and skip if free */
> +               /*
> +                * Get the page and ensure the page is within the same zone.
> +                * See the comment in isolate_freepages about overlapping
> +                * nodes. It is deliberate that the new zone lock is not taken
> +                * as memory compaction should not move pages between nodes.
> +                */
>                page = pfn_to_page(low_pfn);
> +               if (page_zone(page) != zone)
> +                       continue;
> +
> +               /* Skip if free */
>                if (PageBuddy(page))
>                        continue;
>
>
> --
> Mel Gorman
> SUSE Labs
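
For anyone following along, the effect of the added check can be
illustrated with a small userspace model. This is only a sketch, not
kernel code: struct pfn_range, node_map, pfn_to_nid() and scan_span()
are hypothetical names made up for the illustration, zone identity is
collapsed to a node id, and node 0's Normal zone is assumed to span
0x00100000 -> 0x01c80000 based on the early_node_map output quoted
above. The scanner samples PFNs at a fixed stride instead of visiting
every page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel API. */
struct pfn_range {
	int nid;              /* node that owns the range */
	unsigned long start;  /* first PFN in the range */
	unsigned long end;    /* one past the last PFN */
};

/* The early_node_map entries at or above 0x00100000 from the boot log. */
static const struct pfn_range node_map[] = {
	{ 0, 0x00100000UL, 0x00680000UL },
	{ 1, 0x00680000UL, 0x00e80000UL },
	{ 0, 0x00e80000UL, 0x01080000UL },
	{ 1, 0x01080000UL, 0x01280000UL },
	{ 0, 0x01280000UL, 0x01480000UL },
	{ 1, 0x01480000UL, 0x01680000UL },
	{ 0, 0x01680000UL, 0x01880000UL },
	{ 1, 0x01880000UL, 0x01a80000UL },
	{ 0, 0x01a80000UL, 0x01c80000UL },
	{ 1, 0x01c80000UL, 0x01e80000UL },
};

/* Stand-in for the page_zone() lookup: owning node, or -1 for a hole. */
int pfn_to_nid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(node_map) / sizeof(node_map[0]); i++)
		if (pfn >= node_map[i].start && pfn < node_map[i].end)
			return node_map[i].nid;
	return -1;
}

/*
 * Sample [start, end) every 'stride' PFNs on behalf of node 'nid' and
 * return how many samples would be isolated. With check_zone set, PFNs
 * owned by another node are skipped, mirroring the
 * "page_zone(page) != zone" test in the patch. *foreign receives the
 * number of isolated samples that belong to a different node.
 */
unsigned long scan_span(int nid, unsigned long start, unsigned long end,
			unsigned long stride, int check_zone,
			unsigned long *foreign)
{
	unsigned long pfn, isolated = 0;

	assert(stride != 0);
	*foreign = 0;
	for (pfn = start; pfn < end; pfn += stride) {
		int owner = pfn_to_nid(pfn);

		if (owner < 0)
			continue;	/* hole: no node owns this PFN */
		if (check_zone && owner != nid)
			continue;	/* the check added by the patch */
		isolated++;
		if (owner != nid)
			(*foreign)++;
	}
	return isolated;
}
```

Calling scan_span(0, 0x00100000, 0x01c80000, 0x80000, 0, &foreign)
isolates 55 samples, 28 of which belong to node 1, which is the case
where pages get touched without the right LRU lock. With check_zone set
to 1 the same walk isolates 27 samples and none of them are foreign.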