From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5350C43381 for ; Thu, 21 Feb 2019 03:17:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A3DA12086C for ; Thu, 21 Feb 2019 03:17:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726986AbfBUDRx (ORCPT ); Wed, 20 Feb 2019 22:17:53 -0500 Received: from mga02.intel.com ([134.134.136.20]:22447 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726016AbfBUDRx (ORCPT ); Wed, 20 Feb 2019 22:17:53 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Feb 2019 19:17:52 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,393,1544515200"; d="scan'208";a="140330388" Received: from shao2-debian.sh.intel.com (HELO [10.239.13.107]) ([10.239.13.107]) by orsmga001.jf.intel.com with ESMTP; 20 Feb 2019 19:17:50 -0800 Subject: Re: [LKP] [RFC PATCH] mm, memory_hotplug: fix off-by-one in is_pageblock_removable To: Michal Hocko Cc: Oscar Salvador , LKML , Matthew Wilcox , Mike Rapoport , linux-mm@kvack.org, Andrew Morton , lkp@01.org References: <20190218052823.GH29177@shao2-debian> <20190218181544.14616-1-mhocko@kernel.org> <20190220125732.GB4525@dhcp22.suse.cz> From: Rong Chen Message-ID: Date: Thu, 21 Feb 2019 11:18:07 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <20190220125732.GB4525@dhcp22.suse.cz> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, The patch can fix the issue for me. Best Regards, Rong Chen On 2/20/19 8:57 PM, Michal Hocko wrote: > Rong Chen, > coudl you double check this indeed fixes the issue for you please? > > On Mon 18-02-19 19:15:44, Michal Hocko wrote: >> From: Michal Hocko >> >> Rong Chen has reported the following boot crash >> [ 40.305212] PGD 0 P4D 0 >> [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI >> [ 40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1 >> [ 40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 >> [ 40.330813] RIP: 0010:page_mapping+0x12/0x80 >> [ 40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48 >> [ 40.356704] RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202 >> [ 40.362714] RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a >> [ 40.370798] RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000 >> [ 40.378830] RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13 >> [ 40.386902] R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000 >> [ 40.395033] R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001 >> [ 40.403138] FS: 00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000 >> [ 40.412243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 40.418846] CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0 >> [ 40.426951] Call Trace: >> [ 40.429843] __dump_page+0x14/0x2c0 >> [ 40.433947] is_mem_section_removable+0x24c/0x2c0 >> [ 40.439327] removable_show+0x87/0xa0 >> [ 40.443613] dev_attr_show+0x25/0x60 >> [ 40.447763] sysfs_kf_seq_show+0xba/0x110 >> [ 40.452363] seq_read+0x196/0x3f0 >> [ 40.456282] __vfs_read+0x34/0x180 >> [ 40.460233] ? lock_acquire+0xb6/0x1e0 >> [ 40.464610] vfs_read+0xa0/0x150 >> [ 40.468372] ksys_read+0x44/0xb0 >> [ 40.472129] ? do_syscall_64+0x1f/0x4a0 >> [ 40.476593] do_syscall_64+0x5e/0x4a0 >> [ 40.480809] ? trace_hardirqs_off_thunk+0x1a/0x1c >> [ 40.486195] entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> and bisected it down to efad4e475c31 ("mm, memory_hotplug: >> is_mem_section_removable do not pass the end of a zone"). The reason for >> the crash is that the mapping is garbage for poisoned (uninitialized) page. >> This shouldn't happen as all pages in the zone's boundary should be >> initialized. Later debugging revealed that the actual problem is an >> off-by-one when evaluating the end_page. start_pfn + nr_pages resp. >> zone_end_pfn refers to a pfn after the range and as such it might belong >> to a differen memory section. This along with CONFIG_SPARSEMEM then >> makes the loop condition completely bogus because a pointer arithmetic >> doesn't work for pages from two different sections in that memory model. >> >> Fix the issue by reworking is_pageblock_removable to be pfn based and >> only use struct page where necessary. This makes the code slightly >> easier to follow and we will remove the problematic pointer arithmetic >> completely. >> >> Fixes: efad4e475c31 ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone") >> Reported-by: >> Signed-off-by: Michal Hocko >> --- >> mm/memory_hotplug.c | 27 +++++++++++++++------------ >> 1 file changed, 15 insertions(+), 12 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 124e794867c5..1ad28323fb9f 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1188,11 +1188,13 @@ static inline int pageblock_free(struct page *page) >> return PageBuddy(page) && page_order(page) >= pageblock_order; >> } >> >> -/* Return the start of the next active pageblock after a given page */ >> -static struct page *next_active_pageblock(struct page *page) >> +/* Return the pfn of the start of the next active pageblock after a given pfn */ >> +static unsigned long next_active_pageblock(unsigned long pfn) >> { >> + struct page *page = pfn_to_page(pfn); >> + >> /* Ensure the starting page is pageblock-aligned */ >> - BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); >> + BUG_ON(pfn & (pageblock_nr_pages - 1)); >> >> /* If the entire pageblock is free, move to the end of free page */ >> if (pageblock_free(page)) { >> @@ -1200,16 +1202,16 @@ static struct page *next_active_pageblock(struct page *page) >> /* be careful. we don't have locks, page_order can be changed.*/ >> order = page_order(page); >> if ((order < MAX_ORDER) && (order >= pageblock_order)) >> - return page + (1 << order); >> + return pfn + (1 << order); >> } >> >> - return page + pageblock_nr_pages; >> + return pfn + pageblock_nr_pages; >> } >> >> -static bool is_pageblock_removable_nolock(struct page *page) >> +static bool is_pageblock_removable_nolock(unsigned long pfn) >> { >> + struct page *page = pfn_to_page(pfn); >> struct zone *zone; >> - unsigned long pfn; >> >> /* >> * We have to be careful here because we are iterating over memory >> @@ -1232,13 +1234,14 @@ static bool is_pageblock_removable_nolock(struct page *page) >> /* Checks if this range of memory is likely to be hot-removable. */ >> bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) >> { >> - struct page *page = pfn_to_page(start_pfn); >> - unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page))); >> - struct page *end_page = pfn_to_page(end_pfn); >> + unsigned long end_pfn, pfn; >> + >> + end_pfn = min(start_pfn + nr_pages, >> + zone_end_pfn(page_zone(pfn_to_page(start_pfn)))); >> >> /* Check the starting page of each pageblock within the range */ >> - for (; page < end_page; page = next_active_pageblock(page)) { >> - if (!is_pageblock_removable_nolock(page)) >> + for (pfn = start_pfn; pfn < end_pfn; pfn = next_active_pageblock(pfn)) { >> + if (!is_pageblock_removable_nolock(pfn)) >> return false; >> cond_resched(); >> } >> -- >> 2.20.1 >> From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============7262987998109630829==" MIME-Version: 1.0 From: Rong Chen To: lkp@lists.01.org Subject: Re: [RFC PATCH] mm, memory_hotplug: fix off-by-one in is_pageblock_removable Date: Thu, 21 Feb 2019 11:18:07 +0800 Message-ID: In-Reply-To: <20190220125732.GB4525@dhcp22.suse.cz> List-Id: --===============7262987998109630829== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Hi, The patch can fix the issue for me. Best Regards, Rong Chen On 2/20/19 8:57 PM, Michal Hocko wrote: > Rong Chen, > coudl you double check this indeed fixes the issue for you please? > > On Mon 18-02-19 19:15:44, Michal Hocko wrote: >> From: Michal Hocko >> >> Rong Chen has reported the following boot crash >> [ 40.305212] PGD 0 P4D 0 >> [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI >> [ 40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-g= efad4e4 #1 >> [ 40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BI= OS 1.10.2-1 04/01/2014 >> [ 40.330813] RIP: 0010:page_mapping+0x12/0x80 >> [ 40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b = 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da = <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48 >> [ 40.356704] RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202 >> [ 40.362714] RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 00000000= 0000000a >> [ 40.370798] RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff8880= 1e5c0000 >> [ 40.378830] RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 00000000= 01b64d13 >> [ 40.386902] R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff8880= 1e640000 >> [ 40.395033] R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 00000000= 00000001 >> [ 40.403138] FS: 00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knl= GS:0000000000000000 >> [ 40.412243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 40.418846] CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000= 000006a0 >> [ 40.426951] Call Trace: >> [ 40.429843] __dump_page+0x14/0x2c0 >> [ 40.433947] is_mem_section_removable+0x24c/0x2c0 >> [ 40.439327] removable_show+0x87/0xa0 >> [ 40.443613] dev_attr_show+0x25/0x60 >> [ 40.447763] sysfs_kf_seq_show+0xba/0x110 >> [ 40.452363] seq_read+0x196/0x3f0 >> [ 40.456282] __vfs_read+0x34/0x180 >> [ 40.460233] ? lock_acquire+0xb6/0x1e0 >> [ 40.464610] vfs_read+0xa0/0x150 >> [ 40.468372] ksys_read+0x44/0xb0 >> [ 40.472129] ? do_syscall_64+0x1f/0x4a0 >> [ 40.476593] do_syscall_64+0x5e/0x4a0 >> [ 40.480809] ? trace_hardirqs_off_thunk+0x1a/0x1c >> [ 40.486195] entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> and bisected it down to efad4e475c31 ("mm, memory_hotplug: >> is_mem_section_removable do not pass the end of a zone"). The reason for >> the crash is that the mapping is garbage for poisoned (uninitialized) pa= ge. >> This shouldn't happen as all pages in the zone's boundary should be >> initialized. Later debugging revealed that the actual problem is an >> off-by-one when evaluating the end_page. start_pfn + nr_pages resp. >> zone_end_pfn refers to a pfn after the range and as such it might belong >> to a differen memory section. This along with CONFIG_SPARSEMEM then >> makes the loop condition completely bogus because a pointer arithmetic >> doesn't work for pages from two different sections in that memory model. >> >> Fix the issue by reworking is_pageblock_removable to be pfn based and >> only use struct page where necessary. This makes the code slightly >> easier to follow and we will remove the problematic pointer arithmetic >> completely. >> >> Fixes: efad4e475c31 ("mm, memory_hotplug: is_mem_section_removable do no= t pass the end of a zone") >> Reported-by: >> Signed-off-by: Michal Hocko >> --- >> mm/memory_hotplug.c | 27 +++++++++++++++------------ >> 1 file changed, 15 insertions(+), 12 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 124e794867c5..1ad28323fb9f 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1188,11 +1188,13 @@ static inline int pageblock_free(struct page *pa= ge) >> return PageBuddy(page) && page_order(page) >=3D pageblock_order; >> } >> = >> -/* Return the start of the next active pageblock after a given page */ >> -static struct page *next_active_pageblock(struct page *page) >> +/* Return the pfn of the start of the next active pageblock after a giv= en pfn */ >> +static unsigned long next_active_pageblock(unsigned long pfn) >> { >> + struct page *page =3D pfn_to_page(pfn); >> + >> /* Ensure the starting page is pageblock-aligned */ >> - BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1)); >> + BUG_ON(pfn & (pageblock_nr_pages - 1)); >> = >> /* If the entire pageblock is free, move to the end of free page */ >> if (pageblock_free(page)) { >> @@ -1200,16 +1202,16 @@ static struct page *next_active_pageblock(struct= page *page) >> /* be careful. we don't have locks, page_order can be changed.*/ >> order =3D page_order(page); >> if ((order < MAX_ORDER) && (order >=3D pageblock_order)) >> - return page + (1 << order); >> + return pfn + (1 << order); >> } >> = >> - return page + pageblock_nr_pages; >> + return pfn + pageblock_nr_pages; >> } >> = >> -static bool is_pageblock_removable_nolock(struct page *page) >> +static bool is_pageblock_removable_nolock(unsigned long pfn) >> { >> + struct page *page =3D pfn_to_page(pfn); >> struct zone *zone; >> - unsigned long pfn; >> = >> /* >> * We have to be careful here because we are iterating over memory >> @@ -1232,13 +1234,14 @@ static bool is_pageblock_removable_nolock(struct= page *page) >> /* Checks if this range of memory is likely to be hot-removable. */ >> bool is_mem_section_removable(unsigned long start_pfn, unsigned long n= r_pages) >> { >> - struct page *page =3D pfn_to_page(start_pfn); >> - unsigned long end_pfn =3D min(start_pfn + nr_pages, zone_end_pfn(page_= zone(page))); >> - struct page *end_page =3D pfn_to_page(end_pfn); >> + unsigned long end_pfn, pfn; >> + >> + end_pfn =3D min(start_pfn + nr_pages, >> + zone_end_pfn(page_zone(pfn_to_page(start_pfn)))); >> = >> /* Check the starting page of each pageblock within the range */ >> - for (; page < end_page; page =3D next_active_pageblock(page)) { >> - if (!is_pageblock_removable_nolock(page)) >> + for (pfn =3D start_pfn; pfn < end_pfn; pfn =3D next_active_pageblock(p= fn)) { >> + if (!is_pageblock_removable_nolock(pfn)) >> return false; >> cond_resched(); >> } >> -- = >> 2.20.1 >> --===============7262987998109630829==--