From: Michal Hocko <mhocko@kernel.org>
To: Rong Chen <rong.a.chen@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>, LKP <lkp@01.org>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
Date: Mon, 18 Feb 2019 11:30:13 +0100
Message-ID: <20190218103013.GK4525@dhcp22.suse.cz>
In-Reply-To: <4c75d424-2c51-0d7d-5c28-78c15600e93c@intel.com>

On Mon 18-02-19 18:01:39, Rong Chen wrote:
> 
> On 2/18/19 4:55 PM, Michal Hocko wrote:
> > [Sorry for an excessive quoting in the previous email]
> > [Cc Pavel - the full report is http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian]
> > 
> > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > [...]
> > > > [   40.305212] PGD 0 P4D 0
> > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1
> > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
> > > > [   40.356704] RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
> > > > [   40.362714] RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
> > > > [   40.370798] RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
> > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
> > > > [   40.386902] R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
> > > > [   40.395033] R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
> > > > [   40.403138] FS:  00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
> > > > [   40.412243] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   40.418846] CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
> > > > [   40.426951] Call Trace:
> > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > This looks like we are stumbling over an uninitialized struct page again,
> > > something this patch should prevent. Could you try to apply [1]
> > > which will make __dump_page more robust so that we do not blow up there
> > > and give some more details in return.
> > > 
> > > Btw. is this reproducible all the time?
> > And forgot to ask whether this is reproducible with pending mmotm
> > patches in linux-next.
> 
> 
> Do you mean the below patch? I can reproduce the problem too.

Yes, thanks for the swift response. The patch just adds some debugging
output:
[    0.013697] Early memory node ranges
[    0.013701]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.013706]   node   0: [mem 0x0000000000100000-0x000000001ffdffff]
[    0.013711] zeroying 0-1

This is the first pfn.

[    0.013715] zeroying 9f-100

this is [mem 0x9f000, 0xfffff] so it fills up the whole hole between the
above two ranges. This is definitely good.

[    0.013722] zeroying 1ffe0-1ffe0

this is a single page at 0x1ffe0000 right after the zone end.

[    0.013727] Zeroed struct page in unavailable ranges: 98 pages
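The pfn-to-address arithmetic behind the comments above can be checked with a minimal sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12) and that each "zeroying A-B" line prints a half-open pfn range [A, B); neither assumption is stated in the log itself, and the helper names are hypothetical. Under the half-open reading the three ranges contribute 1, 97 and 0 pages, which would match the reported total of 98.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual for an x86-64 guest. */
#define PAGE_SHIFT 12

/* Physical address of the first byte of a page frame. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Physical address of the last byte of a page frame. */
static uint64_t pfn_last_byte(uint64_t pfn)
{
	return pfn_to_phys(pfn + 1) - 1;
}

/*
 * Number of pages in [start_pfn, end_pfn). Treating the printed
 * ranges as half-open is an assumption of this sketch.
 */
static uint64_t range_pages(uint64_t start_pfn, uint64_t end_pfn)
{
	assert(end_pfn >= start_pfn);
	return end_pfn - start_pfn;
}
```

For example, pfn_to_phys(0x9f) is 0x9f000 and pfn_last_byte(0xff) is 0xfffff, which is the hole between the two node ranges; pfn_to_phys(0x1ffe0) is 0x1ffe0000, the first address past the second range.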

Hmm, so this is getting really interesting. The whole zone range should
be covered. So this is either some off-by-one or something that I am
missing right now. Could you apply the following on top please? We
definitely need to see what pfn this is.


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 124e794867c5..59bcfd934e37 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
 /* Checks if this range of memory is likely to be hot-removable. */
 bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
-	struct page *page = pfn_to_page(start_pfn);
+	struct page *page = pfn_to_page(start_pfn), *first_page;
 	unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
 	struct page *end_page = pfn_to_page(end_pfn);
 
 	/* Check the starting page of each pageblock within the range */
-	for (; page < end_page; page = next_active_pageblock(page)) {
+	for (first_page = page; page < end_page; page = next_active_pageblock(page)) {
+		if (PagePoisoned(page))
+			pr_info("Unexpected poisoned page %px pfn:%lx\n", page, start_pfn + page-first_page);
 		if (!is_pageblock_removable_nolock(page))
 			return false;
 		cond_resched();
-- 
Michal Hocko
SUSE Labs
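
The pfn printed by the debugging patch comes from pointer arithmetic on struct page pointers. A minimal userspace sketch of that idea follows. It assumes the struct pages for the walked range sit in one contiguous array; the struct layout and the name page_to_pfn_sketch are hypothetical stand-ins, not kernel definitions. The sketch parenthesizes the pointer difference so that it is computed before the addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; only its size matters. */
struct page {
	unsigned long flags;
	void *filler[7];
};

/*
 * Recover a pfn from a struct page pointer, mirroring the expression
 * start_pfn + page - first_page from the debugging patch. This is only
 * meaningful if first_page and page point into the same contiguous array.
 */
static unsigned long page_to_pfn_sketch(const struct page *first_page,
					unsigned long start_pfn,
					const struct page *page)
{
	/* Pointer subtraction counts struct page elements, not bytes. */
	ptrdiff_t offset = page - first_page;

	assert(offset >= 0);
	return start_pfn + (unsigned long)offset;
}
```

With a 1024-entry array whose first element stands for pfn 0x18000, the element at index 512 maps to pfn 0x18200.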
