Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers
@ 2019-10-09 14:24 David Hildenbrand
  2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand
  2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
  0 siblings, 2 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Alexey Dobriyan, Andrew Morton,
	Aneesh Kumar K.V, Anthony Yznaga, Dan Williams, Michal Hocko,
	Mike Rapoport, Naoya Horiguchi, Pankaj gupta, Qian Cai,
	Stephen Rothwell, Toshiki Fukasawa

This is the follow-up of:
  [PATCH v1] mm: Fix access of uninitialized memmaps in fs/proc/page.c

We have multiple places where we might access uninitialized memmaps and
trigger kernel BUGs. Make sure to only access initialized memmaps.

Some of these places got easier to trigger with:
  [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory
As memmaps are now also poisoned when memory is offlined, before it is
actually removed.

v1 -> v2:
- Drop ZONE_DEVICE support from the /proc/k... files as requested by Michal
- Further simplify the code
- Split up into two patches

David Hildenbrand (2):
  mm: Don't access uninitialized memmaps in fs/proc/page.c
  mm/memory-failure.c: Don't access uninitialized memmaps in
    memory_failure()

 fs/proc/page.c      | 28 ++++++++++++++++------------
 mm/memory-failure.c | 14 ++++++++------
 2 files changed, 24 insertions(+), 18 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c
  2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand
@ 2019-10-09 14:24 ` David Hildenbrand
  2019-10-09 14:41   ` Michal Hocko
  2019-10-14  8:44   ` David Hildenbrand
  2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
  1 sibling, 2 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Qian Cai, Dan Williams,
	Alexey Dobriyan, Andrew Morton, Stephen Rothwell,
	Toshiki Fukasawa, Pankaj gupta, Mike Rapoport, Anthony Yznaga,
	Michal Hocko, Aneesh Kumar K.V, linux-fsdevel

There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup

We have initialized memmaps either when the section is online or when
the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.

For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:
:/# cat /proc/kpagecount > tmp.test
[   95.600592] BUG: unable to handle page fault for address: fffffffffffffffe
[   95.601238] #PF: supervisor read access in kernel mode
[   95.601675] #PF: error_code(0x0000) - not-present page
[   95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
[   95.602596] Oops: 0000 [#1] SMP NOPTI
[   95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
[   95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
[   95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0
[   95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
[   95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
[   95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
[   95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
[   95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
[   95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
[   95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
[   95.609924] FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
[   95.610599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
[   95.611686] Call Trace:
[   95.611906]  proc_reg_read+0x3c/0x60
[   95.612228]  vfs_read+0xc5/0x180
[   95.612505]  ksys_read+0x68/0xe0
[   95.612785]  do_syscall_64+0x5c/0xa0
[   95.613092]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this. To distinguish offline memory (with garbage memmap)
from ZONE_DEVICE memory with properly initialized memmaps, we would have to
check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage
of both (especially, special casing devmem) is frowned upon and needs to
be reworked. The fundamental issue we have is:

	if (pfn_to_online_page(pfn)) {
		/* memmap initialized */
	} else if (pfn_valid(pfn)) {
		/*
		 * ???
		 * a) offline memory. memmap garbage.
		 * b) devmem: memmap initialized to ZONE_DEVICE.
		 * c) devmem: reserved for driver. memmap garbage.
		 * (d) devmem: memmap currently initializing - garbage)
		 */
	}

We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure. We now
no longer dump information about pages that are not in use anymore -
offline.

Reported-by: Qian Cai <cai@lca.pw>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/page.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index decd3fe39674..e40dbfe1168e 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 		return -EINVAL;
 
 	while (count > 0) {
-		if (pfn_valid(pfn))
-			ppage = pfn_to_page(pfn);
-		else
-			ppage = NULL;
+		/*
+		 * TODO: ZONE_DEVICE support requires to identify
+		 * memmaps that were actually initialized.
+		 */
+		ppage = pfn_to_online_page(pfn);
+
 		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
 			pcount = 0;
 		else
@@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
 		return -EINVAL;
 
 	while (count > 0) {
-		if (pfn_valid(pfn))
-			ppage = pfn_to_page(pfn);
-		else
-			ppage = NULL;
+		/*
+		 * TODO: ZONE_DEVICE support requires to identify
+		 * memmaps that were actually initialized.
+		 */
+		ppage = pfn_to_online_page(pfn);
 
 		if (put_user(stable_page_flags(ppage), out)) {
 			ret = -EFAULT;
@@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
 		return -EINVAL;
 
 	while (count > 0) {
-		if (pfn_valid(pfn))
-			ppage = pfn_to_page(pfn);
-		else
-			ppage = NULL;
+		/*
+		 * TODO: ZONE_DEVICE support requires to identify
+		 * memmaps that were actually initialized.
+		 */
+		ppage = pfn_to_online_page(pfn);
 
 		if (ppage)
 			ino = page_cgroup_ino(ppage);
-- 
2.21.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand
  2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand
@ 2019-10-09 14:24 ` David Hildenbrand
  2019-10-09 14:43   ` Michal Hocko
                     ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Naoya Horiguchi, Andrew Morton,
	Michal Hocko

We should check for pfn_to_online_page() to not access uninitialized
memmaps. Reshuffle the code so we don't have to duplicate the error
message.

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory-failure.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7ef849da8278..e866e6e5660b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
-	if (!pfn_valid(pfn)) {
+	p = pfn_to_online_page(pfn);
+	if (!p) {
+		if (pfn_valid(pfn)) {
+			pgmap = get_dev_pagemap(pfn, NULL);
+			if (pgmap)
+				return memory_failure_dev_pagemap(pfn, flags,
+								  pgmap);
+		}
 		pr_err("Memory failure: %#lx: memory outside kernel control\n",
 			pfn);
 		return -ENXIO;
 	}
 
-	pgmap = get_dev_pagemap(pfn, NULL);
-	if (pgmap)
-		return memory_failure_dev_pagemap(pfn, flags, pgmap);
-
-	p = pfn_to_page(pfn);
 	if (PageHuge(p))
 		return memory_failure_hugetlb(pfn, flags);
 	if (TestSetPageHWPoison(p)) {
-- 
2.21.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c
  2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand
@ 2019-10-09 14:41   ` Michal Hocko
  2019-10-14  8:44   ` David Hildenbrand
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-10-09 14:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Qian Cai, Dan Williams, Alexey Dobriyan,
	Andrew Morton, Stephen Rothwell, Toshiki Fukasawa, Pankaj gupta,
	Mike Rapoport, Anthony Yznaga, Aneesh Kumar K.V, linux-fsdevel

On Wed 09-10-19 16:24:34, David Hildenbrand wrote:
> There are three places where we access uninitialized memmaps, namely:
> - /proc/kpagecount
> - /proc/kpageflags
> - /proc/kpagecgroup
> 
> We have initialized memmaps either when the section is online or when
> the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
> garbage and in the worst case trigger kernel BUGs, especially with
> CONFIG_PAGE_POISONING.
> 
> For example, not onlining a DIMM during boot and calling /proc/kpagecount
> with CONFIG_PAGE_POISONING:
> :/# cat /proc/kpagecount > tmp.test
> [   95.600592] BUG: unable to handle page fault for address: fffffffffffffffe
> [   95.601238] #PF: supervisor read access in kernel mode
> [   95.601675] #PF: error_code(0x0000) - not-present page
> [   95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
> [   95.602596] Oops: 0000 [#1] SMP NOPTI
> [   95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
> [   95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [   95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0
> [   95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
> [   95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
> [   95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
> [   95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
> [   95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
> [   95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> [   95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
> [   95.609924] FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
> [   95.610599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
> [   95.611686] Call Trace:
> [   95.611906]  proc_reg_read+0x3c/0x60
> [   95.612228]  vfs_read+0xc5/0x180
> [   95.612505]  ksys_read+0x68/0xe0
> [   95.612785]  do_syscall_64+0x5c/0xa0
> [   95.613092]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> For now, let's drop support for ZONE_DEVICE from the three pseudo files
> in order to fix this. To distinguish offline memory (with garbage memmap)
> from ZONE_DEVICE memory with properly initialized memmaps, we would have to
> check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage
> of both (especially, special casing devmem) is frowned upon and needs to
> be reworked. The fundamental issue we have is:
> 
> 	if (pfn_to_online_page(pfn)) {
> 		/* memmap initialized */
> 	} else if (pfn_valid(pfn)) {
> 		/*
> 		 * ???
> 		 * a) offline memory. memmap garbage.
> 		 * b) devmem: memmap initialized to ZONE_DEVICE.
> 		 * c) devmem: reserved for driver. memmap garbage.
> 		 * (d) devmem: memmap currently initializing - garbage)
> 		 */
> 	}
> 
> We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
> in place as that function is also used from memory failure. We now
> no longer dump information about pages that are not in use anymore -
> offline.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
> Cc: Pankaj gupta <pagupta@redhat.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

usage of pfn_to_online_page is the right way to dereference the pfn
here. My understanding of the zone device internals is limited to see
whether this fixes it as well.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  fs/proc/page.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index decd3fe39674..e40dbfe1168e 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>  		return -EINVAL;
>  
>  	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
> +
>  		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
>  			pcount = 0;
>  		else
> @@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
>  		return -EINVAL;
>  
>  	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
>  
>  		if (put_user(stable_page_flags(ppage), out)) {
>  			ret = -EFAULT;
> @@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
>  		return -EINVAL;
>  
>  	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
>  
>  		if (ppage)
>  			ino = page_cgroup_ino(ppage);
> -- 
> 2.21.0

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
@ 2019-10-09 14:43   ` Michal Hocko
  2019-10-10  7:27     ` David Hildenbrand
  2019-10-10  0:26   ` Naoya Horiguchi
  2019-10-14  8:41   ` David Hildenbrand
  2 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-09 14:43 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton

On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
> We should check for pfn_to_online_page() to not access uninitialized
> memmaps. Reshuffle the code so we don't have to duplicate the error
> message.
> 
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory-failure.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7ef849da8278..e866e6e5660b 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>  	if (!sysctl_memory_failure_recovery)
>  		panic("Memory failure on page %lx", pfn);
>  
> -	if (!pfn_valid(pfn)) {
> +	p = pfn_to_online_page(pfn);
> +	if (!p) {
> +		if (pfn_valid(pfn)) {
> +			pgmap = get_dev_pagemap(pfn, NULL);
> +			if (pgmap)
> +				return memory_failure_dev_pagemap(pfn, flags,
> +								  pgmap);
> +		}
>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>  			pfn);
>  		return -ENXIO;

Don't we need that earlier at hwpoison_inject level?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
  2019-10-09 14:43   ` Michal Hocko
@ 2019-10-10  0:26   ` Naoya Horiguchi
  2019-10-10  7:17     ` David Hildenbrand
  2019-10-14  8:41   ` David Hildenbrand
  2 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2019-10-10  0:26 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko

On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote:
> We should check for pfn_to_online_page() to not access uninitialized
> memmaps. Reshuffle the code so we don't have to duplicate the error
> message.
> 
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory-failure.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7ef849da8278..e866e6e5660b 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>  	if (!sysctl_memory_failure_recovery)
>  		panic("Memory failure on page %lx", pfn);
>  
> -	if (!pfn_valid(pfn)) {
> +	p = pfn_to_online_page(pfn);
> +	if (!p) {
> +		if (pfn_valid(pfn)) {
> +			pgmap = get_dev_pagemap(pfn, NULL);
> +			if (pgmap)
> +				return memory_failure_dev_pagemap(pfn, flags,
> +								  pgmap);
> +		}
>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>  			pfn);
>  		return -ENXIO;
>  	}
>  
> -	pgmap = get_dev_pagemap(pfn, NULL);
> -	if (pgmap)
> -		return memory_failure_dev_pagemap(pfn, flags, pgmap);
> -
> -	p = pfn_to_page(pfn);

This change seems to assume that memory_failure_dev_pagemap() is never
called for online pages. Is it an intended behavior?
Or the concept "online pages" is not applicable to zone device pages?

Thanks,
Naoya Horiguchi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  0:26   ` Naoya Horiguchi
@ 2019-10-10  7:17     ` David Hildenbrand
  2019-10-11  6:50       ` Naoya Horiguchi
  2019-10-19  2:05       ` Andrew Morton
  0 siblings, 2 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-10  7:17 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko

On 10.10.19 02:26, Naoya Horiguchi wrote:
> On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote:
>> We should check for pfn_to_online_page() to not access uninitialized
>> memmaps. Reshuffle the code so we don't have to duplicate the error
>> message.
>>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memory-failure.c | 14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7ef849da8278..e866e6e5660b 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>  	if (!sysctl_memory_failure_recovery)
>>  		panic("Memory failure on page %lx", pfn);
>>  
>> -	if (!pfn_valid(pfn)) {
>> +	p = pfn_to_online_page(pfn);
>> +	if (!p) {
>> +		if (pfn_valid(pfn)) {
>> +			pgmap = get_dev_pagemap(pfn, NULL);
>> +			if (pgmap)
>> +				return memory_failure_dev_pagemap(pfn, flags,
>> +								  pgmap);
>> +		}
>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>  			pfn);
>>  		return -ENXIO;
>>  	}
>>  
>> -	pgmap = get_dev_pagemap(pfn, NULL);
>> -	if (pgmap)
>> -		return memory_failure_dev_pagemap(pfn, flags, pgmap);
>> -
>> -	p = pfn_to_page(pfn);
> 
> This change seems to assume that memory_failure_dev_pagemap() is never
> called for online pages. Is it an intended behavior?
> Or the concept "online pages" is not applicable to zone device pages?

Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online
(SECTION_IS_ONLINE). The terminology "online" only applies to pages that
were given to the buddy. And as we support sup-section hotadd for
devmem, we cannot easily make use of the section flag it. I already
proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap
and call it something like pfn_active().

pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we
could use pfn_active() everywhere to test for initialized memmaps (well,
besides some special cases like device reserved memory that does not
span full sub-sections). Until now, nobody volunteered and I have other
things to do.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-09 14:43   ` Michal Hocko
@ 2019-10-10  7:27     ` David Hildenbrand
  2019-10-10  7:35       ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-10-10  7:27 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton

On 09.10.19 16:43, Michal Hocko wrote:
> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
>> We should check for pfn_to_online_page() to not access uninitialized
>> memmaps. Reshuffle the code so we don't have to duplicate the error
>> message.
>>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/memory-failure.c | 14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7ef849da8278..e866e6e5660b 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>  	if (!sysctl_memory_failure_recovery)
>>  		panic("Memory failure on page %lx", pfn);
>>  
>> -	if (!pfn_valid(pfn)) {
>> +	p = pfn_to_online_page(pfn);
>> +	if (!p) {
>> +		if (pfn_valid(pfn)) {
>> +			pgmap = get_dev_pagemap(pfn, NULL);
>> +			if (pgmap)
>> +				return memory_failure_dev_pagemap(pfn, flags,
>> +								  pgmap);
>> +		}
>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>  			pfn);
>>  		return -ENXIO;
> 
> Don't we need that earlier at hwpoison_inject level?
> 

Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
alone would not be sufficient as discussed. We would, again, have to
special-case ZONE_DEVICE via things like get_dev_pagemap() ...

But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:

	/*
	 * Note that the below poison/unpoison interfaces do not involve
	 * hardware status change, hence do not require hardware support.
	 * They are mainly for testing hwpoison in software level.
	 */

So it's not that bad compared to memory_failure() called from real HW or
drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:27     ` David Hildenbrand
@ 2019-10-10  7:35       ` Michal Hocko
  2019-10-10  7:52         ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-10  7:35 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton

On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
> On 09.10.19 16:43, Michal Hocko wrote:
> > On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
> >> We should check for pfn_to_online_page() to not access uninitialized
> >> memmaps. Reshuffle the code so we don't have to duplicate the error
> >> message.
> >>
> >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  mm/memory-failure.c | 14 ++++++++------
> >>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 7ef849da8278..e866e6e5660b 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
> >>  	if (!sysctl_memory_failure_recovery)
> >>  		panic("Memory failure on page %lx", pfn);
> >>  
> >> -	if (!pfn_valid(pfn)) {
> >> +	p = pfn_to_online_page(pfn);
> >> +	if (!p) {
> >> +		if (pfn_valid(pfn)) {
> >> +			pgmap = get_dev_pagemap(pfn, NULL);
> >> +			if (pgmap)
> >> +				return memory_failure_dev_pagemap(pfn, flags,
> >> +								  pgmap);
> >> +		}
> >>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
> >>  			pfn);
> >>  		return -ENXIO;
> > 
> > Don't we need that earlier at hwpoison_inject level?
> > 
> 
> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
> alone would not be sufficient as discussed. We would, again, have to
> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
> 
> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
> 
> 	/*
> 	 * Note that the below poison/unpoison interfaces do not involve
> 	 * hardware status change, hence do not require hardware support.
> 	 * They are mainly for testing hwpoison in software level.
> 	 */
> 
> So it's not that bad compared to memory_failure() called from real HW or
> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()

Yes, this is just a toy. And yes we need to handle zone device pages
here because a) people likely want to test MCE behavior even on these
pages and b) HW can really trigger MCEs there as well. I was just
pointing that the patch is likely incomplete.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:35       ` Michal Hocko
@ 2019-10-10  7:52         ` David Hildenbrand
  2019-10-10  7:58           ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-10-10  7:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton

On 10.10.19 09:35, Michal Hocko wrote:
> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
>> On 09.10.19 16:43, Michal Hocko wrote:
>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
>>>> We should check for pfn_to_online_page() to not access uninitialized
>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
>>>> message.
>>>>
>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  mm/memory-failure.c | 14 ++++++++------
>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 7ef849da8278..e866e6e5660b 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>>>  	if (!sysctl_memory_failure_recovery)
>>>>  		panic("Memory failure on page %lx", pfn);
>>>>  
>>>> -	if (!pfn_valid(pfn)) {
>>>> +	p = pfn_to_online_page(pfn);
>>>> +	if (!p) {
>>>> +		if (pfn_valid(pfn)) {
>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
>>>> +			if (pgmap)
>>>> +				return memory_failure_dev_pagemap(pfn, flags,
>>>> +								  pgmap);
>>>> +		}
>>>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>>>  			pfn);
>>>>  		return -ENXIO;
>>>
>>> Don't we need that earlier at hwpoison_inject level?
>>>
>>
>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
>> alone would not be sufficient as discussed. We would, again, have to
>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
>>
>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
>>
>> 	/*
>> 	 * Note that the below poison/unpoison interfaces do not involve
>> 	 * hardware status change, hence do not require hardware support.
>> 	 * They are mainly for testing hwpoison in software level.
>> 	 */
>>
>> So it's not that bad compared to memory_failure() called from real HW or
>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
> 
> Yes, this is just a toy. And yes we need to handle zone device pages
> here because a) people likely want to test MCE behavior even on these
> pages and b) HW can really trigger MCEs there as well. I was just
> pointing that the patch is likely incomplete.
> 

I rather think this deserves a separate patch as it is a separate
interface :)

I do wonder why hwpoison_inject() has to perform so much extra work
compared to other memory_failure() users. This smells like legacy
leftovers to me, but I might be wrong. The interface is fairly old,
though. Does anybody know why we need this magic? I can spot quite some
duplicate checks/things getting performed.

Naiive me would just make the interface perform the same as
hard_offline_page_store(). But most probably I am not getting the real
purpose of both different interfaces.

HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have
guessed that fixing this is not urgent.

BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs
fixing to make sure we access initialized memmaps.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:52         ` David Hildenbrand
@ 2019-10-10  7:58           ` David Hildenbrand
  2019-10-11  6:02             ` Naoya Horiguchi
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-10-10  7:58 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton

On 10.10.19 09:52, David Hildenbrand wrote:
> On 10.10.19 09:35, Michal Hocko wrote:
>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
>>> On 09.10.19 16:43, Michal Hocko wrote:
>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
>>>>> We should check for pfn_to_online_page() to not access uninitialized
>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
>>>>> message.
>>>>>
>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>>  mm/memory-failure.c | 14 ++++++++------
>>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>> index 7ef849da8278..e866e6e5660b 100644
>>>>> --- a/mm/memory-failure.c
>>>>> +++ b/mm/memory-failure.c
>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>  	if (!sysctl_memory_failure_recovery)
>>>>>  		panic("Memory failure on page %lx", pfn);
>>>>>  
>>>>> -	if (!pfn_valid(pfn)) {
>>>>> +	p = pfn_to_online_page(pfn);
>>>>> +	if (!p) {
>>>>> +		if (pfn_valid(pfn)) {
>>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
>>>>> +			if (pgmap)
>>>>> +				return memory_failure_dev_pagemap(pfn, flags,
>>>>> +								  pgmap);
>>>>> +		}
>>>>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>>>>  			pfn);
>>>>>  		return -ENXIO;
>>>>
>>>> Don't we need that earlier at hwpoison_inject level?
>>>>
>>>
>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
>>> alone would not be sufficient as discussed. We would, again, have to
>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
>>>
>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
>>>
>>> 	/*
>>> 	 * Note that the below poison/unpoison interfaces do not involve
>>> 	 * hardware status change, hence do not require hardware support.
>>> 	 * They are mainly for testing hwpoison in software level.
>>> 	 */
>>>
>>> So it's not that bad compared to memory_failure() called from real HW or
>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
>>
>> Yes, this is just a toy. And yes we need to handle zone device pages
>> here because a) people likely want to test MCE behavior even on these
>> pages and b) HW can really trigger MCEs there as well. I was just
>> pointing that the patch is likely incomplete.
>>
> 
> I rather think this deserves a separate patch as it is a separate
> interface :)
> 
> I do wonder why hwpoison_inject() has to perform so much extra work
> compared to other memory_failure() users. This smells like legacy
> leftovers to me, but I might be wrong. The interface is fairly old,
> though. Does anybody know why we need this magic? I can spot quite some
> duplicate checks/things getting performed.
> 
> Naiive me would just make the interface perform the same as
> hard_offline_page_store(). But most probably I am not getting the real
> purpose of both different interfaces.
> 
> HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have
> guessed that fixing this is not urgent.
> 
> BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs
> fixing to make sure we access initialized memmaps.
> 

To be more precise, soft_offline_page_store() needs a
pfn_to_online_page() check. Will send a patch.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:58           ` David Hildenbrand
@ 2019-10-11  6:02             ` Naoya Horiguchi
  2019-10-11 10:13               ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Naoya Horiguchi @ 2019-10-11  6:02 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton

On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote:
> On 10.10.19 09:52, David Hildenbrand wrote:
> > On 10.10.19 09:35, Michal Hocko wrote:
> >> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
> >>> On 09.10.19 16:43, Michal Hocko wrote:
> >>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
> >>>>> We should check for pfn_to_online_page() to not access uninitialized
> >>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
> >>>>> message.
> >>>>>
> >>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>>> Cc: Michal Hocko <mhocko@kernel.org>
> >>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>>>> ---
> >>>>>  mm/memory-failure.c | 14 ++++++++------
> >>>>>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>>> index 7ef849da8278..e866e6e5660b 100644
> >>>>> --- a/mm/memory-failure.c
> >>>>> +++ b/mm/memory-failure.c
> >>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>  	if (!sysctl_memory_failure_recovery)
> >>>>>  		panic("Memory failure on page %lx", pfn);
> >>>>>  
> >>>>> -	if (!pfn_valid(pfn)) {
> >>>>> +	p = pfn_to_online_page(pfn);
> >>>>> +	if (!p) {
> >>>>> +		if (pfn_valid(pfn)) {
> >>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
> >>>>> +			if (pgmap)
> >>>>> +				return memory_failure_dev_pagemap(pfn, flags,
> >>>>> +								  pgmap);
> >>>>> +		}
> >>>>>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
> >>>>>  			pfn);
> >>>>>  		return -ENXIO;
> >>>>
> >>>> Don't we need that earlier at hwpoison_inject level?
> >>>>
> >>>
> >>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
> >>> alone would not be sufficient as discussed. We would, again, have to
> >>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
> >>>
> >>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
> >>>
> >>> 	/*
> >>> 	 * Note that the below poison/unpoison interfaces do not involve
> >>> 	 * hardware status change, hence do not require hardware support.
> >>> 	 * They are mainly for testing hwpoison in software level.
> >>> 	 */
> >>>
> >>> So it's not that bad compared to memory_failure() called from real HW or
> >>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
> >>
> >> Yes, this is just a toy. And yes we need to handle zone device pages
> >> here because a) people likely want to test MCE behavior even on these
> >> pages and b) HW can really trigger MCEs there as well. I was just
> >> pointing that the patch is likely incomplete.
> >>
> > 
> > I rather think this deserves a separate patch as it is a separate
> > interface :)
> > 
> > I do wonder why hwpoison_inject() has to perform so much extra work
> > compared to other memory_failure() users. This smells like legacy
> > leftovers to me, but I might be wrong. The interface is fairly old,
> > though. Does anybody know why we need this magic? I can spot quite some
> > duplicate checks/things getting performed.

It concerns me too, this *is* an old legacy code. I guess it was left as-is
because no one complained about it.  That's not good, so I'll do some cleanup.

> > 
> > Naiive me would just make the interface perform the same as
> > hard_offline_page_store(). But most probably I am not getting the real
> > purpose of both different interfaces.

Maybe for historical reason, we have these slightly different interfaces:

- corrupt-pfn
  - purely for debugging purpose
  - paired with unpoison-pfn
  - used by in-kernel tool tools/vm/page-types.c
- hard_offline_page
  - paired with soft_offline_page
  - used by other userspace tools like mcelog

But these don't explain why implemented differently, so I think that both
should be written in the same manner.

> > 
> > HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have
> > guessed that fixing this is not urgent.
> > 
> > BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs
> > fixing to make sure we access initialized memmaps.
> > 
> 
> To be more precise, soft_offline_page_store() needs a
> pfn_to_online_page() check. Will send a patch.

Thanks for finding this.

- Naoya Horiguchi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:17     ` David Hildenbrand
@ 2019-10-11  6:50       ` Naoya Horiguchi
  2019-10-19  2:05       ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Naoya Horiguchi @ 2019-10-11  6:50 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko

On Thu, Oct 10, 2019 at 09:17:42AM +0200, David Hildenbrand wrote:
> On 10.10.19 02:26, Naoya Horiguchi wrote:
> > On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote:
> >> We should check for pfn_to_online_page() to not access uninitialized
> >> memmaps. Reshuffle the code so we don't have to duplicate the error
> >> message.
> >>
> >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  mm/memory-failure.c | 14 ++++++++------
> >>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 7ef849da8278..e866e6e5660b 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
> >>  	if (!sysctl_memory_failure_recovery)
> >>  		panic("Memory failure on page %lx", pfn);
> >>  
> >> -	if (!pfn_valid(pfn)) {
> >> +	p = pfn_to_online_page(pfn);
> >> +	if (!p) {
> >> +		if (pfn_valid(pfn)) {
> >> +			pgmap = get_dev_pagemap(pfn, NULL);
> >> +			if (pgmap)
> >> +				return memory_failure_dev_pagemap(pfn, flags,
> >> +								  pgmap);
> >> +		}
> >>  		pr_err("Memory failure: %#lx: memory outside kernel control\n",
> >>  			pfn);
> >>  		return -ENXIO;
> >>  	}
> >>  
> >> -	pgmap = get_dev_pagemap(pfn, NULL);
> >> -	if (pgmap)
> >> -		return memory_failure_dev_pagemap(pfn, flags, pgmap);
> >> -
> >> -	p = pfn_to_page(pfn);
> > 
> > This change seems to assume that memory_failure_dev_pagemap() is never
> > called for online pages. Is it an intended behavior?
> > Or the concept "online pages" is not applicable to zone device pages?
> 
> Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online
> (SECTION_IS_ONLINE). The terminology "online" only applies to pages that
> were given to the buddy. And as we support sup-section hotadd for
> devmem, we cannot easily make use of the section flag it. I already
> proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap
> and call it something like pfn_active().
> 
> pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we
> could use pfn_active() everywhere to test for initialized memmaps (well,
> besides some special cases like device reserved memory that does not
> span full sub-sections). Until now, nobody volunteered and I have other
> things to do.

Thank you for explanation.  I'm not sure how hard now but we try to do this.
You fix the problem from online/offline viewpoint now, and leave remaining
part untouched. I agree with that approach, and the suggested change looks
good to me.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-11  6:02             ` Naoya Horiguchi
@ 2019-10-11 10:13               ` David Hildenbrand
  2019-10-14 13:36                 ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2019-10-11 10:13 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton

On 11.10.19 08:02, Naoya Horiguchi wrote:
> On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote:
>> On 10.10.19 09:52, David Hildenbrand wrote:
>>> On 10.10.19 09:35, Michal Hocko wrote:
>>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
>>>>> On 09.10.19 16:43, Michal Hocko wrote:
>>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
>>>>>>> We should check for pfn_to_online_page() to not access uninitialized
>>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
>>>>>>> message.
>>>>>>>
>>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>> ---
>>>>>>>   mm/memory-failure.c | 14 ++++++++------
>>>>>>>   1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>> index 7ef849da8278..e866e6e5660b 100644
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>>   	if (!sysctl_memory_failure_recovery)
>>>>>>>   		panic("Memory failure on page %lx", pfn);
>>>>>>>   
>>>>>>> -	if (!pfn_valid(pfn)) {
>>>>>>> +	p = pfn_to_online_page(pfn);
>>>>>>> +	if (!p) {
>>>>>>> +		if (pfn_valid(pfn)) {
>>>>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
>>>>>>> +			if (pgmap)
>>>>>>> +				return memory_failure_dev_pagemap(pfn, flags,
>>>>>>> +								  pgmap);
>>>>>>> +		}
>>>>>>>   		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>>>>>>   			pfn);
>>>>>>>   		return -ENXIO;
>>>>>>
>>>>>> Don't we need that earlier at hwpoison_inject level?
>>>>>>
>>>>>
>>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
>>>>> alone would not be sufficient as discussed. We would, again, have to
>>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
>>>>>
>>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
>>>>>
>>>>> 	/*
>>>>> 	 * Note that the below poison/unpoison interfaces do not involve
>>>>> 	 * hardware status change, hence do not require hardware support.
>>>>> 	 * They are mainly for testing hwpoison in software level.
>>>>> 	 */
>>>>>
>>>>> So it's not that bad compared to memory_failure() called from real HW or
>>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
>>>>
>>>> Yes, this is just a toy. And yes we need to handle zone device pages
>>>> here because a) people likely want to test MCE behavior even on these
>>>> pages and b) HW can really trigger MCEs there as well. I was just
>>>> pointing that the patch is likely incomplete.
>>>>
>>>
>>> I rather think this deserves a separate patch as it is a separate
>>> interface :)
>>>
>>> I do wonder why hwpoison_inject() has to perform so much extra work
>>> compared to other memory_failure() users. This smells like legacy
>>> leftovers to me, but I might be wrong. The interface is fairly old,
>>> though. Does anybody know why we need this magic? I can spot quite some
>>> duplicate checks/things getting performed.
> 
> It concerns me too, this *is* an old legacy code. I guess it was left as-is
> because no one complained about it.  That's not good, so I'll do some cleanup.

Most of that stuff was introduced in

commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b
Author: Wu Fengguang <fengguang.wu@intel.com>
Date:   Wed Dec 16 12:19:59 2009 +0100

     HWPOISON: limit hwpoison injector to known page types

     __memory_failure()'s workflow is

             set PG_hwpoison
             //...
             unset PG_hwpoison if didn't pass hwpoison filter

     That could kill unrelated process if it happens to page fault on the
     page with the (temporary) PG_hwpoison. The race should be big enough to
     appear in stress tests.

     Fix it by grabbing the page and checking filter at inject time.  This
     also avoids the very noisy "Injecting memory failure..." messages.


Now, we still have the same "issue" in memory_failure() today:


	if (TestSetPageHWPoison(p)) {
		pr_err("Memory failure: %#lx: already hardware poisoned\n",
			pfn);
		return 0;
	}
[...]
	if (hwpoison_filter(p)) {
		if (TestClearPageHWPoison(p))
			num_poisoned_pages_dec();
		unlock_page(p);
		put_hwpoison_page(p);
		return 0;
	}

However, I don't understand why we need that special handling only for this
debug interface and not the other users.

I'd vote for ripping out that legacy crap (so the interface works correctly
with ZONE_DEVICE) and instead (if really required) rework memory_failure()
to not produce such side effects.

> 
>>>
>>> Naiive me would just make the interface perform the same as
>>> hard_offline_page_store(). But most probably I am not getting the real
>>> purpose of both different interfaces.
> 
> Maybe for historical reason, we have these slightly different interfaces:
> 
> - corrupt-pfn
>    - purely for debugging purpose
>    - paired with unpoison-pfn
>    - used by in-kernel tool tools/vm/page-types.c
> - hard_offline_page
>    - paired with soft_offline_page
>    - used by other userspace tools like mcelog
> 
> But these don't explain why implemented differently, so I think that both
> should be written in the same manner.
> 
>>>
>>> HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have
>>> guessed that fixing this is not urgent.
>>>
>>> BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs
>>> fixing to make sure we access initialized memmaps.
>>>
>>
>> To be more precise, soft_offline_page_store() needs a
>> pfn_to_online_page() check. Will send a patch.
> 
> Thanks for finding this.
> 
> - Naoya Horiguchi
> 


-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
  2019-10-09 14:43   ` Michal Hocko
  2019-10-10  0:26   ` Naoya Horiguchi
@ 2019-10-14  8:41   ` David Hildenbrand
  2 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-14  8:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, Naoya Horiguchi, Andrew Morton, Michal Hocko

On 09.10.19 16:24, David Hildenbrand wrote:
> We should check for pfn_to_online_page() to not access uninitialized
> memmaps. Reshuffle the code so we don't have to duplicate the error
> message.
> 
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   mm/memory-failure.c | 14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7ef849da8278..e866e6e5660b 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>   	if (!sysctl_memory_failure_recovery)
>   		panic("Memory failure on page %lx", pfn);
>   
> -	if (!pfn_valid(pfn)) {
> +	p = pfn_to_online_page(pfn);
> +	if (!p) {
> +		if (pfn_valid(pfn)) {
> +			pgmap = get_dev_pagemap(pfn, NULL);
> +			if (pgmap)
> +				return memory_failure_dev_pagemap(pfn, flags,
> +								  pgmap);
> +		}
>   		pr_err("Memory failure: %#lx: memory outside kernel control\n",
>   			pfn);
>   		return -ENXIO;
>   	}
>   
> -	pgmap = get_dev_pagemap(pfn, NULL);
> -	if (pgmap)
> -		return memory_failure_dev_pagemap(pfn, flags, pgmap);
> -
> -	p = pfn_to_page(pfn);
>   	if (PageHuge(p))
>   		return memory_failure_hugetlb(pfn, flags);
>   	if (TestSetPageHWPoison(p)) {
> 

@Andrew, can you add

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

And

Cc: stable@vger.kernel.org # v4.13+

The stable backports won't be clean cherry-picks AFAIKS, but do-able.

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c
  2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand
  2019-10-09 14:41   ` Michal Hocko
@ 2019-10-14  8:44   ` David Hildenbrand
  1 sibling, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-14  8:44 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton
  Cc: linux-mm, Qian Cai, Dan Williams, Alexey Dobriyan,
	Stephen Rothwell, Toshiki Fukasawa, Pankaj gupta, Mike Rapoport,
	Anthony Yznaga, Michal Hocko, Aneesh Kumar K.V, linux-fsdevel

On 09.10.19 16:24, David Hildenbrand wrote:
> There are three places where we access uninitialized memmaps, namely:
> - /proc/kpagecount
> - /proc/kpageflags
> - /proc/kpagecgroup
> 
> We have initialized memmaps either when the section is online or when
> the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain
> garbage and in the worst case trigger kernel BUGs, especially with
> CONFIG_PAGE_POISONING.
> 
> For example, not onlining a DIMM during boot and calling /proc/kpagecount
> with CONFIG_PAGE_POISONING:
> :/# cat /proc/kpagecount > tmp.test
> [   95.600592] BUG: unable to handle page fault for address: fffffffffffffffe
> [   95.601238] #PF: supervisor read access in kernel mode
> [   95.601675] #PF: error_code(0x0000) - not-present page
> [   95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
> [   95.602596] Oops: 0000 [#1] SMP NOPTI
> [   95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
> [   95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
> [   95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0
> [   95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
> [   95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
> [   95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
> [   95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
> [   95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
> [   95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
> [   95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
> [   95.609924] FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
> [   95.610599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
> [   95.611686] Call Trace:
> [   95.611906]  proc_reg_read+0x3c/0x60
> [   95.612228]  vfs_read+0xc5/0x180
> [   95.612505]  ksys_read+0x68/0xe0
> [   95.612785]  do_syscall_64+0x5c/0xa0
> [   95.613092]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> For now, let's drop support for ZONE_DEVICE from the three pseudo files
> in order to fix this. To distinguish offline memory (with garbage memmap)
> from ZONE_DEVICE memory with properly initialized memmaps, we would have to
> check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage
> of both (especially, special casing devmem) is frowned upon and needs to
> be reworked. The fundamental issue we have is:
> 
> 	if (pfn_to_online_page(pfn)) {
> 		/* memmap initialized */
> 	} else if (pfn_valid(pfn)) {
> 		/*
> 		 * ???
> 		 * a) offline memory. memmap garbage.
> 		 * b) devmem: memmap initialized to ZONE_DEVICE.
> 		 * c) devmem: reserved for driver. memmap garbage.
> 		 * (d) devmem: memmap currently initializing - garbage)
> 		 */
> 	}
> 
> We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
> in place as that function is also used from memory failure. We now
> no longer dump information about pages that are not in use anymore -
> offline.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
> Cc: Pankaj gupta <pagupta@redhat.com>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   fs/proc/page.c | 28 ++++++++++++++++------------
>   1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index decd3fe39674..e40dbfe1168e 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>   		return -EINVAL;
>   
>   	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
> +
>   		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
>   			pcount = 0;
>   		else
> @@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
>   		return -EINVAL;
>   
>   	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
>   
>   		if (put_user(stable_page_flags(ppage), out)) {
>   			ret = -EFAULT;
> @@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
>   		return -EINVAL;
>   
>   	while (count > 0) {
> -		if (pfn_valid(pfn))
> -			ppage = pfn_to_page(pfn);
> -		else
> -			ppage = NULL;
> +		/*
> +		 * TODO: ZONE_DEVICE support requires to identify
> +		 * memmaps that were actually initialized.
> +		 */
> +		ppage = pfn_to_online_page(pfn);
>   
>   		if (ppage)
>   			ino = page_cgroup_ino(ppage);
> 

@Andrew, can you add

Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319

And

Cc: stable@vger.kernel.org # v4.13+

Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-11 10:13               ` David Hildenbrand
@ 2019-10-14 13:36                 ` Michal Hocko
  2019-10-15 14:23                   ` Oscar Salvador
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-14 13:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton, Oscar Salvador

[Cc Oscar]

On Fri 11-10-19 12:13:17, David Hildenbrand wrote:
> On 11.10.19 08:02, Naoya Horiguchi wrote:
> > On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote:
> >> On 10.10.19 09:52, David Hildenbrand wrote:
> >>> On 10.10.19 09:35, Michal Hocko wrote:
> >>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
> >>>>> On 09.10.19 16:43, Michal Hocko wrote:
> >>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
> >>>>>>> We should check for pfn_to_online_page() to not access uninitialized
> >>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
> >>>>>>> message.
> >>>>>>>
> >>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> >>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>>>>>> Cc: Michal Hocko <mhocko@kernel.org>
> >>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>>>>>> ---
> >>>>>>>   mm/memory-failure.c | 14 ++++++++------
> >>>>>>>   1 file changed, 8 insertions(+), 6 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>>>>> index 7ef849da8278..e866e6e5660b 100644
> >>>>>>> --- a/mm/memory-failure.c
> >>>>>>> +++ b/mm/memory-failure.c
> >>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>>>   	if (!sysctl_memory_failure_recovery)
> >>>>>>>   		panic("Memory failure on page %lx", pfn);
> >>>>>>>   
> >>>>>>> -	if (!pfn_valid(pfn)) {
> >>>>>>> +	p = pfn_to_online_page(pfn);
> >>>>>>> +	if (!p) {
> >>>>>>> +		if (pfn_valid(pfn)) {
> >>>>>>> +			pgmap = get_dev_pagemap(pfn, NULL);
> >>>>>>> +			if (pgmap)
> >>>>>>> +				return memory_failure_dev_pagemap(pfn, flags,
> >>>>>>> +								  pgmap);
> >>>>>>> +		}
> >>>>>>>   		pr_err("Memory failure: %#lx: memory outside kernel control\n",
> >>>>>>>   			pfn);
> >>>>>>>   		return -ENXIO;
> >>>>>>
> >>>>>> Don't we need that earlier at hwpoison_inject level?
> >>>>>>
> >>>>>
> >>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
> >>>>> alone would not be sufficient as discussed. We would, again, have to
> >>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
> >>>>>
> >>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
> >>>>>
> >>>>> 	/*
> >>>>> 	 * Note that the below poison/unpoison interfaces do not involve
> >>>>> 	 * hardware status change, hence do not require hardware support.
> >>>>> 	 * They are mainly for testing hwpoison in software level.
> >>>>> 	 */
> >>>>>
> >>>>> So it's not that bad compared to memory_failure() called from real HW or
> >>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
> >>>>
> >>>> Yes, this is just a toy. And yes we need to handle zone device pages
> >>>> here because a) people likely want to test MCE behavior even on these
> >>>> pages and b) HW can really trigger MCEs there as well. I was just
> >>>> pointing that the patch is likely incomplete.
> >>>>
> >>>
> >>> I rather think this deserves a separate patch as it is a separate
> >>> interface :)
> >>>
> >>> I do wonder why hwpoison_inject() has to perform so much extra work
> >>> compared to other memory_failure() users. This smells like legacy
> >>> leftovers to me, but I might be wrong. The interface is fairly old,
> >>> though. Does anybody know why we need this magic? I can spot quite some
> >>> duplicate checks/things getting performed.
> > 
> > It concerns me too, this *is* an old legacy code. I guess it was left as-is
> > because no one complained about it.  That's not good, so I'll do some cleanup.
> 
> Most of that stuff was introduced in
> 
> commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b
> Author: Wu Fengguang <fengguang.wu@intel.com>
> Date:   Wed Dec 16 12:19:59 2009 +0100
> 
>      HWPOISON: limit hwpoison injector to known page types
> 
>      __memory_failure()'s workflow is
> 
>              set PG_hwpoison
>              //...
>              unset PG_hwpoison if didn't pass hwpoison filter
> 
>      That could kill unrelated process if it happens to page fault on the
>      page with the (temporary) PG_hwpoison. The race should be big enough to
>      appear in stress tests.
> 
>      Fix it by grabbing the page and checking filter at inject time.  This
>      also avoids the very noisy "Injecting memory failure..." messages.
> 
> 
> Now, we still have the same "issue" in memory_failure() today:
> 
> 
> 	if (TestSetPageHWPoison(p)) {
> 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> 			pfn);
> 		return 0;
> 	}
> [...]
> 	if (hwpoison_filter(p)) {
> 		if (TestClearPageHWPoison(p))
> 			num_poisoned_pages_dec();
> 		unlock_page(p);
> 		put_hwpoison_page(p);
> 		return 0;
> 	}
> 
> However, I don't understand why we need that special handling only for this
> debug interface and not the other users.
> 
> I'd vote for ripping out that legacy crap (so the interface works correctly
> with ZONE_DEVICE) and instead (if really required) rework memory_failure()
> to not produce such side effects.

I do agree. The two should be really using the same code. My
understanding was that MADV_HWPOISON was there to test the actual MCE
behavior (and the man page seems to agree with that).

Oscar is working on a rewrite. Not sure he has considered this as well.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-14 13:36                 ` Michal Hocko
@ 2019-10-15 14:23                   ` Oscar Salvador
  0 siblings, 0 replies; 19+ messages in thread
From: Oscar Salvador @ 2019-10-15 14:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Hildenbrand, Naoya Horiguchi, linux-kernel, linux-mm,
	Andrew Morton, Oscar Salvador

On Mon, Oct 14, 2019 at 03:36:17PM +0200, Michal Hocko wrote:
> I do agree. The two should be really using the same code. My
> understanding was that MADV_HWPOISON was there to test the actual MCE
> behavior (and the man page seems to agree with that).
> 
> Oscar is working on a rewrite. Not sure he has considered this as well.

Yeah, I came across hwpoison-inject module when doing my re-write,
and I felt like this is begging for a clean up.

Since unpoison_memory needs some adjustments after my re-write, I will go ahead
and clean that up, otherwise it will be inconsistent.

I expect to be ready fo send the v2 by the end of this week.

Thanks
-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
  2019-10-10  7:17     ` David Hildenbrand
  2019-10-11  6:50       ` Naoya Horiguchi
@ 2019-10-19  2:05       ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2019-10-19  2:05 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, Michal Hocko

On Thu, 10 Oct 2019 09:17:42 +0200 David Hildenbrand <david@redhat.com> wrote:

> >> -	pgmap = get_dev_pagemap(pfn, NULL);
> >> -	if (pgmap)
> >> -		return memory_failure_dev_pagemap(pfn, flags, pgmap);
> >> -
> >> -	p = pfn_to_page(pfn);
> > 
> > This change seems to assume that memory_failure_dev_pagemap() is never
> > called for online pages. Is it an intended behavior?
> > Or the concept "online pages" is not applicable to zone device pages?
> 
> Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online
> (SECTION_IS_ONLINE). The terminology "online" only applies to pages that
> were given to the buddy. And as we support sup-section hotadd for
> devmem, we cannot easily make use of the section flag it. I already
> proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap
> and call it something like pfn_active().
> 
> pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we
> could use pfn_active() everywhere to test for initialized memmaps (well,
> besides some special cases like device reserved memory that does not
> span full sub-sections). Until now, nobody volunteered and I have other
> things to do.

Is it worth a code comment or two to make this clearer?


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand
2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand
2019-10-09 14:41   ` Michal Hocko
2019-10-14  8:44   ` David Hildenbrand
2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand
2019-10-09 14:43   ` Michal Hocko
2019-10-10  7:27     ` David Hildenbrand
2019-10-10  7:35       ` Michal Hocko
2019-10-10  7:52         ` David Hildenbrand
2019-10-10  7:58           ` David Hildenbrand
2019-10-11  6:02             ` Naoya Horiguchi
2019-10-11 10:13               ` David Hildenbrand
2019-10-14 13:36                 ` Michal Hocko
2019-10-15 14:23                   ` Oscar Salvador
2019-10-10  0:26   ` Naoya Horiguchi
2019-10-10  7:17     ` David Hildenbrand
2019-10-11  6:50       ` Naoya Horiguchi
2019-10-19  2:05       ` Andrew Morton
2019-10-14  8:41   ` David Hildenbrand

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox