* [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers @ 2019-10-09 14:24 David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand 0 siblings, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Alexey Dobriyan, Andrew Morton, Aneesh Kumar K.V, Anthony Yznaga, Dan Williams, Michal Hocko, Mike Rapoport, Naoya Horiguchi, Pankaj gupta, Qian Cai, Stephen Rothwell, Toshiki Fukasawa This is the follow-up of: [PATCH v1] mm: Fix access of uninitialized memmaps in fs/proc/page.c We have multiple places where we might access uninitialized memmaps and trigger kernel BUGs. Make sure to only access initialized memmaps. Some of these places got easier to trigger with: [PATCH v6 00/10] mm/memory_hotplug: Shrink zones before removing memory As memmaps are now also poisoned when memory is offlined, before it is actually removed. v1 -> v2: - Drop ZONE_DEVICE support from the /proc/k... files as requested by Michal - Further simplify the code - Split up into two patches David Hildenbrand (2): mm: Don't access uninitialized memmaps in fs/proc/page.c mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() fs/proc/page.c | 28 ++++++++++++++++------------ mm/memory-failure.c | 14 ++++++++------ 2 files changed, 24 insertions(+), 18 deletions(-) -- 2.21.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c 2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand @ 2019-10-09 14:24 ` David Hildenbrand 2019-10-09 14:41 ` Michal Hocko 2019-10-14 8:44 ` David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand 1 sibling, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Qian Cai, Dan Williams, Alexey Dobriyan, Andrew Morton, Stephen Rothwell, Toshiki Fukasawa, Pankaj gupta, Mike Rapoport, Anthony Yznaga, Michal Hocko, Aneesh Kumar K.V, linux-fsdevel There are three places where we access uninitialized memmaps, namely: - /proc/kpagecount - /proc/kpageflags - /proc/kpagecgroup We have initialized memmaps either when the section is online or when the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. For example, not onlining a DIMM during boot and calling /proc/kpagecount with CONFIG_PAGE_POISONING: :/# cat /proc/kpagecount > tmp.test [ 95.600592] BUG: unable to handle page fault for address: fffffffffffffffe [ 95.601238] #PF: supervisor read access in kernel mode [ 95.601675] #PF: error_code(0x0000) - not-present page [ 95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 [ 95.602596] Oops: 0000 [#1] SMP NOPTI [ 95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11 [ 95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 [ 95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0 [ 95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 [ 95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202 [ 95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000 [ 95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000 [ 95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000 [ 95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 [ 95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08 [ 95.609924] FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000 [ 95.610599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0 [ 95.611686] Call Trace: [ 95.611906] proc_reg_read+0x3c/0x60 [ 95.612228] vfs_read+0xc5/0x180 [ 95.612505] ksys_read+0x68/0xe0 [ 95.612785] do_syscall_64+0x5c/0xa0 [ 95.613092] entry_SYSCALL_64_after_hwframe+0x49/0xbe For now, let's drop support for ZONE_DEVICE from the three pseudo files in order to fix this. To distinguish offline memory (with garbage memmap) from ZONE_DEVICE memory with properly initialized memmaps, we would have to check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage of both (especially, special casing devmem) is frowned upon and needs to be reworked. The fundamental issue we have is: if (pfn_to_online_page(pfn)) { /* memmap initialized */ } else if (pfn_valid(pfn)) { /* * ??? * a) offline memory. memmap garbage. * b) devmem: memmap initialized to ZONE_DEVICE. * c) devmem: reserved for driver. memmap garbage. * (d) devmem: memmap currently initializing - garbage) */ } We'll leave the pfn_zone_device_reserved() check in stable_page_flags() in place as that function is also used from memory failure. We now no longer dump information about pages that are not in use anymore - offline. Reported-by: Qian Cai <cai@lca.pw> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com> Cc: Pankaj gupta <pagupta@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> --- fs/proc/page.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/fs/proc/page.c b/fs/proc/page.c index decd3fe39674..e40dbfe1168e 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf, return -EINVAL; while (count > 0) { - if (pfn_valid(pfn)) - ppage = pfn_to_page(pfn); - else - ppage = NULL; + /* + * TODO: ZONE_DEVICE support requires to identify + * memmaps that were actually initialized. + */ + ppage = pfn_to_online_page(pfn); + if (!ppage || PageSlab(ppage) || page_has_type(ppage)) pcount = 0; else @@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf, return -EINVAL; while (count > 0) { - if (pfn_valid(pfn)) - ppage = pfn_to_page(pfn); - else - ppage = NULL; + /* + * TODO: ZONE_DEVICE support requires to identify + * memmaps that were actually initialized. + */ + ppage = pfn_to_online_page(pfn); if (put_user(stable_page_flags(ppage), out)) { ret = -EFAULT; @@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf, return -EINVAL; while (count > 0) { - if (pfn_valid(pfn)) - ppage = pfn_to_page(pfn); - else - ppage = NULL; + /* + * TODO: ZONE_DEVICE support requires to identify + * memmaps that were actually initialized. + */ + ppage = pfn_to_online_page(pfn); if (ppage) ino = page_cgroup_ino(ppage); -- 2.21.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c 2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand @ 2019-10-09 14:41 ` Michal Hocko 2019-10-14 8:44 ` David Hildenbrand 1 sibling, 0 replies; 20+ messages in thread From: Michal Hocko @ 2019-10-09 14:41 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Qian Cai, Dan Williams, Alexey Dobriyan, Andrew Morton, Stephen Rothwell, Toshiki Fukasawa, Pankaj gupta, Mike Rapoport, Anthony Yznaga, Aneesh Kumar K.V, linux-fsdevel On Wed 09-10-19 16:24:34, David Hildenbrand wrote: > There are three places where we access uninitialized memmaps, namely: > - /proc/kpagecount > - /proc/kpageflags > - /proc/kpagecgroup > > We have initialized memmaps either when the section is online or when > the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain > garbage and in the worst case trigger kernel BUGs, especially with > CONFIG_PAGE_POISONING. > > For example, not onlining a DIMM during boot and calling /proc/kpagecount > with CONFIG_PAGE_POISONING: > :/# cat /proc/kpagecount > tmp.test > [ 95.600592] BUG: unable to handle page fault for address: fffffffffffffffe > [ 95.601238] #PF: supervisor read access in kernel mode > [ 95.601675] #PF: error_code(0x0000) - not-present page > [ 95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 > [ 95.602596] Oops: 0000 [#1] SMP NOPTI > [ 95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11 > [ 95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 > [ 95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0 > [ 95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 > [ 95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202 > [ 95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000 > [ 95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000 > [ 95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000 > [ 95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 > [ 95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08 > [ 95.609924] FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000 > [ 95.610599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0 > [ 95.611686] Call Trace: > [ 95.611906] proc_reg_read+0x3c/0x60 > [ 95.612228] vfs_read+0xc5/0x180 > [ 95.612505] ksys_read+0x68/0xe0 > [ 95.612785] do_syscall_64+0x5c/0xa0 > [ 95.613092] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > For now, let's drop support for ZONE_DEVICE from the three pseudo files > in order to fix this. To distinguish offline memory (with garbage memmap) > from ZONE_DEVICE memory with properly initialized memmaps, we would have to > check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage > of both (especially, special casing devmem) is frowned upon and needs to > be reworked. The fundamental issue we have is: > > if (pfn_to_online_page(pfn)) { > /* memmap initialized */ > } else if (pfn_valid(pfn)) { > /* > * ??? > * a) offline memory. memmap garbage. > * b) devmem: memmap initialized to ZONE_DEVICE. > * c) devmem: reserved for driver. memmap garbage. > * (d) devmem: memmap currently initializing - garbage) > */ > } > > We'll leave the pfn_zone_device_reserved() check in stable_page_flags() > in place as that function is also used from memory failure. We now > no longer dump information about pages that are not in use anymore - > offline. > > Reported-by: Qian Cai <cai@lca.pw> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com> > Cc: Pankaj gupta <pagupta@redhat.com> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > Cc: Anthony Yznaga <anthony.yznaga@oracle.com> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Cc: linux-fsdevel@vger.kernel.org > Signed-off-by: David Hildenbrand <david@redhat.com> Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319 usage of pfn_to_online_page is the right way to dereference the pfn here. My understanding of the zone device internals is limited to see whether this fixes it as well. Acked-by: Michal Hocko <mhocko@suse.com> > --- > fs/proc/page.c | 28 ++++++++++++++++------------ > 1 file changed, 16 insertions(+), 12 deletions(-) > > diff --git a/fs/proc/page.c b/fs/proc/page.c > index decd3fe39674..e40dbfe1168e 100644 > --- a/fs/proc/page.c > +++ b/fs/proc/page.c > @@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > + > if (!ppage || PageSlab(ppage) || page_has_type(ppage)) > pcount = 0; > else > @@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > > if (put_user(stable_page_flags(ppage), out)) { > ret = -EFAULT; > @@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > > if (ppage) > ino = page_cgroup_ino(ppage); > -- > 2.21.0 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c 2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand 2019-10-09 14:41 ` Michal Hocko @ 2019-10-14 8:44 ` David Hildenbrand 1 sibling, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-14 8:44 UTC (permalink / raw) To: linux-kernel, Andrew Morton Cc: linux-mm, Qian Cai, Dan Williams, Alexey Dobriyan, Stephen Rothwell, Toshiki Fukasawa, Pankaj gupta, Mike Rapoport, Anthony Yznaga, Michal Hocko, Aneesh Kumar K.V, linux-fsdevel On 09.10.19 16:24, David Hildenbrand wrote: > There are three places where we access uninitialized memmaps, namely: > - /proc/kpagecount > - /proc/kpageflags > - /proc/kpagecgroup > > We have initialized memmaps either when the section is online or when > the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain > garbage and in the worst case trigger kernel BUGs, especially with > CONFIG_PAGE_POISONING. > > For example, not onlining a DIMM during boot and calling /proc/kpagecount > with CONFIG_PAGE_POISONING: > :/# cat /proc/kpagecount > tmp.test > [ 95.600592] BUG: unable to handle page fault for address: fffffffffffffffe > [ 95.601238] #PF: supervisor read access in kernel mode > [ 95.601675] #PF: error_code(0x0000) - not-present page > [ 95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 > [ 95.602596] Oops: 0000 [#1] SMP NOPTI > [ 95.602920] CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11 > [ 95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 > [ 95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0 > [ 95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 > [ 95.606450] RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202 > [ 95.606904] RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000 > [ 95.607519] RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000 > [ 95.608128] RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000 > [ 95.608731] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 > [ 95.609327] R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08 > [ 95.609924] FS: 00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000 > [ 95.610599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 95.611083] CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0 > [ 95.611686] Call Trace: > [ 95.611906] proc_reg_read+0x3c/0x60 > [ 95.612228] vfs_read+0xc5/0x180 > [ 95.612505] ksys_read+0x68/0xe0 > [ 95.612785] do_syscall_64+0x5c/0xa0 > [ 95.613092] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > For now, let's drop support for ZONE_DEVICE from the three pseudo files > in order to fix this. To distinguish offline memory (with garbage memmap) > from ZONE_DEVICE memory with properly initialized memmaps, we would have to > check get_dev_pagemap() and pfn_zone_device_reserved() right now. The usage > of both (especially, special casing devmem) is frowned upon and needs to > be reworked. The fundamental issue we have is: > > if (pfn_to_online_page(pfn)) { > /* memmap initialized */ > } else if (pfn_valid(pfn)) { > /* > * ??? > * a) offline memory. memmap garbage. > * b) devmem: memmap initialized to ZONE_DEVICE. > * c) devmem: reserved for driver. memmap garbage. > * (d) devmem: memmap currently initializing - garbage) > */ > } > > We'll leave the pfn_zone_device_reserved() check in stable_page_flags() > in place as that function is also used from memory failure. We now > no longer dump information about pages that are not in use anymore - > offline. > > Reported-by: Qian Cai <cai@lca.pw> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Stephen Rothwell <sfr@canb.auug.org.au> > Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com> > Cc: Pankaj gupta <pagupta@redhat.com> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > Cc: Anthony Yznaga <anthony.yznaga@oracle.com> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Cc: linux-fsdevel@vger.kernel.org > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > fs/proc/page.c | 28 ++++++++++++++++------------ > 1 file changed, 16 insertions(+), 12 deletions(-) > > diff --git a/fs/proc/page.c b/fs/proc/page.c > index decd3fe39674..e40dbfe1168e 100644 > --- a/fs/proc/page.c > +++ b/fs/proc/page.c > @@ -42,10 +42,12 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > + > if (!ppage || PageSlab(ppage) || page_has_type(ppage)) > pcount = 0; > else > @@ -218,10 +220,11 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > > if (put_user(stable_page_flags(ppage), out)) { > ret = -EFAULT; > @@ -263,10 +266,11 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf, > return -EINVAL; > > while (count > 0) { > - if (pfn_valid(pfn)) > - ppage = pfn_to_page(pfn); > - else > - ppage = NULL; > + /* > + * TODO: ZONE_DEVICE support requires to identify > + * memmaps that were actually initialized. > + */ > + ppage = pfn_to_online_page(pfn); > > if (ppage) > ino = page_cgroup_ino(ppage); > @Andrew, can you add Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319 And Cc: stable@vger.kernel.org # v4.13+ Thanks! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand @ 2019-10-09 14:24 ` David Hildenbrand 2019-10-09 14:43 ` Michal Hocko ` (2 more replies) 1 sibling, 3 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-09 14:24 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Naoya Horiguchi, Andrew Morton, Michal Hocko We should check for pfn_to_online_page() to not access uninitialized memmaps. Reshuffle the code so we don't have to duplicate the error message. Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/memory-failure.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7ef849da8278..e866e6e5660b 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); - if (!pfn_valid(pfn)) { + p = pfn_to_online_page(pfn); + if (!p) { + if (pfn_valid(pfn)) { + pgmap = get_dev_pagemap(pfn, NULL); + if (pgmap) + return memory_failure_dev_pagemap(pfn, flags, + pgmap); + } pr_err("Memory failure: %#lx: memory outside kernel control\n", pfn); return -ENXIO; } - pgmap = get_dev_pagemap(pfn, NULL); - if (pgmap) - return memory_failure_dev_pagemap(pfn, flags, pgmap); - - p = pfn_to_page(pfn); if (PageHuge(p)) return memory_failure_hugetlb(pfn, flags); if (TestSetPageHWPoison(p)) { -- 2.21.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand @ 2019-10-09 14:43 ` Michal Hocko 2019-10-10 7:27 ` David Hildenbrand 2019-10-10 0:26 ` Naoya Horiguchi 2019-10-14 8:41 ` David Hildenbrand 2 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2019-10-09 14:43 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > We should check for pfn_to_online_page() to not access uninitialized > memmaps. Reshuffle the code so we don't have to duplicate the error > message. > > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@kernel.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/memory-failure.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 7ef849da8278..e866e6e5660b 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > if (!sysctl_memory_failure_recovery) > panic("Memory failure on page %lx", pfn); > > - if (!pfn_valid(pfn)) { > + p = pfn_to_online_page(pfn); > + if (!p) { > + if (pfn_valid(pfn)) { > + pgmap = get_dev_pagemap(pfn, NULL); > + if (pgmap) > + return memory_failure_dev_pagemap(pfn, flags, > + pgmap); > + } > pr_err("Memory failure: %#lx: memory outside kernel control\n", > pfn); > return -ENXIO; Don't we need that earlier at hwpoison_inject level? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-09 14:43 ` Michal Hocko @ 2019-10-10 7:27 ` David Hildenbrand 2019-10-10 7:35 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2019-10-10 7:27 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton On 09.10.19 16:43, Michal Hocko wrote: > On Wed 09-10-19 16:24:35, David Hildenbrand wrote: >> We should check for pfn_to_online_page() to not access uninitialized >> memmaps. Reshuffle the code so we don't have to duplicate the error >> message. >> >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@kernel.org> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/memory-failure.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 7ef849da8278..e866e6e5660b 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >> if (!sysctl_memory_failure_recovery) >> panic("Memory failure on page %lx", pfn); >> >> - if (!pfn_valid(pfn)) { >> + p = pfn_to_online_page(pfn); >> + if (!p) { >> + if (pfn_valid(pfn)) { >> + pgmap = get_dev_pagemap(pfn, NULL); >> + if (pgmap) >> + return memory_failure_dev_pagemap(pfn, flags, >> + pgmap); >> + } >> pr_err("Memory failure: %#lx: memory outside kernel control\n", >> pfn); >> return -ENXIO; > > Don't we need that earlier at hwpoison_inject level? > Theoretically yes, this is another instance. But pfn_to_online_page(pfn) alone would not be sufficient as discussed. We would, again, have to special-case ZONE_DEVICE via things like get_dev_pagemap() ... But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: /* * Note that the below poison/unpoison interfaces do not involve * hardware status change, hence do not require hardware support. * They are mainly for testing hwpoison in software level. */ So it's not that bad compared to memory_failure() called from real HW or drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:27 ` David Hildenbrand @ 2019-10-10 7:35 ` Michal Hocko 2019-10-10 7:52 ` David Hildenbrand 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2019-10-10 7:35 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton On Thu 10-10-19 09:27:32, David Hildenbrand wrote: > On 09.10.19 16:43, Michal Hocko wrote: > > On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > >> We should check for pfn_to_online_page() to not access uninitialized > >> memmaps. Reshuffle the code so we don't have to duplicate the error > >> message. > >> > >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> Cc: Michal Hocko <mhocko@kernel.org> > >> Signed-off-by: David Hildenbrand <david@redhat.com> > >> --- > >> mm/memory-failure.c | 14 ++++++++------ > >> 1 file changed, 8 insertions(+), 6 deletions(-) > >> > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index 7ef849da8278..e866e6e5660b 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >> if (!sysctl_memory_failure_recovery) > >> panic("Memory failure on page %lx", pfn); > >> > >> - if (!pfn_valid(pfn)) { > >> + p = pfn_to_online_page(pfn); > >> + if (!p) { > >> + if (pfn_valid(pfn)) { > >> + pgmap = get_dev_pagemap(pfn, NULL); > >> + if (pgmap) > >> + return memory_failure_dev_pagemap(pfn, flags, > >> + pgmap); > >> + } > >> pr_err("Memory failure: %#lx: memory outside kernel control\n", > >> pfn); > >> return -ENXIO; > > > > Don't we need that earlier at hwpoison_inject level? > > > > Theoretically yes, this is another instance. But pfn_to_online_page(pfn) > alone would not be sufficient as discussed. We would, again, have to > special-case ZONE_DEVICE via things like get_dev_pagemap() ... > > But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: > > /* > * Note that the below poison/unpoison interfaces do not involve > * hardware status change, hence do not require hardware support. > * They are mainly for testing hwpoison in software level. > */ > > So it's not that bad compared to memory_failure() called from real HW or > drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() Yes, this is just a toy. And yes we need to handle zone device pages here because a) people likely want to test MCE behavior even on these pages and b) HW can really trigger MCEs there as well. I was just pointing that the patch is likely incomplete. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:35 ` Michal Hocko @ 2019-10-10 7:52 ` David Hildenbrand 2019-10-10 7:58 ` David Hildenbrand 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2019-10-10 7:52 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton On 10.10.19 09:35, Michal Hocko wrote: > On Thu 10-10-19 09:27:32, David Hildenbrand wrote: >> On 09.10.19 16:43, Michal Hocko wrote: >>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: >>>> We should check for pfn_to_online_page() to not access uninitialized >>>> memmaps. Reshuffle the code so we don't have to duplicate the error >>>> message. >>>> >>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>> Cc: Michal Hocko <mhocko@kernel.org> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>> --- >>>> mm/memory-failure.c | 14 ++++++++------ >>>> 1 file changed, 8 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>> index 7ef849da8278..e866e6e5660b 100644 >>>> --- a/mm/memory-failure.c >>>> +++ b/mm/memory-failure.c >>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >>>> if (!sysctl_memory_failure_recovery) >>>> panic("Memory failure on page %lx", pfn); >>>> >>>> - if (!pfn_valid(pfn)) { >>>> + p = pfn_to_online_page(pfn); >>>> + if (!p) { >>>> + if (pfn_valid(pfn)) { >>>> + pgmap = get_dev_pagemap(pfn, NULL); >>>> + if (pgmap) >>>> + return memory_failure_dev_pagemap(pfn, flags, >>>> + pgmap); >>>> + } >>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", >>>> pfn); >>>> return -ENXIO; >>> >>> Don't we need that earlier at hwpoison_inject level? >>> >> >> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) >> alone would not be sufficient as discussed. We would, again, have to >> special-case ZONE_DEVICE via things like get_dev_pagemap() ... >> >> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: >> >> /* >> * Note that the below poison/unpoison interfaces do not involve >> * hardware status change, hence do not require hardware support. >> * They are mainly for testing hwpoison in software level. >> */ >> >> So it's not that bad compared to memory_failure() called from real HW or >> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() > > Yes, this is just a toy. And yes we need to handle zone device pages > here because a) people likely want to test MCE behavior even on these > pages and b) HW can really trigger MCEs there as well. I was just > pointing that the patch is likely incomplete. > I rather think this deserves a separate patch as it is a separate interface :) I do wonder why hwpoison_inject() has to perform so much extra work compared to other memory_failure() users. This smells like legacy leftovers to me, but I might be wrong. The interface is fairly old, though. Does anybody know why we need this magic? I can spot quite some duplicate checks/things getting performed. Naiive me would just make the interface perform the same as hard_offline_page_store(). But most probably I am not getting the real purpose of both different interfaces. HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have guessed that fixing this is not urgent. BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs fixing to make sure we access initialized memmaps. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:52 ` David Hildenbrand @ 2019-10-10 7:58 ` David Hildenbrand 2019-10-11 6:02 ` Naoya Horiguchi 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2019-10-10 7:58 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-kernel, linux-mm, Naoya Horiguchi, Andrew Morton On 10.10.19 09:52, David Hildenbrand wrote: > On 10.10.19 09:35, Michal Hocko wrote: >> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: >>> On 09.10.19 16:43, Michal Hocko wrote: >>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: >>>>> We should check for pfn_to_online_page() to not access uninitialized >>>>> memmaps. Reshuffle the code so we don't have to duplicate the error >>>>> message. >>>>> >>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>> Cc: Michal Hocko <mhocko@kernel.org> >>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>>> --- >>>>> mm/memory-failure.c | 14 ++++++++------ >>>>> 1 file changed, 8 insertions(+), 6 deletions(-) >>>>> >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>> index 7ef849da8278..e866e6e5660b 100644 >>>>> --- a/mm/memory-failure.c >>>>> +++ b/mm/memory-failure.c >>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >>>>> if (!sysctl_memory_failure_recovery) >>>>> panic("Memory failure on page %lx", pfn); >>>>> >>>>> - if (!pfn_valid(pfn)) { >>>>> + p = pfn_to_online_page(pfn); >>>>> + if (!p) { >>>>> + if (pfn_valid(pfn)) { >>>>> + pgmap = get_dev_pagemap(pfn, NULL); >>>>> + if (pgmap) >>>>> + return memory_failure_dev_pagemap(pfn, flags, >>>>> + pgmap); >>>>> + } >>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", >>>>> pfn); >>>>> return -ENXIO; >>>> >>>> Don't we need that earlier at hwpoison_inject level? >>>> >>> >>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) >>> alone would not be sufficient as discussed. We would, again, have to >>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... >>> >>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: >>> >>> /* >>> * Note that the below poison/unpoison interfaces do not involve >>> * hardware status change, hence do not require hardware support. >>> * They are mainly for testing hwpoison in software level. >>> */ >>> >>> So it's not that bad compared to memory_failure() called from real HW or >>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() >> >> Yes, this is just a toy. And yes we need to handle zone device pages >> here because a) people likely want to test MCE behavior even on these >> pages and b) HW can really trigger MCEs there as well. I was just >> pointing that the patch is likely incomplete. >> > > I rather think this deserves a separate patch as it is a separate > interface :) > > I do wonder why hwpoison_inject() has to perform so much extra work > compared to other memory_failure() users. This smells like legacy > leftovers to me, but I might be wrong. The interface is fairly old, > though. Does anybody know why we need this magic? I can spot quite some > duplicate checks/things getting performed. > > Naiive me would just make the interface perform the same as > hard_offline_page_store(). But most probably I am not getting the real > purpose of both different interfaces. > > HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have > guessed that fixing this is not urgent. > > BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs > fixing to make sure we access initialized memmaps. > To be more precise, soft_offline_page_store() needs a pfn_to_online_page() check. Will send a patch. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:58 ` David Hildenbrand @ 2019-10-11 6:02 ` Naoya Horiguchi 2019-10-11 10:13 ` David Hildenbrand 0 siblings, 1 reply; 20+ messages in thread From: Naoya Horiguchi @ 2019-10-11 6:02 UTC (permalink / raw) To: David Hildenbrand; +Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote: > On 10.10.19 09:52, David Hildenbrand wrote: > > On 10.10.19 09:35, Michal Hocko wrote: > >> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: > >>> On 09.10.19 16:43, Michal Hocko wrote: > >>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > >>>>> We should check for pfn_to_online_page() to not access uninitialized > >>>>> memmaps. Reshuffle the code so we don't have to duplicate the error > >>>>> message. > >>>>> > >>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > >>>>> Cc: Andrew Morton <akpm@linux-foundation.org> > >>>>> Cc: Michal Hocko <mhocko@kernel.org> > >>>>> Signed-off-by: David Hildenbrand <david@redhat.com> > >>>>> --- > >>>>> mm/memory-failure.c | 14 ++++++++------ > >>>>> 1 file changed, 8 insertions(+), 6 deletions(-) > >>>>> > >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>>> index 7ef849da8278..e866e6e5660b 100644 > >>>>> --- a/mm/memory-failure.c > >>>>> +++ b/mm/memory-failure.c > >>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >>>>> if (!sysctl_memory_failure_recovery) > >>>>> panic("Memory failure on page %lx", pfn); > >>>>> > >>>>> - if (!pfn_valid(pfn)) { > >>>>> + p = pfn_to_online_page(pfn); > >>>>> + if (!p) { > >>>>> + if (pfn_valid(pfn)) { > >>>>> + pgmap = get_dev_pagemap(pfn, NULL); > >>>>> + if (pgmap) > >>>>> + return memory_failure_dev_pagemap(pfn, flags, > >>>>> + pgmap); > >>>>> + } > >>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", > >>>>> pfn); > >>>>> return -ENXIO; > >>>> > >>>> Don't we need that earlier at hwpoison_inject level? > >>>> > >>> > >>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) > >>> alone would not be sufficient as discussed. We would, again, have to > >>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... > >>> > >>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: > >>> > >>> /* > >>> * Note that the below poison/unpoison interfaces do not involve > >>> * hardware status change, hence do not require hardware support. > >>> * They are mainly for testing hwpoison in software level. > >>> */ > >>> > >>> So it's not that bad compared to memory_failure() called from real HW or > >>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() > >> > >> Yes, this is just a toy. And yes we need to handle zone device pages > >> here because a) people likely want to test MCE behavior even on these > >> pages and b) HW can really trigger MCEs there as well. I was just > >> pointing that the patch is likely incomplete. > >> > > > > I rather think this deserves a separate patch as it is a separate > > interface :) > > > > I do wonder why hwpoison_inject() has to perform so much extra work > > compared to other memory_failure() users. This smells like legacy > > leftovers to me, but I might be wrong. The interface is fairly old, > > though. Does anybody know why we need this magic? I can spot quite some > > duplicate checks/things getting performed. It concerns me too, this *is* an old legacy code. I guess it was left as-is because no one complained about it. That's not good, so I'll do some cleanup. > > > > Naiive me would just make the interface perform the same as > > hard_offline_page_store(). But most probably I am not getting the real > > purpose of both different interfaces. Maybe for historical reason, we have these slightly different interfaces: - corrupt-pfn - purely for debugging purpose - paired with unpoison-pfn - used by in-kernel tool tools/vm/page-types.c - hard_offline_page - paired with soft_offline_page - used by other userspace tools like mcelog But these don't explain why implemented differently, so I think that both should be written in the same manner. > > > > HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have > > guessed that fixing this is not urgent. > > > > BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs > > fixing to make sure we access initialized memmaps. > > > > To be more precise, soft_offline_page_store() needs a > pfn_to_online_page() check. Will send a patch. Thanks for finding this. - Naoya Horiguchi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-11 6:02 ` Naoya Horiguchi @ 2019-10-11 10:13 ` David Hildenbrand 2019-10-14 13:36 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: David Hildenbrand @ 2019-10-11 10:13 UTC (permalink / raw) To: Naoya Horiguchi; +Cc: Michal Hocko, linux-kernel, linux-mm, Andrew Morton On 11.10.19 08:02, Naoya Horiguchi wrote: > On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote: >> On 10.10.19 09:52, David Hildenbrand wrote: >>> On 10.10.19 09:35, Michal Hocko wrote: >>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: >>>>> On 09.10.19 16:43, Michal Hocko wrote: >>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: >>>>>>> We should check for pfn_to_online_page() to not access uninitialized >>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error >>>>>>> message. >>>>>>> >>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>>>> Cc: Michal Hocko <mhocko@kernel.org> >>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>>>>> --- >>>>>>> mm/memory-failure.c | 14 ++++++++------ >>>>>>> 1 file changed, 8 insertions(+), 6 deletions(-) >>>>>>> >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>>>> index 7ef849da8278..e866e6e5660b 100644 >>>>>>> --- a/mm/memory-failure.c >>>>>>> +++ b/mm/memory-failure.c >>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >>>>>>> if (!sysctl_memory_failure_recovery) >>>>>>> panic("Memory failure on page %lx", pfn); >>>>>>> >>>>>>> - if (!pfn_valid(pfn)) { >>>>>>> + p = pfn_to_online_page(pfn); >>>>>>> + if (!p) { >>>>>>> + if (pfn_valid(pfn)) { >>>>>>> + pgmap = get_dev_pagemap(pfn, NULL); >>>>>>> + if (pgmap) >>>>>>> + return memory_failure_dev_pagemap(pfn, flags, >>>>>>> + pgmap); >>>>>>> + } >>>>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", >>>>>>> pfn); >>>>>>> return -ENXIO; >>>>>> >>>>>> Don't we need that earlier at hwpoison_inject level? >>>>>> >>>>> >>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) >>>>> alone would not be sufficient as discussed. We would, again, have to >>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... >>>>> >>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: >>>>> >>>>> /* >>>>> * Note that the below poison/unpoison interfaces do not involve >>>>> * hardware status change, hence do not require hardware support. >>>>> * They are mainly for testing hwpoison in software level. >>>>> */ >>>>> >>>>> So it's not that bad compared to memory_failure() called from real HW or >>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() >>>> >>>> Yes, this is just a toy. And yes we need to handle zone device pages >>>> here because a) people likely want to test MCE behavior even on these >>>> pages and b) HW can really trigger MCEs there as well. I was just >>>> pointing that the patch is likely incomplete. >>>> >>> >>> I rather think this deserves a separate patch as it is a separate >>> interface :) >>> >>> I do wonder why hwpoison_inject() has to perform so much extra work >>> compared to other memory_failure() users. This smells like legacy >>> leftovers to me, but I might be wrong. The interface is fairly old, >>> though. Does anybody know why we need this magic? I can spot quite some >>> duplicate checks/things getting performed. > > It concerns me too, this *is* an old legacy code. I guess it was left as-is > because no one complained about it. That's not good, so I'll do some cleanup. Most of that stuff was introduced in commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b Author: Wu Fengguang <fengguang.wu@intel.com> Date: Wed Dec 16 12:19:59 2009 +0100 HWPOISON: limit hwpoison injector to known page types __memory_failure()'s workflow is set PG_hwpoison //... unset PG_hwpoison if didn't pass hwpoison filter That could kill unrelated process if it happens to page fault on the page with the (temporary) PG_hwpoison. The race should be big enough to appear in stress tests. Fix it by grabbing the page and checking filter at inject time. This also avoids the very noisy "Injecting memory failure..." messages. Now, we still have the same "issue" in memory_failure() today: if (TestSetPageHWPoison(p)) { pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn); return 0; } [...] if (hwpoison_filter(p)) { if (TestClearPageHWPoison(p)) num_poisoned_pages_dec(); unlock_page(p); put_hwpoison_page(p); return 0; } However, I don't understand why we need that special handling only for this debug interface and not the other users. I'd vote for ripping out that legacy crap (so the interface works correctly with ZONE_DEVICE) and instead (if really required) rework memory_failure() to not produce such side effects. > >>> >>> Naiive me would just make the interface perform the same as >>> hard_offline_page_store(). But most probably I am not getting the real >>> purpose of both different interfaces. > > Maybe for historical reason, we have these slightly different interfaces: > > - corrupt-pfn > - purely for debugging purpose > - paired with unpoison-pfn > - used by in-kernel tool tools/vm/page-types.c > - hard_offline_page > - paired with soft_offline_page > - used by other userspace tools like mcelog > > But these don't explain why implemented differently, so I think that both > should be written in the same manner. > >>> >>> HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have >>> guessed that fixing this is not urgent. >>> >>> BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs >>> fixing to make sure we access initialized memmaps. >>> >> >> To be more precise, soft_offline_page_store() needs a >> pfn_to_online_page() check. Will send a patch. > > Thanks for finding this. > > - Naoya Horiguchi > -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-11 10:13 ` David Hildenbrand @ 2019-10-14 13:36 ` Michal Hocko 2019-10-15 14:23 ` Oscar Salvador 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2019-10-14 13:36 UTC (permalink / raw) To: David Hildenbrand Cc: Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton, Oscar Salvador [Cc Oscar] On Fri 11-10-19 12:13:17, David Hildenbrand wrote: > On 11.10.19 08:02, Naoya Horiguchi wrote: > > On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote: > >> On 10.10.19 09:52, David Hildenbrand wrote: > >>> On 10.10.19 09:35, Michal Hocko wrote: > >>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: > >>>>> On 09.10.19 16:43, Michal Hocko wrote: > >>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > >>>>>>> We should check for pfn_to_online_page() to not access uninitialized > >>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error > >>>>>>> message. > >>>>>>> > >>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > >>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org> > >>>>>>> Cc: Michal Hocko <mhocko@kernel.org> > >>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com> > >>>>>>> --- > >>>>>>> mm/memory-failure.c | 14 ++++++++------ > >>>>>>> 1 file changed, 8 insertions(+), 6 deletions(-) > >>>>>>> > >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>>>>> index 7ef849da8278..e866e6e5660b 100644 > >>>>>>> --- a/mm/memory-failure.c > >>>>>>> +++ b/mm/memory-failure.c > >>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >>>>>>> if (!sysctl_memory_failure_recovery) > >>>>>>> panic("Memory failure on page %lx", pfn); > >>>>>>> > >>>>>>> - if (!pfn_valid(pfn)) { > >>>>>>> + p = pfn_to_online_page(pfn); > >>>>>>> + if (!p) { > >>>>>>> + if (pfn_valid(pfn)) { > >>>>>>> + pgmap = get_dev_pagemap(pfn, NULL); > >>>>>>> + if (pgmap) > >>>>>>> + return memory_failure_dev_pagemap(pfn, flags, > >>>>>>> + pgmap); > >>>>>>> + } > >>>>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n", > >>>>>>> pfn); > >>>>>>> return -ENXIO; > >>>>>> > >>>>>> Don't we need that earlier at hwpoison_inject level? > >>>>>> > >>>>> > >>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) > >>>>> alone would not be sufficient as discussed. We would, again, have to > >>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... > >>>>> > >>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: > >>>>> > >>>>> /* > >>>>> * Note that the below poison/unpoison interfaces do not involve > >>>>> * hardware status change, hence do not require hardware support. > >>>>> * They are mainly for testing hwpoison in software level. > >>>>> */ > >>>>> > >>>>> So it's not that bad compared to memory_failure() called from real HW or > >>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() > >>>> > >>>> Yes, this is just a toy. And yes we need to handle zone device pages > >>>> here because a) people likely want to test MCE behavior even on these > >>>> pages and b) HW can really trigger MCEs there as well. I was just > >>>> pointing that the patch is likely incomplete. > >>>> > >>> > >>> I rather think this deserves a separate patch as it is a separate > >>> interface :) > >>> > >>> I do wonder why hwpoison_inject() has to perform so much extra work > >>> compared to other memory_failure() users. This smells like legacy > >>> leftovers to me, but I might be wrong. The interface is fairly old, > >>> though. Does anybody know why we need this magic? I can spot quite some > >>> duplicate checks/things getting performed. > > > > It concerns me too, this *is* an old legacy code. I guess it was left as-is > > because no one complained about it. That's not good, so I'll do some cleanup. > > Most of that stuff was introduced in > > commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b > Author: Wu Fengguang <fengguang.wu@intel.com> > Date: Wed Dec 16 12:19:59 2009 +0100 > > HWPOISON: limit hwpoison injector to known page types > > __memory_failure()'s workflow is > > set PG_hwpoison > //... > unset PG_hwpoison if didn't pass hwpoison filter > > That could kill unrelated process if it happens to page fault on the > page with the (temporary) PG_hwpoison. The race should be big enough to > appear in stress tests. > > Fix it by grabbing the page and checking filter at inject time. This > also avoids the very noisy "Injecting memory failure..." messages. > > > Now, we still have the same "issue" in memory_failure() today: > > > if (TestSetPageHWPoison(p)) { > pr_err("Memory failure: %#lx: already hardware poisoned\n", > pfn); > return 0; > } > [...] > if (hwpoison_filter(p)) { > if (TestClearPageHWPoison(p)) > num_poisoned_pages_dec(); > unlock_page(p); > put_hwpoison_page(p); > return 0; > } > > However, I don't understand why we need that special handling only for this > debug interface and not the other users. > > I'd vote for ripping out that legacy crap (so the interface works correctly > with ZONE_DEVICE) and instead (if really required) rework memory_failure() > to not produce such side effects. I do agree. The two should be really using the same code. My understanding was that MADV_HWPOISON was there to test the actual MCE behavior (and the man page seems to agree with that). Oscar is working on a rewrite. Not sure he has considered this as well. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-14 13:36 ` Michal Hocko @ 2019-10-15 14:23 ` Oscar Salvador 0 siblings, 0 replies; 20+ messages in thread From: Oscar Salvador @ 2019-10-15 14:23 UTC (permalink / raw) To: Michal Hocko Cc: David Hildenbrand, Naoya Horiguchi, linux-kernel, linux-mm, Andrew Morton, Oscar Salvador On Mon, Oct 14, 2019 at 03:36:17PM +0200, Michal Hocko wrote: > I do agree. The two should be really using the same code. My > understanding was that MADV_HWPOISON was there to test the actual MCE > behavior (and the man page seems to agree with that). > > Oscar is working on a rewrite. Not sure he has considered this as well. Yeah, I came across hwpoison-inject module when doing my re-write, and I felt like this is begging for a clean up. Since unpoison_memory needs some adjustments after my re-write, I will go ahead and clean that up, otherwise it will be inconsistent. I expect to be ready fo send the v2 by the end of this week. Thanks -- Oscar Salvador SUSE L3 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand 2019-10-09 14:43 ` Michal Hocko @ 2019-10-10 0:26 ` Naoya Horiguchi 2019-10-10 7:17 ` David Hildenbrand 2019-10-14 8:41 ` David Hildenbrand 2 siblings, 1 reply; 20+ messages in thread From: Naoya Horiguchi @ 2019-10-10 0:26 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote: > We should check for pfn_to_online_page() to not access uninitialized > memmaps. Reshuffle the code so we don't have to duplicate the error > message. > > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@kernel.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/memory-failure.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 7ef849da8278..e866e6e5660b 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > if (!sysctl_memory_failure_recovery) > panic("Memory failure on page %lx", pfn); > > - if (!pfn_valid(pfn)) { > + p = pfn_to_online_page(pfn); > + if (!p) { > + if (pfn_valid(pfn)) { > + pgmap = get_dev_pagemap(pfn, NULL); > + if (pgmap) > + return memory_failure_dev_pagemap(pfn, flags, > + pgmap); > + } > pr_err("Memory failure: %#lx: memory outside kernel control\n", > pfn); > return -ENXIO; > } > > - pgmap = get_dev_pagemap(pfn, NULL); > - if (pgmap) > - return memory_failure_dev_pagemap(pfn, flags, pgmap); > - > - p = pfn_to_page(pfn); This change seems to assume that memory_failure_dev_pagemap() is never called for online pages. Is it an intended behavior? Or the concept "online pages" is not applicable to zone device pages? Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 0:26 ` Naoya Horiguchi @ 2019-10-10 7:17 ` David Hildenbrand 2019-10-11 6:50 ` Naoya Horiguchi 2019-10-19 2:05 ` Andrew Morton 0 siblings, 2 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-10 7:17 UTC (permalink / raw) To: Naoya Horiguchi; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko On 10.10.19 02:26, Naoya Horiguchi wrote: > On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote: >> We should check for pfn_to_online_page() to not access uninitialized >> memmaps. Reshuffle the code so we don't have to duplicate the error >> message. >> >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@kernel.org> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/memory-failure.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 7ef849da8278..e866e6e5660b 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >> if (!sysctl_memory_failure_recovery) >> panic("Memory failure on page %lx", pfn); >> >> - if (!pfn_valid(pfn)) { >> + p = pfn_to_online_page(pfn); >> + if (!p) { >> + if (pfn_valid(pfn)) { >> + pgmap = get_dev_pagemap(pfn, NULL); >> + if (pgmap) >> + return memory_failure_dev_pagemap(pfn, flags, >> + pgmap); >> + } >> pr_err("Memory failure: %#lx: memory outside kernel control\n", >> pfn); >> return -ENXIO; >> } >> >> - pgmap = get_dev_pagemap(pfn, NULL); >> - if (pgmap) >> - return memory_failure_dev_pagemap(pfn, flags, pgmap); >> - >> - p = pfn_to_page(pfn); > > This change seems to assume that memory_failure_dev_pagemap() is never > called for online pages. Is it an intended behavior? > Or the concept "online pages" is not applicable to zone device pages? Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online (SECTION_IS_ONLINE). The terminology "online" only applies to pages that were given to the buddy. And as we support sup-section hotadd for devmem, we cannot easily make use of the section flag it. I already proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap and call it something like pfn_active(). pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we could use pfn_active() everywhere to test for initialized memmaps (well, besides some special cases like device reserved memory that does not span full sub-sections). Until now, nobody volunteered and I have other things to do. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:17 ` David Hildenbrand @ 2019-10-11 6:50 ` Naoya Horiguchi 2019-10-19 2:05 ` Andrew Morton 1 sibling, 0 replies; 20+ messages in thread From: Naoya Horiguchi @ 2019-10-11 6:50 UTC (permalink / raw) To: David Hildenbrand; +Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko On Thu, Oct 10, 2019 at 09:17:42AM +0200, David Hildenbrand wrote: > On 10.10.19 02:26, Naoya Horiguchi wrote: > > On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote: > >> We should check for pfn_to_online_page() to not access uninitialized > >> memmaps. Reshuffle the code so we don't have to duplicate the error > >> message. > >> > >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> Cc: Michal Hocko <mhocko@kernel.org> > >> Signed-off-by: David Hildenbrand <david@redhat.com> > >> --- > >> mm/memory-failure.c | 14 ++++++++------ > >> 1 file changed, 8 insertions(+), 6 deletions(-) > >> > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index 7ef849da8278..e866e6e5660b 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >> if (!sysctl_memory_failure_recovery) > >> panic("Memory failure on page %lx", pfn); > >> > >> - if (!pfn_valid(pfn)) { > >> + p = pfn_to_online_page(pfn); > >> + if (!p) { > >> + if (pfn_valid(pfn)) { > >> + pgmap = get_dev_pagemap(pfn, NULL); > >> + if (pgmap) > >> + return memory_failure_dev_pagemap(pfn, flags, > >> + pgmap); > >> + } > >> pr_err("Memory failure: %#lx: memory outside kernel control\n", > >> pfn); > >> return -ENXIO; > >> } > >> > >> - pgmap = get_dev_pagemap(pfn, NULL); > >> - if (pgmap) > >> - return memory_failure_dev_pagemap(pfn, flags, pgmap); > >> - > >> - p = pfn_to_page(pfn); > > > > This change seems to assume that memory_failure_dev_pagemap() is never > > called for online pages. Is it an intended behavior? > > Or the concept "online pages" is not applicable to zone device pages? > > Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online > (SECTION_IS_ONLINE). The terminology "online" only applies to pages that > were given to the buddy. And as we support sup-section hotadd for > devmem, we cannot easily make use of the section flag it. I already > proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap > and call it something like pfn_active(). > > pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we > could use pfn_active() everywhere to test for initialized memmaps (well, > besides some special cases like device reserved memory that does not > span full sub-sections). Until now, nobody volunteered and I have other > things to do. Thank you for explanation. I'm not sure how hard now but we try to do this. You fix the problem from online/offline viewpoint now, and leave remaining part untouched. I agree with that approach, and the suggested change looks good to me. Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-10 7:17 ` David Hildenbrand 2019-10-11 6:50 ` Naoya Horiguchi @ 2019-10-19 2:05 ` Andrew Morton 2019-10-21 9:44 ` David Hildenbrand 1 sibling, 1 reply; 20+ messages in thread From: Andrew Morton @ 2019-10-19 2:05 UTC (permalink / raw) To: David Hildenbrand; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, Michal Hocko On Thu, 10 Oct 2019 09:17:42 +0200 David Hildenbrand <david@redhat.com> wrote: > >> - pgmap = get_dev_pagemap(pfn, NULL); > >> - if (pgmap) > >> - return memory_failure_dev_pagemap(pfn, flags, pgmap); > >> - > >> - p = pfn_to_page(pfn); > > > > This change seems to assume that memory_failure_dev_pagemap() is never > > called for online pages. Is it an intended behavior? > > Or the concept "online pages" is not applicable to zone device pages? > > Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online > (SECTION_IS_ONLINE). The terminology "online" only applies to pages that > were given to the buddy. And as we support sup-section hotadd for > devmem, we cannot easily make use of the section flag it. I already > proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap > and call it something like pfn_active(). > > pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we > could use pfn_active() everywhere to test for initialized memmaps (well, > besides some special cases like device reserved memory that does not > span full sub-sections). Until now, nobody volunteered and I have other > things to do. Is it worth a code comment or two to make this clearer? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-19 2:05 ` Andrew Morton @ 2019-10-21 9:44 ` David Hildenbrand 0 siblings, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-21 9:44 UTC (permalink / raw) To: Andrew Morton; +Cc: Naoya Horiguchi, linux-kernel, linux-mm, Michal Hocko On 19.10.19 04:05, Andrew Morton wrote: > On Thu, 10 Oct 2019 09:17:42 +0200 David Hildenbrand <david@redhat.com> wrote: > >>>> - pgmap = get_dev_pagemap(pfn, NULL); >>>> - if (pgmap) >>>> - return memory_failure_dev_pagemap(pfn, flags, pgmap); >>>> - >>>> - p = pfn_to_page(pfn); >>> >>> This change seems to assume that memory_failure_dev_pagemap() is never >>> called for online pages. Is it an intended behavior? >>> Or the concept "online pages" is not applicable to zone device pages? >> >> Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online >> (SECTION_IS_ONLINE). The terminology "online" only applies to pages that >> were given to the buddy. And as we support sup-section hotadd for >> devmem, we cannot easily make use of the section flag it. I already >> proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap >> and call it something like pfn_active(). >> >> pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we >> could use pfn_active() everywhere to test for initialized memmaps (well, >> besides some special cases like device reserved memory that does not >> span full sub-sections). Until now, nobody volunteered and I have other >> things to do. > > Is it worth a code comment or two to make this clearer? You mean something like /* Only pages managed by the buddy are online (not ZONE_DEVICE). */ ? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand 2019-10-09 14:43 ` Michal Hocko 2019-10-10 0:26 ` Naoya Horiguchi @ 2019-10-14 8:41 ` David Hildenbrand 2 siblings, 0 replies; 20+ messages in thread From: David Hildenbrand @ 2019-10-14 8:41 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm, Naoya Horiguchi, Andrew Morton, Michal Hocko On 09.10.19 16:24, David Hildenbrand wrote: > We should check for pfn_to_online_page() to not access uninitialized > memmaps. Reshuffle the code so we don't have to duplicate the error > message. > > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@kernel.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/memory-failure.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 7ef849da8278..e866e6e5660b 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > if (!sysctl_memory_failure_recovery) > panic("Memory failure on page %lx", pfn); > > - if (!pfn_valid(pfn)) { > + p = pfn_to_online_page(pfn); > + if (!p) { > + if (pfn_valid(pfn)) { > + pgmap = get_dev_pagemap(pfn, NULL); > + if (pgmap) > + return memory_failure_dev_pagemap(pfn, flags, > + pgmap); > + } > pr_err("Memory failure: %#lx: memory outside kernel control\n", > pfn); > return -ENXIO; > } > > - pgmap = get_dev_pagemap(pfn, NULL); > - if (pgmap) > - return memory_failure_dev_pagemap(pfn, flags, pgmap); > - > - p = pfn_to_page(pfn); > if (PageHuge(p)) > return memory_failure_hugetlb(pfn, flags); > if (TestSetPageHWPoison(p)) { > @Andrew, can you add Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e86b319 And Cc: stable@vger.kernel.org # v4.13+ The stable backports won't be clean cherry-picks AFAIKS, but do-able. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2019-10-21 9:44 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-10-09 14:24 [PATCH v2 0/2] mm: Don't access uninitialized memmaps in PFN walkers David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 1/2] mm: Don't access uninitialized memmaps in fs/proc/page.c David Hildenbrand 2019-10-09 14:41 ` Michal Hocko 2019-10-14 8:44 ` David Hildenbrand 2019-10-09 14:24 ` [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure() David Hildenbrand 2019-10-09 14:43 ` Michal Hocko 2019-10-10 7:27 ` David Hildenbrand 2019-10-10 7:35 ` Michal Hocko 2019-10-10 7:52 ` David Hildenbrand 2019-10-10 7:58 ` David Hildenbrand 2019-10-11 6:02 ` Naoya Horiguchi 2019-10-11 10:13 ` David Hildenbrand 2019-10-14 13:36 ` Michal Hocko 2019-10-15 14:23 ` Oscar Salvador 2019-10-10 0:26 ` Naoya Horiguchi 2019-10-10 7:17 ` David Hildenbrand 2019-10-11 6:50 ` Naoya Horiguchi 2019-10-19 2:05 ` Andrew Morton 2019-10-21 9:44 ` David Hildenbrand 2019-10-14 8:41 ` David Hildenbrand
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.