* [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
@ 2020-03-25 3:19 Aneesh Kumar K.V
2020-03-25 7:06 ` Baoquan He
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-25 3:19 UTC (permalink / raw)
To: linux-mm, akpm
Cc: linux-kernel, mpe, linuxppc-dev, Aneesh Kumar K.V, Baoquan He,
Sachin Sant
Fixes the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
section_mem_map is set to NULL after depopulate_section_memmap(). This
was done so that pfn_to_page() works correctly with kernel configs that disable
SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
check the pfn validity (pfn_valid()). Since section_deactivate() releases
mem_section->usage if a section is fully deactivated, a pfn_valid() check after
a subsection deactivation causes a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Cc: Baoquan He <bhe@redhat.com>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
mm/sparse.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/sparse.c b/mm/sparse.c
index aadb7298dcef..3012d1f3771a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /* Mark the section invalid */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
--
2.25.1
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
@ 2020-03-25 7:06 ` Baoquan He
2020-03-25 7:37 ` Baoquan He
2020-03-26 0:38 ` Andrew Morton
2020-03-26 9:40 ` Michal Hocko
2 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2020-03-25 7:06 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
dan.j.williams
On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> Fixes the below crash
>
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
> section_deactivate+0x220/0x240
> __remove_pages+0x118/0x170
> arch_remove_memory+0x3c/0x150
> memunmap_pages+0x1cc/0x2f0
> devm_action_release+0x30/0x50
> release_nodes+0x2f8/0x3e0
> device_release_driver_internal+0x168/0x270
> unbind_store+0x130/0x170
> drv_attr_store+0x44/0x60
> sysfs_kf_write+0x68/0x80
> kernfs_fop_write+0x100/0x290
> __vfs_write+0x3c/0x70
> vfs_write+0xcc/0x240
> ksys_write+0x7c/0x140
> system_call+0x5c/0x68
>
> With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> section_mem_map is set to NULL after depopulate_section_mem(). This
> was done so that pfn_page() can work correctly with kernel config that disables
> SPARSEMEM_VMEMMAP. With that config pfn_to_page does
>
> __section_mem_map_addr(__sec) + __pfn;
> where
>
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> unsigned long map = section->section_mem_map;
> map &= SECTION_MAP_MASK;
> return (struct page *)map;
> }
>
> Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to
> check the pfn validity (pfn_valid()). Since section_deactivate release
> mem_section->usage if a section is fully deactivated, pfn_valid() check after
> a subsection_deactivate cause a kernel crash.
>
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> return early_section(ms) || pfn_section_valid(ms, pfn);
> }
>
> where
>
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> int idx = subsection_map_index(pfn);
>
> return test_bit(idx, ms->usage->subsection_map);
> }
>
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
>
> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Maybe add Sachin's Tested-by; Sachin has tested and confirmed that this fix
works.
> ---
> mm/sparse.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> + /* Mark the section invalid */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
Not sure if we should add a check in valid_section() or pfn_valid(),
e.g. check that ms->usage is valid too. Otherwise, this fix looks good to
me.
Reviewed-by: Baoquan He <bhe@redhat.com>
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 7:06 ` Baoquan He
@ 2020-03-25 7:37 ` Baoquan He
2020-03-25 8:12 ` Aneesh Kumar K.V
0 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2020-03-25 7:37 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
dan.j.williams
On 03/25/20 at 03:06pm, Baoquan He wrote:
> On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> > mm/sparse.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aadb7298dcef..3012d1f3771a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > ms->usage = NULL;
> > }
> > memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > + /* Mark the section invalid */
> > + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>
> Not sure if we should add checking in valid_section() or pfn_valid(),
> e.g check ms->usage validation too. Otherwise, this fix looks good to
> me.
With SPARSEMEM_VMEMMAP enabled, we should validate ms->usage before
checking whether any subsection is valid, since we now have a case in
which ms->usage has been released but people still try to check it.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f0a2c184eb9a..d79bd938852e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
+ if (!ms->usage)
+ return 0;
return test_bit(idx, ms->usage->subsection_map);
}
#else
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 7:37 ` Baoquan He
@ 2020-03-25 8:12 ` Aneesh Kumar K.V
2020-03-25 8:36 ` Baoquan He
0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-25 8:12 UTC (permalink / raw)
To: Baoquan He
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
dan.j.williams
On 3/25/20 1:07 PM, Baoquan He wrote:
> On 03/25/20 at 03:06pm, Baoquan He wrote:
>> On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
>
>>> mm/sparse.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index aadb7298dcef..3012d1f3771a 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>> ms->usage = NULL;
>>> }
>>> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>>> + /* Mark the section invalid */
>>> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>>
>> Not sure if we should add checking in valid_section() or pfn_valid(),
>> e.g check ms->usage validation too. Otherwise, this fix looks good to
>> me.
>
> With SPASEMEM_VMEMAP enabled, we should do validation check on ms->usage
> before checking any subsection is valid. Since now we do have case
> in which ms->usage is released, people still try to check it.
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f0a2c184eb9a..d79bd938852e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> int idx = subsection_map_index(pfn);
>
> + if (!ms->usage)
> + return 0;
> return test_bit(idx, ms->usage->subsection_map);
> }
> #else
>
We always check that the section is valid before we call pfn_section_valid().
static inline int pfn_valid(unsigned long pfn)
{
	struct mem_section *ms;
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	ms = __nr_to_section(pfn_to_section_nr(pfn));
	if (!valid_section(ms))
		return 0;
	/*
	 * Traditionally early sections always returned pfn_valid() for
	 * the entire section-sized span.
	 */
	return early_section(ms) || pfn_section_valid(ms, pfn);
}
IMHO adding that if (!ms->usage) is redundant.
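To make the ordering argument above concrete, here is a minimal userspace
sketch, not the kernel code. All model_* names, the flag value and the 64-bit
bitmap are assumptions made up for illustration, and the early_section()
shortcut is left out. It only shows that once SECTION_HAS_MEM_MAP is cleared
together with ms->usage, the valid_section() test returns early and the usage
pointer is never dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit; NOT the kernel's actual SECTION_HAS_MEM_MAP value. */
#define MODEL_SECTION_HAS_MEM_MAP (1UL << 1)

/* Hypothetical stand-in for struct mem_section_usage. */
struct model_usage {
	uint64_t subsection_map;	/* one bit per subsection, 64 assumed */
};

/* Hypothetical stand-in for struct mem_section. */
struct model_section {
	unsigned long section_mem_map;
	struct model_usage *usage;
};

static int model_valid_section(const struct model_section *ms)
{
	return ms && (ms->section_mem_map & MODEL_SECTION_HAS_MEM_MAP);
}

/* Dereferences ms->usage unconditionally, like pfn_section_valid(). */
static int model_pfn_section_valid(const struct model_section *ms,
				   unsigned int idx)
{
	return (int)((ms->usage->subsection_map >> idx) & 1);
}

/* The section-level check runs first, so an invalid section returns early. */
static int model_pfn_valid(const struct model_section *ms, unsigned int idx)
{
	if (!model_valid_section(ms))
		return 0;
	return model_pfn_section_valid(ms, idx);
}

/* Models the patched teardown: drop usage AND clear the validity flag. */
static void model_section_deactivate(struct model_section *ms)
{
	ms->usage = NULL;
	ms->section_mem_map &= ~MODEL_SECTION_HAS_MEM_MAP;
}
```

Without the flag clearing in model_section_deactivate(), a later
model_pfn_valid() call would reach model_pfn_section_valid() with a NULL
usage pointer, which is the crash from the report.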
-aneesh
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 8:12 ` Aneesh Kumar K.V
@ 2020-03-25 8:36 ` Baoquan He
0 siblings, 0 replies; 10+ messages in thread
From: Baoquan He @ 2020-03-25 8:36 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
dan.j.williams, mhocko, david
On 03/25/20 at 01:42pm, Aneesh Kumar K.V wrote:
> On 3/25/20 1:07 PM, Baoquan He wrote:
> > On 03/25/20 at 03:06pm, Baoquan He wrote:
> > > On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> >
> > > > mm/sparse.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index aadb7298dcef..3012d1f3771a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > > > ms->usage = NULL;
> > > > }
> > > > memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > > + /* Mark the section invalid */
> > > > + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > >
> > > Not sure if we should add checking in valid_section() or pfn_valid(),
> > > e.g check ms->usage validation too. Otherwise, this fix looks good to
> > > me.
> >
> > With SPASEMEM_VMEMAP enabled, we should do validation check on ms->usage
> > before checking any subsection is valid. Since now we do have case
> > in which ms->usage is released, people still try to check it.
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index f0a2c184eb9a..d79bd938852e 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> > {
> > int idx = subsection_map_index(pfn);
> > + if (!ms->usage)
> > + return 0;
> > return test_bit(idx, ms->usage->subsection_map);
> > }
> > #else
> >
>
> We always check for section valid, before we check if pfn_section_valid().
>
> static inline int pfn_valid(unsigned long pfn)
>
> struct mem_section *ms;
>
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> ms = __nr_to_section(pfn_to_section_nr(pfn));
> if (!valid_section(ms))
> return 0;
> /*
> * Traditionally early sections always returned pfn_valid() for
> * the entire section-sized span.
> */
> return early_section(ms) || pfn_section_valid(ms, pfn);
> }
>
>
> IMHO adding that if (!ms->usage) is redundant.
Yeah, I tend to agree, considering this only happens in the small window
between releasing ms->usage and releasing ms->section_mem_map when
removing a section. I just thought of adding this check as extra hardening
on top of your fix, because we only check ms->section_mem_map
in valid_section(). Anyway, your fix looks good to me; let's see if other
people have any comments.
Thanks
Baoquan
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
2020-03-25 7:06 ` Baoquan He
@ 2020-03-26 0:38 ` Andrew Morton
2020-03-26 9:40 ` Michal Hocko
2 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2020-03-26 0:38 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, linux-kernel, mpe, linuxppc-dev, Baoquan He,
Sachin Sant, Pankaj Gupta, David Hildenbrand, Michal Hocko,
Wei Yang, Oscar Salvador, Mike Rapoport
On Wed, 25 Mar 2020 08:49:14 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> Fixes the below crash
(cc's added)
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
> section_deactivate+0x220/0x240
> __remove_pages+0x118/0x170
> arch_remove_memory+0x3c/0x150
> memunmap_pages+0x1cc/0x2f0
> devm_action_release+0x30/0x50
> release_nodes+0x2f8/0x3e0
> device_release_driver_internal+0x168/0x270
> unbind_store+0x130/0x170
> drv_attr_store+0x44/0x60
> sysfs_kf_write+0x68/0x80
> kernfs_fop_write+0x100/0x290
> __vfs_write+0x3c/0x70
> vfs_write+0xcc/0x240
> ksys_write+0x7c/0x140
> system_call+0x5c/0x68
>
> With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> section_mem_map is set to NULL after depopulate_section_mem(). This
> was done so that pfn_page() can work correctly with kernel config that disables
> SPARSEMEM_VMEMMAP. With that config pfn_to_page does
>
> __section_mem_map_addr(__sec) + __pfn;
> where
>
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> unsigned long map = section->section_mem_map;
> map &= SECTION_MAP_MASK;
> return (struct page *)map;
> }
>
> Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to
> check the pfn validity (pfn_valid()). Since section_deactivate release
> mem_section->usage if a section is fully deactivated, pfn_valid() check after
> a subsection_deactivate cause a kernel crash.
>
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> return early_section(ms) || pfn_section_valid(ms, pfn);
> }
>
> where
>
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> int idx = subsection_map_index(pfn);
>
> return test_bit(idx, ms->usage->subsection_map);
> }
>
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
>
> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
d41e2f3bd546 had cc:stable, so I shall add cc:stable to this one as well.
> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> mm/sparse.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> + /* Mark the section invalid */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> }
>
> if (section_is_early && memmap)
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-25 3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
2020-03-25 7:06 ` Baoquan He
2020-03-26 0:38 ` Andrew Morton
@ 2020-03-26 9:40 ` Michal Hocko
2020-03-26 9:56 ` Aneesh Kumar K.V
2 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-26 9:40 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant
On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> Fixes the below crash
>
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
> section_deactivate+0x220/0x240
It would be great to match this to the specific source code.
> __remove_pages+0x118/0x170
> arch_remove_memory+0x3c/0x150
> memunmap_pages+0x1cc/0x2f0
> devm_action_release+0x30/0x50
> release_nodes+0x2f8/0x3e0
> device_release_driver_internal+0x168/0x270
> unbind_store+0x130/0x170
> drv_attr_store+0x44/0x60
> sysfs_kf_write+0x68/0x80
> kernfs_fop_write+0x100/0x290
> __vfs_write+0x3c/0x70
> vfs_write+0xcc/0x240
> ksys_write+0x7c/0x140
> system_call+0x5c/0x68
>
> With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> section_mem_map is set to NULL after depopulate_section_mem(). This
> was done so that pfn_page() can work correctly with kernel config that disables
> SPARSEMEM_VMEMMAP. With that config pfn_to_page does
>
> __section_mem_map_addr(__sec) + __pfn;
> where
>
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> unsigned long map = section->section_mem_map;
> map &= SECTION_MAP_MASK;
> return (struct page *)map;
> }
>
> Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to
> check the pfn validity (pfn_valid()). Since section_deactivate release
> mem_section->usage if a section is fully deactivated, pfn_valid() check after
> a subsection_deactivate cause a kernel crash.
>
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> return early_section(ms) || pfn_section_valid(ms, pfn);
> }
>
> where
>
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> int idx = subsection_map_index(pfn);
>
> return test_bit(idx, ms->usage->subsection_map);
> }
>
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
I am sorry, I didn't notice that during the review of the commit
mentioned above. This is all subtle as hell, I have to say.
Why do we have to free usage before deactivating the section memmap? Now that
we have a late section_mem_map reset, shouldn't we tear down the usage in
the same branch?
> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> mm/sparse.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> + /* Mark the section invalid */
> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
Btw. this comment is not really helping at all.
/*
* section->usage is gone and VMEMMAP's pfn_valid depends
* on it (see pfn_section_valid)
*/
> }
>
> if (section_is_early && memmap)
> --
> 2.25.1
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-26 9:40 ` Michal Hocko
@ 2020-03-26 9:56 ` Aneesh Kumar K.V
2020-03-26 10:16 ` Michal Hocko
0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-26 9:56 UTC (permalink / raw)
To: Michal Hocko
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant
On 3/26/20 3:10 PM, Michal Hocko wrote:
> On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
>> Fixes the below crash
>>
>> BUG: Kernel NULL pointer dereference on read at 0x00000000
>> Faulting instruction address: 0xc000000000c3447c
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
>> ...
>> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
>> LR [c000000000088354] vmemmap_free+0x144/0x320
>> Call Trace:
>> section_deactivate+0x220/0x240
>
> It would be great to match this to the specific source code.
The crash is a NULL dereference at
test_bit(idx, ms->usage->subsection_map);
because ms->usage is NULL. That is explained in the later part of the
commit message.
>
>> __remove_pages+0x118/0x170
>> arch_remove_memory+0x3c/0x150
>> memunmap_pages+0x1cc/0x2f0
>> devm_action_release+0x30/0x50
>> release_nodes+0x2f8/0x3e0
>> device_release_driver_internal+0x168/0x270
>> unbind_store+0x130/0x170
>> drv_attr_store+0x44/0x60
>> sysfs_kf_write+0x68/0x80
>> kernfs_fop_write+0x100/0x290
>> __vfs_write+0x3c/0x70
>> vfs_write+0xcc/0x240
>> ksys_write+0x7c/0x140
>> system_call+0x5c/0x68
>>
>> With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
>> section_mem_map is set to NULL after depopulate_section_mem(). This
>> was done so that pfn_page() can work correctly with kernel config that disables
>> SPARSEMEM_VMEMMAP. With that config pfn_to_page does
>>
>> __section_mem_map_addr(__sec) + __pfn;
>> where
>>
>> static inline struct page *__section_mem_map_addr(struct mem_section *section)
>> {
>> unsigned long map = section->section_mem_map;
>> map &= SECTION_MAP_MASK;
>> return (struct page *)map;
>> }
>>
>> Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to
>> check the pfn validity (pfn_valid()). Since section_deactivate release
>> mem_section->usage if a section is fully deactivated, pfn_valid() check after
>> a subsection_deactivate cause a kernel crash.
>>
>> static inline int pfn_valid(unsigned long pfn)
>> {
>> ...
>> return early_section(ms) || pfn_section_valid(ms, pfn);
>> }
>>
>> where
>>
>> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>> {
>
>> int idx = subsection_map_index(pfn);
>>
>> return test_bit(idx, ms->usage->subsection_map);
>> }
>>
>> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
>
> I am sorry, I haven't noticed that during the review of the commit
> mentioned above. This is all subtle as hell, I have to say.
>
> Why do we have to free usage before deactivaing section memmap? Now that
> we have a late section_mem_map reset shouldn't we tear down the usage in
> the same branch?
>
We still need to make the section invalid before we call into
depopulate_section_memmap(), because architectures like powerpc can share
the vmemmap area across sections (16MB mapping of the vmemmap area) and we use
vmemmap_populated() to make that decision.
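A rough userspace sketch of that dependency, assuming a hypothetical group of
sections backed by one shared vmemmap block. The sketch_* names and the flag
value are made up for illustration, and this is not the powerpc
implementation. It only models the idea that the shared block may be freed
once no section using it is still marked valid, which is why the section being
removed has to be marked invalid before the check runs.

```c
#include <stddef.h>

/* Illustrative flag bit; NOT the kernel's actual SECTION_HAS_MEM_MAP value. */
#define SKETCH_HAS_MEM_MAP (1UL << 1)

/* Hypothetical stand-in for struct mem_section. */
struct sketch_section {
	unsigned long section_mem_map;
};

static int sketch_valid_section(const struct sketch_section *ms)
{
	return (ms->section_mem_map & SKETCH_HAS_MEM_MAP) != 0;
}

/*
 * Returns 1 if any section sharing the vmemmap block is still valid,
 * i.e. the block is still needed and must not be freed.
 */
static int sketch_vmemmap_populated(const struct sketch_section *secs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (sketch_valid_section(&secs[i]))
			return 1;
	return 0;
}

/*
 * Mark secs[idx] invalid first, then ask whether the shared block can go.
 * Returns 1 if the block may be freed, 0 if another section still uses it.
 */
static int sketch_deactivate_and_check(struct sketch_section *secs, size_t n,
					size_t idx)
{
	secs[idx].section_mem_map &= ~SKETCH_HAS_MEM_MAP;
	return !sketch_vmemmap_populated(secs, n);
}
```

If the flag were cleared only after the check, the section being removed
would still count as a user of the block, so the shared mapping would never
be released in this model.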
>> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
>> Cc: Baoquan He <bhe@redhat.com>
>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> mm/sparse.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index aadb7298dcef..3012d1f3771a 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>> ms->usage = NULL;
>> }
>> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> + /* Mark the section invalid */
>> + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>
> Btw. this comment is not really helping at all.
That is marking the section invalid so that
static inline int valid_section(struct mem_section *section)
{
return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
}
returns false.
> /*
> * section->usage is gone and VMEMMAP's pfn_valid depens
> * on it (see pfn_section_valid)
> */
>> }
>>
>> if (section_is_early && memmap)
>> --
>> 2.25.1
>>
>
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-26 9:56 ` Aneesh Kumar K.V
@ 2020-03-26 10:16 ` Michal Hocko
2020-03-26 10:50 ` Michal Hocko
0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-26 10:16 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant
On Thu 26-03-20 15:26:22, Aneesh Kumar K.V wrote:
> On 3/26/20 3:10 PM, Michal Hocko wrote:
> > On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> > > Fixes the below crash
> > >
> > > BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > Faulting instruction address: 0xc000000000c3447c
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > > CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> > > ...
> > > NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> > > LR [c000000000088354] vmemmap_free+0x144/0x320
> > > Call Trace:
> > > section_deactivate+0x220/0x240
> >
> > It would be great to match this to the specific source code.
>
> The crash is due to NULL dereference at
>
> test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL;
It would be nice to call that out here as well
[...]
> > Why do we have to free usage before deactivaing section memmap? Now that
> > we have a late section_mem_map reset shouldn't we tear down the usage in
> > the same branch?
> >
>
> We still need to make the section invalid before we call into
> depopulate_section_memmap(). Because architecture like powerpc can share
> vmemmap area across sections (16MB mapping of vmemmap area) and we use
> vmemmap_popluated() to make that decision.
This should be noted in a comment as well.
> > > Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> > > Cc: Baoquan He <bhe@redhat.com>
> > > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > > mm/sparse.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index aadb7298dcef..3012d1f3771a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > > ms->usage = NULL;
> > > }
> > > memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > + /* Mark the section invalid */
> > > + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> >
> > Btw. this comment is not really helping at all.
>
> That is marking the section invalid so that
>
> static inline int valid_section(struct mem_section *section)
> {
> return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
> }
>
>
> returns false.
Yes, that is obvious once you know where to look. I was really
hoping for a comment that would simply point you in the right
direction without chasing SECTION_HAS_MEM_MAP usage. This code is
subtle, and useful comments, even when they state something that is
obvious to you _right_now_, can be really helpful.
Thanks!
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
2020-03-26 10:16 ` Michal Hocko
@ 2020-03-26 10:50 ` Michal Hocko
0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2020-03-26 10:50 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant
On Thu 26-03-20 11:16:33, Michal Hocko wrote:
> On Thu 26-03-20 15:26:22, Aneesh Kumar K.V wrote:
> > On 3/26/20 3:10 PM, Michal Hocko wrote:
> > > On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> > > > Fixes the below crash
> > > >
> > > > BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > > Faulting instruction address: 0xc000000000c3447c
> > > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > > > CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> > > > ...
> > > > NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> > > > LR [c000000000088354] vmemmap_free+0x144/0x320
> > > > Call Trace:
> > > > section_deactivate+0x220/0x240
> > >
> > > It would be great to match this to the specific source code.
> >
> > The crash is due to NULL dereference at
> >
> > test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL;
>
> It would be nice to call that out here as well
>
> [...]
> > > Why do we have to free usage before deactivaing section memmap? Now that
> > > we have a late section_mem_map reset shouldn't we tear down the usage in
> > > the same branch?
> > >
> >
> > We still need to make the section invalid before we call into
> > depopulate_section_memmap(). Because architecture like powerpc can share
> > vmemmap area across sections (16MB mapping of vmemmap area) and we use
> > vmemmap_popluated() to make that decision.
>
> This should be noted in a comment as well.
>
> > > > Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> > > > Cc: Baoquan He <bhe@redhat.com>
> > > > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > > ---
> > > > mm/sparse.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index aadb7298dcef..3012d1f3771a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > > > ms->usage = NULL;
> > > > }
> > > > memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > > + /* Mark the section invalid */
> > > > + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > >
> > > Btw. this comment is not really helping at all.
> >
> > That is marking the section invalid so that
> >
> > static inline int valid_section(struct mem_section *section)
> > {
> > return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
> > }
> >
> >
> > returns false.
>
> Yes that is obvious once you are clear where to look. I was really
> hoping for a comment that would simply point you to the right
> direcection without chasing SECTION_HAS_MEM_MAP usage. This code is
> subtle and useful comments, even when they state something that is
> obvious to you _right_now_, can be really helpful.
Btw., I forgot to add: with the improved comment, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
--
Michal Hocko
SUSE Labs