linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
@ 2020-03-25  3:19 Aneesh Kumar K.V
  2020-03-25  7:06 ` Baoquan He
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-25  3:19 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: linux-kernel, mpe, linuxppc-dev, Aneesh Kumar K.V, Baoquan He,
	Sachin Sant

Fixes the below crash

BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
 section_deactivate+0x220/0x240
 __remove_pages+0x118/0x170
 arch_remove_memory+0x3c/0x150
 memunmap_pages+0x1cc/0x2f0
 devm_action_release+0x30/0x50
 release_nodes+0x2f8/0x3e0
 device_release_driver_internal+0x168/0x270
 unbind_store+0x130/0x170
 drv_attr_store+0x44/0x60
 sysfs_kf_write+0x68/0x80
 kernfs_fop_write+0x100/0x290
 __vfs_write+0x3c/0x70
 vfs_write+0xcc/0x240
 ksys_write+0x7c/0x140
 system_call+0x5c/0x68

With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
section_mem_map is set to NULL after depopulate_section_memmap(). This
was done so that pfn_to_page() works correctly with a kernel config that disables
SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does

	__section_mem_map_addr(__sec) + __pfn;
where

static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
	unsigned long map = section->section_mem_map;
	map &= SECTION_MAP_MASK;
	return (struct page *)map;
}

Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
check pfn validity (pfn_valid()). Since section_deactivate() releases
mem_section->usage once a section is fully deactivated, a pfn_valid() check after
the subsection is deactivated causes a kernel crash.

static inline int pfn_valid(unsigned long pfn)
{
...
	return early_section(ms) || pfn_section_valid(ms, pfn);
}

where

static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
	int idx = subsection_map_index(pfn);

	return test_bit(idx, ms->usage->subsection_map);
}

Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.

Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Cc: Baoquan He <bhe@redhat.com>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/sparse.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index aadb7298dcef..3012d1f3771a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+		/* Mark the section invalid */
+		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
 	}
 
 	if (section_is_early && memmap)
-- 
2.25.1



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
@ 2020-03-25  7:06 ` Baoquan He
  2020-03-25  7:37   ` Baoquan He
  2020-03-26  0:38 ` Andrew Morton
  2020-03-26  9:40 ` Michal Hocko
  2 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2020-03-25  7:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
	dan.j.williams

On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> Fixes the below crash
> 
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
>  section_deactivate+0x220/0x240
>  __remove_pages+0x118/0x170
>  arch_remove_memory+0x3c/0x150
>  memunmap_pages+0x1cc/0x2f0
>  devm_action_release+0x30/0x50
>  release_nodes+0x2f8/0x3e0
>  device_release_driver_internal+0x168/0x270
>  unbind_store+0x130/0x170
>  drv_attr_store+0x44/0x60
>  sysfs_kf_write+0x68/0x80
>  kernfs_fop_write+0x100/0x290
>  __vfs_write+0x3c/0x70
>  vfs_write+0xcc/0x240
>  ksys_write+0x7c/0x140
>  system_call+0x5c/0x68
> 
> With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
> section_mem_map is set to NULL after depopulate_section_memmap(). This
> was done so that pfn_to_page() works correctly with a kernel config that disables
> SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does
> 
> 	__section_mem_map_addr(__sec) + __pfn;
> where
> 
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> 	unsigned long map = section->section_mem_map;
> 	map &= SECTION_MAP_MASK;
> 	return (struct page *)map;
> }
> 
> Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
> check pfn validity (pfn_valid()). Since section_deactivate() releases
> mem_section->usage once a section is fully deactivated, a pfn_valid() check after
> the subsection is deactivated causes a kernel crash.
> 
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> 	return early_section(ms) || pfn_section_valid(ms, pfn);
> }
> 
> where
> 
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> 	int idx = subsection_map_index(pfn);
> 
> 	return test_bit(idx, ms->usage->subsection_map);
> }
> 
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
> 
> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Maybe add Sachin's Tested-by; Sachin has tested and confirmed that this fix
works.

> ---
>  mm/sparse.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		/* Mark the section invalid */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;

Not sure if we should add a check in valid_section() or pfn_valid(),
e.g. also check that ms->usage is valid. Otherwise, this fix looks good to
me.

Reviewed-by: Baoquan He <bhe@redhat.com>



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  7:06 ` Baoquan He
@ 2020-03-25  7:37   ` Baoquan He
  2020-03-25  8:12     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2020-03-25  7:37 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
	dan.j.williams

On 03/25/20 at 03:06pm, Baoquan He wrote:
> On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:

> >  mm/sparse.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index aadb7298dcef..3012d1f3771a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >  			ms->usage = NULL;
> >  		}
> >  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > +		/* Mark the section invalid */
> > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> 
> Not sure if we should add checking in valid_section() or pfn_valid(),
> e.g check ms->usage validation too. Otherwise, this fix looks good to
> me.

With SPARSEMEM_VMEMMAP enabled, we should validate ms->usage before checking
whether any subsection is valid, since we now have a case in which ms->usage
has been released but callers still try to check it.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f0a2c184eb9a..d79bd938852e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 {
 	int idx = subsection_map_index(pfn);
 
+	if (!ms->usage)
+		return 0;
 	return test_bit(idx, ms->usage->subsection_map);
 }
 #else



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  7:37   ` Baoquan He
@ 2020-03-25  8:12     ` Aneesh Kumar K.V
  2020-03-25  8:36       ` Baoquan He
  0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-25  8:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
	dan.j.williams

On 3/25/20 1:07 PM, Baoquan He wrote:
> On 03/25/20 at 03:06pm, Baoquan He wrote:
>> On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> 
>>>   mm/sparse.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index aadb7298dcef..3012d1f3771a 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>>   			ms->usage = NULL;
>>>   		}
>>>   		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>>> +		/* Mark the section invalid */
>>> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>>
>> Not sure if we should add checking in valid_section() or pfn_valid(),
>> e.g check ms->usage validation too. Otherwise, this fix looks good to
>> me.
> 
> With SPARSEMEM_VMEMMAP enabled, we should validate ms->usage before checking
> whether any subsection is valid, since we now have a case in which ms->usage
> has been released but callers still try to check it.
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f0a2c184eb9a..d79bd938852e 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>   {
>   	int idx = subsection_map_index(pfn);
>   
> +	if (!ms->usage)
> +		return 0;
>   	return test_bit(idx, ms->usage->subsection_map);
>   }
>   #else
> 

We always check that the section is valid before we call pfn_section_valid().

static inline int pfn_valid(unsigned long pfn)
{
	struct mem_section *ms;

	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	ms = __nr_to_section(pfn_to_section_nr(pfn));
	if (!valid_section(ms))
		return 0;
	/*
	 * Traditionally early sections always returned pfn_valid() for
	 * the entire section-sized span.
	 */
	return early_section(ms) || pfn_section_valid(ms, pfn);
}


IMHO adding that if (!ms->usage) is redundant.

-aneesh




* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  8:12     ` Aneesh Kumar K.V
@ 2020-03-25  8:36       ` Baoquan He
  0 siblings, 0 replies; 10+ messages in thread
From: Baoquan He @ 2020-03-25  8:36 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Sachin Sant,
	dan.j.williams, mhocko, david

On 03/25/20 at 01:42pm, Aneesh Kumar K.V wrote:
> On 3/25/20 1:07 PM, Baoquan He wrote:
> > On 03/25/20 at 03:06pm, Baoquan He wrote:
> > > On 03/25/20 at 08:49am, Aneesh Kumar K.V wrote:
> > 
> > > >   mm/sparse.c | 2 ++
> > > >   1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index aadb7298dcef..3012d1f3771a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > > >   			ms->usage = NULL;
> > > >   		}
> > > >   		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > > +		/* Mark the section invalid */
> > > > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > > 
> > > Not sure if we should add checking in valid_section() or pfn_valid(),
> > > e.g check ms->usage validation too. Otherwise, this fix looks good to
> > > me.
> > 
> > With SPARSEMEM_VMEMMAP enabled, we should validate ms->usage before checking
> > whether any subsection is valid, since we now have a case in which ms->usage
> > has been released but callers still try to check it.
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index f0a2c184eb9a..d79bd938852e 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -1306,6 +1306,8 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> >   {
> >   	int idx = subsection_map_index(pfn);
> > +	if (!ms->usage)
> > +		return 0;
> >   	return test_bit(idx, ms->usage->subsection_map);
> >   }
> >   #else
> > 
> 
> We always check that the section is valid before we call pfn_section_valid().
> 
> static inline int pfn_valid(unsigned long pfn)
> {
> 	struct mem_section *ms;
> 
> 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> 		return 0;
> 	ms = __nr_to_section(pfn_to_section_nr(pfn));
> 	if (!valid_section(ms))
> 		return 0;
> 	/*
> 	 * Traditionally early sections always returned pfn_valid() for
> 	 * the entire section-sized span.
> 	 */
> 	return early_section(ms) || pfn_section_valid(ms, pfn);
> }
> 
> 
> IMHO adding that if (!ms->usage) is redundant.

Yeah, I tend to agree, considering this only happens in the small window
between releasing ms->usage and releasing ms->section_mem_map when
removing a section. I just thought of adding this check as extra hardening
on top of your fix, because valid_section() only checks
ms->section_mem_map. Anyway, your fix looks good to me; let's see if other
people have any comments.

Thanks
Baoquan



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
  2020-03-25  7:06 ` Baoquan He
@ 2020-03-26  0:38 ` Andrew Morton
  2020-03-26  9:40 ` Michal Hocko
  2 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2020-03-26  0:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, linux-kernel, mpe, linuxppc-dev, Baoquan He,
	Sachin Sant, Pankaj Gupta, David Hildenbrand, Michal Hocko,
	Wei Yang, Oscar Salvador, Mike Rapoport

On Wed, 25 Mar 2020 08:49:14 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:

> Fixes the below crash

(cc's added)

> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
>  section_deactivate+0x220/0x240
>  __remove_pages+0x118/0x170
>  arch_remove_memory+0x3c/0x150
>  memunmap_pages+0x1cc/0x2f0
>  devm_action_release+0x30/0x50
>  release_nodes+0x2f8/0x3e0
>  device_release_driver_internal+0x168/0x270
>  unbind_store+0x130/0x170
>  drv_attr_store+0x44/0x60
>  sysfs_kf_write+0x68/0x80
>  kernfs_fop_write+0x100/0x290
>  __vfs_write+0x3c/0x70
>  vfs_write+0xcc/0x240
>  ksys_write+0x7c/0x140
>  system_call+0x5c/0x68
> 
> With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
> section_mem_map is set to NULL after depopulate_section_memmap(). This
> was done so that pfn_to_page() works correctly with a kernel config that disables
> SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does
> 
> 	__section_mem_map_addr(__sec) + __pfn;
> where
> 
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> 	unsigned long map = section->section_mem_map;
> 	map &= SECTION_MAP_MASK;
> 	return (struct page *)map;
> }
> 
> Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
> check pfn validity (pfn_valid()). Since section_deactivate() releases
> mem_section->usage once a section is fully deactivated, a pfn_valid() check after
> the subsection is deactivated causes a kernel crash.
> 
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> 	return early_section(ms) || pfn_section_valid(ms, pfn);
> }
> 
> where
> 
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {
> 	int idx = subsection_map_index(pfn);
> 
> 	return test_bit(idx, ms->usage->subsection_map);
> }
> 
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
> 
> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")

d41e2f3bd546 had cc:stable, so I shall add cc:stable to this one as well.

> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/sparse.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		/* Mark the section invalid */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>  	}
>  
>  	if (section_is_early && memmap)



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-25  3:19 [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check Aneesh Kumar K.V
  2020-03-25  7:06 ` Baoquan He
  2020-03-26  0:38 ` Andrew Morton
@ 2020-03-26  9:40 ` Michal Hocko
  2020-03-26  9:56   ` Aneesh Kumar K.V
  2 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-26  9:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant

On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> Fixes the below crash
> 
> BUG: Kernel NULL pointer dereference on read at 0x00000000
> Faulting instruction address: 0xc000000000c3447c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> ...
> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> LR [c000000000088354] vmemmap_free+0x144/0x320
> Call Trace:
>  section_deactivate+0x220/0x240

It would be great to match this to the specific source code.

>  __remove_pages+0x118/0x170
>  arch_remove_memory+0x3c/0x150
>  memunmap_pages+0x1cc/0x2f0
>  devm_action_release+0x30/0x50
>  release_nodes+0x2f8/0x3e0
>  device_release_driver_internal+0x168/0x270
>  unbind_store+0x130/0x170
>  drv_attr_store+0x44/0x60
>  sysfs_kf_write+0x68/0x80
>  kernfs_fop_write+0x100/0x290
>  __vfs_write+0x3c/0x70
>  vfs_write+0xcc/0x240
>  ksys_write+0x7c/0x140
>  system_call+0x5c/0x68
> 
> With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
> section_mem_map is set to NULL after depopulate_section_memmap(). This
> was done so that pfn_to_page() works correctly with a kernel config that disables
> SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does
> 
> 	__section_mem_map_addr(__sec) + __pfn;
> where
> 
> static inline struct page *__section_mem_map_addr(struct mem_section *section)
> {
> 	unsigned long map = section->section_mem_map;
> 	map &= SECTION_MAP_MASK;
> 	return (struct page *)map;
> }
> 
> Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
> check pfn validity (pfn_valid()). Since section_deactivate() releases
> mem_section->usage once a section is fully deactivated, a pfn_valid() check after
> the subsection is deactivated causes a kernel crash.
> 
> static inline int pfn_valid(unsigned long pfn)
> {
> ...
> 	return early_section(ms) || pfn_section_valid(ms, pfn);
> }
> 
> where
> 
> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
> {

> 	int idx = subsection_map_index(pfn);
> 
> 	return test_bit(idx, ms->usage->subsection_map);
> }
> 
> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.

I am sorry, I haven't noticed that during the review of the commit
mentioned above. This is all subtle as hell, I have to say. 

Why do we have to free usage before deactivating section memmap? Now that
we have a late section_mem_map reset shouldn't we tear down the usage in
the same branch?

> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> Cc: Baoquan He <bhe@redhat.com>
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/sparse.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index aadb7298dcef..3012d1f3771a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> +		/* Mark the section invalid */
> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;

Btw. this comment is not really helping at all.
		/*
		 * section->usage is gone and VMEMMAP's pfn_valid depends
		 * on it (see pfn_section_valid)
		 */
>  	}
>  
>  	if (section_is_early && memmap)
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-26  9:40 ` Michal Hocko
@ 2020-03-26  9:56   ` Aneesh Kumar K.V
  2020-03-26 10:16     ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2020-03-26  9:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant

On 3/26/20 3:10 PM, Michal Hocko wrote:
> On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
>> Fixes the below crash
>>
>> BUG: Kernel NULL pointer dereference on read at 0x00000000
>> Faulting instruction address: 0xc000000000c3447c
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
>> ...
>> NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
>> LR [c000000000088354] vmemmap_free+0x144/0x320
>> Call Trace:
>>   section_deactivate+0x220/0x240
> 
> It would be great to match this to the specific source code.

The crash is due to NULL dereference at

test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL;

That is explained in the later part of the commit message.
> 
>>   __remove_pages+0x118/0x170
>>   arch_remove_memory+0x3c/0x150
>>   memunmap_pages+0x1cc/0x2f0
>>   devm_action_release+0x30/0x50
>>   release_nodes+0x2f8/0x3e0
>>   device_release_driver_internal+0x168/0x270
>>   unbind_store+0x130/0x170
>>   drv_attr_store+0x44/0x60
>>   sysfs_kf_write+0x68/0x80
>>   kernfs_fop_write+0x100/0x290
>>   __vfs_write+0x3c/0x70
>>   vfs_write+0xcc/0x240
>>   ksys_write+0x7c/0x140
>>   system_call+0x5c/0x68
>>
>> With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case"),
>> section_mem_map is set to NULL after depopulate_section_memmap(). This
>> was done so that pfn_to_page() works correctly with a kernel config that disables
>> SPARSEMEM_VMEMMAP. With that config, pfn_to_page() does
>>
>> 	__section_mem_map_addr(__sec) + __pfn;
>> where
>>
>> static inline struct page *__section_mem_map_addr(struct mem_section *section)
>> {
>> 	unsigned long map = section->section_mem_map;
>> 	map &= SECTION_MAP_MASK;
>> 	return (struct page *)map;
>> }
>>
>> Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is used to
>> check pfn validity (pfn_valid()). Since section_deactivate() releases
>> mem_section->usage once a section is fully deactivated, a pfn_valid() check after
>> the subsection is deactivated causes a kernel crash.
>>
>> static inline int pfn_valid(unsigned long pfn)
>> {
>> ...
>> 	return early_section(ms) || pfn_section_valid(ms, pfn);
>> }
>>
>> where
>>
>> static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>> {
> 
>> 	int idx = subsection_map_index(pfn);
>>
>> 	return test_bit(idx, ms->usage->subsection_map);
>> }
>>
>> Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
> 
> I am sorry, I haven't noticed that during the review of the commit
> mentioned above. This is all subtle as hell, I have to say.
> 
> Why do we have to free usage before deactivating section memmap? Now that
> we have a late section_mem_map reset shouldn't we tear down the usage in
> the same branch?
> 

We still need to make the section invalid before we call into
depopulate_section_memmap(), because architectures like powerpc can share
the vmemmap area across sections (16MB mapping of vmemmap area) and we use
vmemmap_populated() to make that decision.



>> Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
>> Cc: Baoquan He <bhe@redhat.com>
>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   mm/sparse.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index aadb7298dcef..3012d1f3771a 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>   			ms->usage = NULL;
>>   		}
>>   		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> +		/* Mark the section invalid */
>> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> 
> Btw. this comment is not really helping at all.

That is marking the section invalid so that

static inline int valid_section(struct mem_section *section)
{
	return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
}


returns false.

> 		/*
> 		 * section->usage is gone and VMEMMAP's pfn_valid depends
> 		 * on it (see pfn_section_valid)
> 		 */
>>   	}
>>   
>>   	if (section_is_early && memmap)
>> -- 
>> 2.25.1
>>
> 



* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-26  9:56   ` Aneesh Kumar K.V
@ 2020-03-26 10:16     ` Michal Hocko
  2020-03-26 10:50       ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2020-03-26 10:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant

On Thu 26-03-20 15:26:22, Aneesh Kumar K.V wrote:
> On 3/26/20 3:10 PM, Michal Hocko wrote:
> > On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> > > Fixes the below crash
> > > 
> > > BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > Faulting instruction address: 0xc000000000c3447c
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > > CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> > > ...
> > > NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> > > LR [c000000000088354] vmemmap_free+0x144/0x320
> > > Call Trace:
> > >   section_deactivate+0x220/0x240
> > 
> > It would be great to match this to the specific source code.
> 
> The crash is due to NULL dereference at
> 
> test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL;

It would be nice to call that out here as well

[...]
> > Why do we have to free usage before deactivating section memmap? Now that
> > we have a late section_mem_map reset shouldn't we tear down the usage in
> > the same branch?
> > 
> 
> We still need to make the section invalid before we call into
> depopulate_section_memmap(), because architectures like powerpc can share
> the vmemmap area across sections (16MB mapping of vmemmap area) and we use
> vmemmap_populated() to make that decision.

This should be noted in a comment as well.

> > > Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> > > Cc: Baoquan He <bhe@redhat.com>
> > > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >   mm/sparse.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index aadb7298dcef..3012d1f3771a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > >   			ms->usage = NULL;
> > >   		}
> > >   		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > +		/* Mark the section invalid */
> > > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > 
> > Btw. this comment is not really helping at all.
> 
> That is marking the section invalid so that
> 
> static inline int valid_section(struct mem_section *section)
> {
> 	return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
> }
> 
> 
> returns false.

Yes that is obvious once you are clear where to look. I was really
hoping for a comment that would simply point you to the right
direction without chasing SECTION_HAS_MEM_MAP usage. This code is
subtle and useful comments, even when they state something that is
obvious to you _right_now_, can be really helpful.

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Fix kernel crash with pfn_section_valid check
  2020-03-26 10:16     ` Michal Hocko
@ 2020-03-26 10:50       ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2020-03-26 10:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: linux-mm, akpm, linux-kernel, mpe, linuxppc-dev, Baoquan He, Sachin Sant

On Thu 26-03-20 11:16:33, Michal Hocko wrote:
> On Thu 26-03-20 15:26:22, Aneesh Kumar K.V wrote:
> > On 3/26/20 3:10 PM, Michal Hocko wrote:
> > > On Wed 25-03-20 08:49:14, Aneesh Kumar K.V wrote:
> > > > Fixes the below crash
> > > > 
> > > > BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > > Faulting instruction address: 0xc000000000c3447c
> > > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > > > CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
> > > > ...
> > > > NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
> > > > LR [c000000000088354] vmemmap_free+0x144/0x320
> > > > Call Trace:
> > > >   section_deactivate+0x220/0x240
> > > 
> > > It would be great to match this to the specific source code.
> > 
> > The crash is due to NULL dereference at
> > 
> > test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL;
> 
> It would be nice to call that out here as well
> 
> [...]
> > > Why do we have to free usage before deactivating section memmap? Now that
> > > we have a late section_mem_map reset shouldn't we tear down the usage in
> > > the same branch?
> > > 
> > 
> > We still need to make the section invalid before we call into
> > depopulate_section_memmap(), because architectures like powerpc can share
> > the vmemmap area across sections (16MB mapping of vmemmap area) and we use
> > vmemmap_populated() to make that decision.
> 
> This should be noted in a comment as well.
> 
> > > > Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
> > > > Cc: Baoquan He <bhe@redhat.com>
> > > > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > > ---
> > > >   mm/sparse.c | 2 ++
> > > >   1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index aadb7298dcef..3012d1f3771a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -781,6 +781,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > > >   			ms->usage = NULL;
> > > >   		}
> > > >   		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> > > > +		/* Mark the section invalid */
> > > > +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
> > > 
> > > Btw. this comment is not really helping at all.
> > 
> > That is marking the section invalid so that
> > 
> > static inline int valid_section(struct mem_section *section)
> > {
> > 	return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
> > }
> > 
> > 
> > returns false.
> 
> Yes that is obvious once you are clear where to look. I was really
> hoping for a comment that would simply point you to the right
> direction without chasing SECTION_HAS_MEM_MAP usage. This code is
> subtle and useful comments, even when they state something that is
> obvious to you _right_now_, can be really helpful.

Btw. forgot to add. With the improved comment feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs


