linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Fixes "mm/sparsemem: support sub-section hotplug"
@ 2020-02-06 23:16 Wei Yang
  2020-02-06 23:16 ` [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP Wei Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-06 23:16 UTC (permalink / raw)
  To: akpm, osalvador, dan.j.williams
  Cc: linux-mm, linux-kernel, bhe, david, Wei Yang

During reviewing David's code, I found current sub-section hotplug is
broken in several places.

The following link fix one potential issue, while David and I still spot
some suspicious points.

https://lkml.org/lkml/2020/2/6/334

The problem here is mainly about the memmap manipulation:

  * memmap would be overwrite if not use SPARSEMEM_VMEMMAP
  * only need to adjust memmap for SPARSEMEM_VMEMMAP

Wei Yang (3):
  mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP
  mm/sparsemem: get physical address to page struct instead of virtual
    address to pfn
  mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP

 mm/sparse.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP
  2020-02-06 23:16 [PATCH 0/3] Fixes "mm/sparsemem: support sub-section hotplug" Wei Yang
@ 2020-02-06 23:16 ` Wei Yang
  2020-02-07  2:00   ` Dan Williams
  2020-02-06 23:16 ` [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn Wei Yang
  2020-02-06 23:16 ` [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP Wei Yang
  2 siblings, 1 reply; 24+ messages in thread
From: Wei Yang @ 2020-02-06 23:16 UTC (permalink / raw)
  To: akpm, osalvador, dan.j.williams
  Cc: linux-mm, linux-kernel, bhe, david, Wei Yang

Only when SPARSEMEM_VMEMMAP is set, memmap returned from
section_activate() points to sub-section page struct. Otherwise, memmap
already points to the whole section page struct.

This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
sub-section case.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
---
 mm/sparse.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 586d85662978..b5da121bdd6e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	section_mark_present(ms);
 
 	/* Align memmap to section boundary in the subsection case */
-	if (section_nr_to_pfn(section_nr) != start_pfn)
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
+		section_nr_to_pfn(section_nr) != start_pfn)
 		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
 	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-06 23:16 [PATCH 0/3] Fixes "mm/sparsemem: support sub-section hotplug" Wei Yang
  2020-02-06 23:16 ` [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP Wei Yang
@ 2020-02-06 23:16 ` Wei Yang
  2020-02-07  2:19   ` Dan Williams
  2020-02-07  4:11   ` Baoquan He
  2020-02-06 23:16 ` [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP Wei Yang
  2 siblings, 2 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-06 23:16 UTC (permalink / raw)
  To: akpm, osalvador, dan.j.williams
  Cc: linux-mm, linux-kernel, bhe, david, Wei Yang

memmap should be the physical address to page struct instead of virtual
address to pfn.

Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
this point.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index b5da121bdd6e..56816f653588 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	/* Align memmap to section boundary in the subsection case */
 	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
 		section_nr_to_pfn(section_nr) != start_pfn)
-		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
+		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
 	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
 
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP
  2020-02-06 23:16 [PATCH 0/3] Fixes "mm/sparsemem: support sub-section hotplug" Wei Yang
  2020-02-06 23:16 ` [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP Wei Yang
  2020-02-06 23:16 ` [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn Wei Yang
@ 2020-02-06 23:16 ` Wei Yang
  2020-02-07  2:06   ` Dan Williams
  2 siblings, 1 reply; 24+ messages in thread
From: Wei Yang @ 2020-02-06 23:16 UTC (permalink / raw)
  To: akpm, osalvador, dan.j.williams
  Cc: linux-mm, linux-kernel, bhe, david, Wei Yang

In case of SPARSEMEM, populate_section_memmap() would allocate memmap
for the whole section, even we just want a sub-section. This would lead
to memmap overwrite if we a sub-section to an already populated section.

Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
---
 mm/sparse.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 56816f653588..c75ca40db513 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
 	if (nr_pages < PAGES_PER_SECTION && early_section(ms))
 		return pfn_to_page(pfn);
 
+	/*
+	 * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
+	 * whole section, even for a sub-section.
+	 *
+	 * Return its memmap if already populated to avoid memmap overwrite.
+	 */
+	if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
+		valid_section(ms))
+		return __section_mem_map_addr(ms);
+
 	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap);
 	if (!memmap) {
 		section_deactivate(pfn, nr_pages, altmap);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP
  2020-02-06 23:16 ` [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP Wei Yang
@ 2020-02-07  2:00   ` Dan Williams
  2020-02-07 11:06     ` Wei Yang
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-02-07  2:00 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>
> Only when SPARSEMEM_VMEMMAP is set, memmap returned from
> section_activate() points to sub-section page struct. Otherwise, memmap
> already points to the whole section page struct.
>
> This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
> sub-section case.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 586d85662978..b5da121bdd6e 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>         section_mark_present(ms);
>
>         /* Align memmap to section boundary in the subsection case */
> -       if (section_nr_to_pfn(section_nr) != start_pfn)
> +       if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> +               section_nr_to_pfn(section_nr) != start_pfn)

Aren't we assured that start_pfn is always section aligned in the
SPARSEMEM case? That's the role of check_pfn_span(). Does the change
have a runtime impact or is this just theoretical?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP
  2020-02-06 23:16 ` [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP Wei Yang
@ 2020-02-07  2:06   ` Dan Williams
  2020-02-07  3:50     ` Baoquan He
  2020-02-07 11:02     ` Wei Yang
  0 siblings, 2 replies; 24+ messages in thread
From: Dan Williams @ 2020-02-07  2:06 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>
> In case of SPARSEMEM, populate_section_memmap() would allocate memmap
> for the whole section, even we just want a sub-section. This would lead
> to memmap overwrite if we a sub-section to an already populated section.
>
> Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 56816f653588..c75ca40db513 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>         if (nr_pages < PAGES_PER_SECTION && early_section(ms))
>                 return pfn_to_page(pfn);
>
> +       /*
> +        * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
> +        * whole section, even for a sub-section.
> +        *
> +        * Return its memmap if already populated to avoid memmap overwrite.
> +        */
> +       if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> +               valid_section(ms))
> +               return __section_mem_map_addr(ms);

Again, is check_pfn_span() failing to prevent this path?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-06 23:16 ` [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn Wei Yang
@ 2020-02-07  2:19   ` Dan Williams
  2020-02-07  3:10     ` Baoquan He
                       ` (2 more replies)
  2020-02-07  4:11   ` Baoquan He
  1 sibling, 3 replies; 24+ messages in thread
From: Dan Williams @ 2020-02-07  2:19 UTC (permalink / raw)
  To: Wei Yang
  Cc: Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>
> memmap should be the physical address to page struct instead of virtual
> address to pfn.
>
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>         /* Align memmap to section boundary in the subsection case */
>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>                 section_nr_to_pfn(section_nr) != start_pfn)
> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));

Yes, this looks obviously correct. This might be tripping up
makedumpfile. Do you see any practical effects of this bug? The kernel
mostly avoids ->section_mem_map in the vmemmap case and in the
!vmemmap case section_nr_to_pfn(section_nr) should always equal
start_pfn.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  2:19   ` Dan Williams
@ 2020-02-07  3:10     ` Baoquan He
  2020-02-07  3:21       ` Dan Williams
  2020-02-07 10:51     ` Wei Yang
  2020-02-07 11:26     ` Wei Yang
  2 siblings, 1 reply; 24+ messages in thread
From: Baoquan He @ 2020-02-07  3:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

Hi Dan,

On 02/06/20 at 06:19pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index b5da121bdd6e..56816f653588 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >         /* Align memmap to section boundary in the subsection case */
> >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >                 section_nr_to_pfn(section_nr) != start_pfn)
> > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> 
> Yes, this looks obviously correct. This might be tripping up
> makedumpfile. Do you see any practical effects of this bug? The kernel
> mostly avoids ->section_mem_map in the vmemmap case and in the
> !vmemmap case section_nr_to_pfn(section_nr) should always equal
> start_pfn.

The practical effects is that the memmap for the first unaligned section will be lost
when destroy namespace to hot remove it. Because we encode the ->section_mem_map
into mem_section, and get memmap from the related mem_section to free it in
section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
with memmap.

By the way, sub-section support is only valid in vmemmap case, right?
Seems yes from code, but I don't find any document to prove it.

Thanks
Baoquan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  3:10     ` Baoquan He
@ 2020-02-07  3:21       ` Dan Williams
  2020-02-07  3:36         ` Baoquan He
  2020-02-07 12:14         ` Wei Yang
  0 siblings, 2 replies; 24+ messages in thread
From: Dan Williams @ 2020-02-07  3:21 UTC (permalink / raw)
  To: Baoquan He
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>
> Hi Dan,
>
> On 02/06/20 at 06:19pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index b5da121bdd6e..56816f653588 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > >         /* Align memmap to section boundary in the subsection case */
> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> > Yes, this looks obviously correct. This might be tripping up
> > makedumpfile. Do you see any practical effects of this bug? The kernel
> > mostly avoids ->section_mem_map in the vmemmap case and in the
> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > start_pfn.
>
> The practical effects is that the memmap for the first unaligned section will be lost
> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> into mem_section, and get memmap from the related mem_section to free it in
> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> with memmap.

Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?

> By the way, sub-section support is only valid in vmemmap case, right?

Yes.

> Seems yes from code, but I don't find any document to prove it.

check_pfn_span() enforces this requirement.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  3:21       ` Dan Williams
@ 2020-02-07  3:36         ` Baoquan He
  2020-02-07  4:05           ` Dan Williams
  2020-02-07 11:13           ` Wei Yang
  2020-02-07 12:14         ` Wei Yang
  1 sibling, 2 replies; 24+ messages in thread
From: Baoquan He @ 2020-02-07  3:36 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On 02/06/20 at 07:21pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > Hi Dan,
> >
> > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index b5da121bdd6e..56816f653588 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > >         /* Align memmap to section boundary in the subsection case */
> > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > >
> > > Yes, this looks obviously correct. This might be tripping up
> > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > start_pfn.
> >
> > The practical effects is that the memmap for the first unaligned section will be lost
> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > into mem_section, and get memmap from the related mem_section to free it in
> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > with memmap.
> 
> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?

I think no, the lost memmap should only happen in vmemmap case.

> 
> > By the way, sub-section support is only valid in vmemmap case, right?
> 
> Yes.
> 
> > Seems yes from code, but I don't find any document to prove it.
> 
> check_pfn_span() enforces this requirement.

Thanks for your confirmation. Do you mind if I add some document
sentences somewhere make clear this?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP
  2020-02-07  2:06   ` Dan Williams
@ 2020-02-07  3:50     ` Baoquan He
  2020-02-07 11:02     ` Wei Yang
  1 sibling, 0 replies; 24+ messages in thread
From: Baoquan He @ 2020-02-07  3:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On 02/06/20 at 06:06pm, Dan Williams wrote:
> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> >
> > In case of SPARSEMEM, populate_section_memmap() would allocate memmap
> > for the whole section, even we just want a sub-section. This would lead
> > to memmap overwrite if we a sub-section to an already populated section.
> >
> > Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
> >
> > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> > CC: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  mm/sparse.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 56816f653588..c75ca40db513 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
> >         if (nr_pages < PAGES_PER_SECTION && early_section(ms))
> >                 return pfn_to_page(pfn);
> >
> > +       /*
> > +        * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
> > +        * whole section, even for a sub-section.
> > +        *
> > +        * Return its memmap if already populated to avoid memmap overwrite.
> > +        */
> > +       if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > +               valid_section(ms))
> > +               return __section_mem_map_addr(ms);
> 
> Again, is check_pfn_span() failing to prevent this path?

The answer should be yes, this patch is not needed.

> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  3:36         ` Baoquan He
@ 2020-02-07  4:05           ` Dan Williams
  2020-02-07 11:13           ` Wei Yang
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Williams @ 2020-02-07  4:05 UTC (permalink / raw)
  To: Baoquan He
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On Thu, Feb 6, 2020 at 7:36 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 02/06/20 at 07:21pm, Dan Williams wrote:
> > On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> > >
> > > Hi Dan,
> > >
> > > On 02/06/20 at 06:19pm, Dan Williams wrote:
> > > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> > > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > > index b5da121bdd6e..56816f653588 100644
> > > > > --- a/mm/sparse.c
> > > > > +++ b/mm/sparse.c
> > > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> > > > >         /* Align memmap to section boundary in the subsection case */
> > > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> > > > >                 section_nr_to_pfn(section_nr) != start_pfn)
> > > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> > > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> > > >
> > > > Yes, this looks obviously correct. This might be tripping up
> > > > makedumpfile. Do you see any practical effects of this bug? The kernel
> > > > mostly avoids ->section_mem_map in the vmemmap case and in the
> > > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> > > > start_pfn.
> > >
> > > The practical effects is that the memmap for the first unaligned section will be lost
> > > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> > > into mem_section, and get memmap from the related mem_section to free it in
> > > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> > > with memmap.
> >
> > Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
> I think no, the lost memmap should only happen in vmemmap case.
>
> >
> > > By the way, sub-section support is only valid in vmemmap case, right?
> >
> > Yes.
> >
> > > Seems yes from code, but I don't find any document to prove it.
> >
> > check_pfn_span() enforces this requirement.
>
> Thanks for your confirmation. Do you mind if I add some document
> sentences somewhere make clear this?
>

Sure, I'd be happy to review as well.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-06 23:16 ` [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn Wei Yang
  2020-02-07  2:19   ` Dan Williams
@ 2020-02-07  4:11   ` Baoquan He
  2020-02-07 10:53     ` Wei Yang
  1 sibling, 1 reply; 24+ messages in thread
From: Baoquan He @ 2020-02-07  4:11 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, osalvador, dan.j.williams, linux-mm, linux-kernel, david

On 02/07/20 at 07:16am, Wei Yang wrote:
> memmap should be the physical address to page struct instead of virtual
> address to pfn.

Maybe not, memmap stores a virtual address.

> 
> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> this point.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index b5da121bdd6e..56816f653588 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  	/* Align memmap to section boundary in the subsection case */
>  	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>  		section_nr_to_pfn(section_nr) != start_pfn)
> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));

With Dan's confirmation, sub-section is only valid in vmemmap case. I
think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
to filter out non vmemmap case. So only below code is good:

 +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));

>  	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>  
>  	return 0;
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  2:19   ` Dan Williams
  2020-02-07  3:10     ` Baoquan He
@ 2020-02-07 10:51     ` Wei Yang
  2020-02-07 11:26     ` Wei Yang
  2 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-07 10:51 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>         /* Align memmap to section boundary in the subsection case */
>>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>                 section_nr_to_pfn(section_nr) != start_pfn)
>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.

To summarize:

 * vmemmap, ->section_mem_map is not used mostly
 * !vmemmap, we are sure range is section aligned

Sounds we don't need to handle this?

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  4:11   ` Baoquan He
@ 2020-02-07 10:53     ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-07 10:53 UTC (permalink / raw)
  To: Baoquan He
  Cc: Wei Yang, akpm, osalvador, dan.j.williams, linux-mm, linux-kernel, david

On Fri, Feb 07, 2020 at 12:11:34PM +0800, Baoquan He wrote:
>On 02/07/20 at 07:16am, Wei Yang wrote:
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>
>Maybe not, memmap stores a virtual address.
>
>> 
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>> 
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>  	/* Align memmap to section boundary in the subsection case */
>>  	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>  		section_nr_to_pfn(section_nr) != start_pfn)
>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>With Dan's confirmation, sub-section is only valid in vmemmap case. I
>think the old if (section_nr_to_pfn(section_nr) != start_pfn) is enough
>to filter out non vmemmap case. So only below code is good:
>
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>

You mean replace pfn_to_kaddr with pfn_to_page ?

>>  	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>  
>>  	return 0;
>> -- 
>> 2.17.1
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP
  2020-02-07  2:06   ` Dan Williams
  2020-02-07  3:50     ` Baoquan He
@ 2020-02-07 11:02     ` Wei Yang
  1 sibling, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-07 11:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 06, 2020 at 06:06:54PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> In case of SPARSEMEM, populate_section_memmap() would allocate memmap
>> for the whole section, even we just want a sub-section. This would lead
>> to memmap overwrite if we a sub-section to an already populated section.
>>
>> Just return the populated memmap for non-SPARSEMEM_VMEMMAP case.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 56816f653588..c75ca40db513 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -836,6 +836,16 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
>>         if (nr_pages < PAGES_PER_SECTION && early_section(ms))
>>                 return pfn_to_page(pfn);
>>
>> +       /*
>> +        * If it is not SPARSEMEM_VMEMMAP, we always populate memmap for the
>> +        * whole section, even for a sub-section.
>> +        *
>> +        * Return its memmap if already populated to avoid memmap overwrite.
>> +        */
>> +       if (!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> +               valid_section(ms))
>> +               return __section_mem_map_addr(ms);
>
>Again, is check_pfn_span() failing to prevent this path?

Oh, you are right. Thanks

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP
  2020-02-07  2:00   ` Dan Williams
@ 2020-02-07 11:06     ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-07 11:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 06, 2020 at 06:00:13PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> Only when SPARSEMEM_VMEMMAP is set, memmap returned from
>> section_activate() points to sub-section page struct. Otherwise, memmap
>> already points to the whole section page struct.
>>
>> This means only for SPARSEMEM_VMEMMAP, we need to adjust memmap for
>> sub-section case.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 586d85662978..b5da121bdd6e 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -886,7 +886,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>         section_mark_present(ms);
>>
>>         /* Align memmap to section boundary in the subsection case */
>> -       if (section_nr_to_pfn(section_nr) != start_pfn)
>> +       if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> +               section_nr_to_pfn(section_nr) != start_pfn)
>
>Aren't we assured that start_pfn is always section aligned in the
>SPARSEMEM case? That's the role of check_pfn_span(). Does the change
>have a runtime impact or is this just theoretical?

You are right, I missed this point.

Thanks

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  3:36         ` Baoquan He
  2020-02-07  4:05           ` Dan Williams
@ 2020-02-07 11:13           ` Wei Yang
  1 sibling, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-07 11:13 UTC (permalink / raw)
  To: Baoquan He
  Cc: Dan Williams, Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On Fri, Feb 07, 2020 at 11:36:36AM +0800, Baoquan He wrote:
>On 02/06/20 at 07:21pm, Dan Williams wrote:
>> On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>> >
>> > Hi Dan,
>> >
>> > On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > > index b5da121bdd6e..56816f653588 100644
>> > > > --- a/mm/sparse.c
>> > > > +++ b/mm/sparse.c
>> > > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > > >         /* Align memmap to section boundary in the subsection case */
>> > > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > > >                 section_nr_to_pfn(section_nr) != start_pfn)
>> > > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> > >
>> > > Yes, this looks obviously correct. This might be tripping up
>> > > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > > start_pfn.
>> >
>> > The practical effects is that the memmap for the first unaligned section will be lost
>> > when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> > into mem_section, and get memmap from the related mem_section to free it in
>> > section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> > with memmap.
>> 
>> Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>I think no, the lost memmap should only happen in vmemmap case.
>
>> 
>> > By the way, sub-section support is only valid in vmemmap case, right?
>> 
>> Yes.
>> 
>> > Seems yes from code, but I don't find any document to prove it.
>> 
>> check_pfn_span() enforces this requirement.

I saw this function, but those combination of vmemmap and !vmemmap make my
brain not work properly.

>
>Thanks for your confirmation. Do you mind if I add some document
>sentences somewhere make clear this?

Thanks, hope this would help the future audience.


-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  2:19   ` Dan Williams
  2020-02-07  3:10     ` Baoquan He
  2020-02-07 10:51     ` Wei Yang
@ 2020-02-07 11:26     ` Wei Yang
  2020-02-09 13:50       ` Baoquan He
  2 siblings, 1 reply; 24+ messages in thread
From: Wei Yang @ 2020-02-07 11:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, Baoquan He, David Hildenbrand

On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>
>> memmap should be the physical address to page struct instead of virtual
>> address to pfn.
>>
>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>> this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index b5da121bdd6e..56816f653588 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>         /* Align memmap to section boundary in the subsection case */
>>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>                 section_nr_to_pfn(section_nr) != start_pfn)
>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>Yes, this looks obviously correct. This might be tripping up
>makedumpfile. Do you see any practical effects of this bug? The kernel
>mostly avoids ->section_mem_map in the vmemmap case and in the
>!vmemmap case section_nr_to_pfn(section_nr) should always equal
>start_pfn.

I took another look into the code. Looks there is no practical effect after
this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
the real memmap.

But leave a inconsistent data in section_mem_map is not a good practice.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07  3:21       ` Dan Williams
  2020-02-07  3:36         ` Baoquan He
@ 2020-02-07 12:14         ` Wei Yang
  2020-02-07 16:44           ` Dan Williams
  1 sibling, 1 reply; 24+ messages in thread
From: Wei Yang @ 2020-02-07 12:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Baoquan He, Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
>On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
>>
>> Hi Dan,
>>
>> On 02/06/20 at 06:19pm, Dan Williams wrote:
>> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>> > > diff --git a/mm/sparse.c b/mm/sparse.c
>> > > index b5da121bdd6e..56816f653588 100644
>> > > --- a/mm/sparse.c
>> > > +++ b/mm/sparse.c
>> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>> > >         /* Align memmap to section boundary in the subsection case */
>> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>> > >                 section_nr_to_pfn(section_nr) != start_pfn)
>> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>> >
>> > Yes, this looks obviously correct. This might be tripping up
>> > makedumpfile. Do you see any practical effects of this bug? The kernel
>> > mostly avoids ->section_mem_map in the vmemmap case and in the
>> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
>> > start_pfn.
>>
>> The practical effects is that the memmap for the first unaligned section will be lost
>> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
>> into mem_section, and get memmap from the related mem_section to free it in
>> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
>> with memmap.
>
>Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
>
>> By the way, sub-section support is only valid in vmemmap case, right?
>
>Yes.

Just one question from curiosity. Why we don't want sub-section for !vmemmap
case? Because it will wast memory for memmap?

>
>> Seems yes from code, but I don't find any document to prove it.
>
>check_pfn_span() enforces this requirement.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07 12:14         ` Wei Yang
@ 2020-02-07 16:44           ` Dan Williams
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2020-02-07 16:44 UTC (permalink / raw)
  To: Wei Yang
  Cc: Baoquan He, Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On Fri, Feb 7, 2020 at 4:15 AM Wei Yang <richard.weiyang@gmail.com> wrote:
>
> On Thu, Feb 06, 2020 at 07:21:49PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 7:10 PM Baoquan He <bhe@redhat.com> wrote:
> >>
> >> Hi Dan,
> >>
> >> On 02/06/20 at 06:19pm, Dan Williams wrote:
> >> > On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> >> > > diff --git a/mm/sparse.c b/mm/sparse.c
> >> > > index b5da121bdd6e..56816f653588 100644
> >> > > --- a/mm/sparse.c
> >> > > +++ b/mm/sparse.c
> >> > > @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >> > >         /* Align memmap to section boundary in the subsection case */
> >> > >         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >> > >                 section_nr_to_pfn(section_nr) != start_pfn)
> >> > > -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> > > +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >> >
> >> > Yes, this looks obviously correct. This might be tripping up
> >> > makedumpfile. Do you see any practical effects of this bug? The kernel
> >> > mostly avoids ->section_mem_map in the vmemmap case and in the
> >> > !vmemmap case section_nr_to_pfn(section_nr) should always equal
> >> > start_pfn.
> >>
> >> The practical effects is that the memmap for the first unaligned section will be lost
> >> when destroy namespace to hot remove it. Because we encode the ->section_mem_map
> >> into mem_section, and get memmap from the related mem_section to free it in
> >> section_deactivate(). In fact in vmemmap, we don't need to encode the ->section_mem_map
> >> with memmap.
> >
> >Right, but can you actually trigger that in the SPARSEMEM_VMEMMAP=n case?
> >
> >> By the way, sub-section support is only valid in vmemmap case, right?
> >
> >Yes.
>
> Just one question from curiosity. Why we don't want sub-section for !vmemmap
> case? Because it will wast memory for memmap?

The effort and maintenance burden outweighs the benefit.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-07 11:26     ` Wei Yang
@ 2020-02-09 13:50       ` Baoquan He
  2020-02-09 14:14         ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Baoquan He @ 2020-02-09 13:50 UTC (permalink / raw)
  To: Wei Yang
  Cc: Dan Williams, Wei Yang, Andrew Morton, Oscar Salvador, Linux MM,
	Linux Kernel Mailing List, David Hildenbrand

On 02/07/20 at 11:26am, Wei Yang wrote:
> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
> >On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
> >>
> >> memmap should be the physical address to page struct instead of virtual
> >> address to pfn.
> >>
> >> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
> >> this point.
> >>
> >> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> CC: Dan Williams <dan.j.williams@intel.com>
> >> ---
> >>  mm/sparse.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/sparse.c b/mm/sparse.c
> >> index b5da121bdd6e..56816f653588 100644
> >> --- a/mm/sparse.c
> >> +++ b/mm/sparse.c
> >> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
> >>         /* Align memmap to section boundary in the subsection case */
> >>         if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
> >>                 section_nr_to_pfn(section_nr) != start_pfn)
> >> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> >> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> >
> >Yes, this looks obviously correct. This might be tripping up
> >makedumpfile. Do you see any practical effects of this bug? The kernel
> >mostly avoids ->section_mem_map in the vmemmap case and in the
> >!vmemmap case section_nr_to_pfn(section_nr) should always equal
> >start_pfn.
> 
> I took another look into the code. Looks there is no practical effect after
> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
> the real memmap.
> 
> But leave a inconsistent data in section_mem_map is not a good practice.

Yeah, it does has no pratical effect. I tried to create sub-section
alighed namespace, then trigger crash, makedumpfile isn't impacted.
Because pmem memory is only added, but not onlined. We don't report it
to kdump, makedumpfile will ignore it.

I think it's worth fixing it to encode a correct memmap address. We
don't know if in the future this will break anything.

Thanks
Baoquan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-09 13:50       ` Baoquan He
@ 2020-02-09 14:14         ` David Hildenbrand
  2020-02-10  0:36           ` Wei Yang
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2020-02-09 14:14 UTC (permalink / raw)
  To: Baoquan He
  Cc: Wei Yang, Dan Williams, Wei Yang, Andrew Morton, Oscar Salvador,
	Linux MM, Linux Kernel Mailing List, David Hildenbrand



> Am 09.02.2020 um 14:50 schrieb Baoquan He <bhe@redhat.com>:
> 
> On 02/07/20 at 11:26am, Wei Yang wrote:
>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>>> 
>>>> memmap should be the physical address to page struct instead of virtual
>>>> address to pfn.
>>>> 
>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>> this point.
>>>> 
>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>> ---
>>>> mm/sparse.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index b5da121bdd6e..56816f653588 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>        /* Align memmap to section boundary in the subsection case */
>>>>        if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>>                section_nr_to_pfn(section_nr) != start_pfn)
>>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>> 
>>> Yes, this looks obviously correct. This might be tripping up
>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>> start_pfn.
>> 
>> I took another look into the code. Looks there is no practical effect after
>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>> the real memmap.
>> 
>> But leave a inconsistent data in section_mem_map is not a good practice.
> 
> Yeah, it does has no pratical effect. I tried to create sub-section
> alighed namespace, then trigger crash, makedumpfile isn't impacted.
> Because pmem memory is only added, but not onlined. We don't report it
> to kdump, makedumpfile will ignore it.
> 
> I think it's worth fixing it to encode a correct memmap address. We
> don't know if in the future this will break anything.

We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).

E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.

So some of that memory will be kdumped - and should be fixed if broken.

Cheers


> 
> Thanks
> Baoquan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn
  2020-02-09 14:14         ` David Hildenbrand
@ 2020-02-10  0:36           ` Wei Yang
  0 siblings, 0 replies; 24+ messages in thread
From: Wei Yang @ 2020-02-10  0:36 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Baoquan He, Wei Yang, Dan Williams, Wei Yang, Andrew Morton,
	Oscar Salvador, Linux MM, Linux Kernel Mailing List

On Sun, Feb 09, 2020 at 03:14:28PM +0100, David Hildenbrand wrote:
>
>
>> Am 09.02.2020 um 14:50 schrieb Baoquan He <bhe@redhat.com>:
>> 
>> On 02/07/20 at 11:26am, Wei Yang wrote:
>>>> On Thu, Feb 06, 2020 at 06:19:46PM -0800, Dan Williams wrote:
>>>> On Thu, Feb 6, 2020 at 3:17 PM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>>>> 
>>>>> memmap should be the physical address to page struct instead of virtual
>>>>> address to pfn.
>>>>> 
>>>>> Since we call this only for SPARSEMEM_VMEMMAP, pfn_to_page() is valid at
>>>>> this point.
>>>>> 
>>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>>> ---
>>>>> mm/sparse.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>>> index b5da121bdd6e..56816f653588 100644
>>>>> --- a/mm/sparse.c
>>>>> +++ b/mm/sparse.c
>>>>> @@ -888,7 +888,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>>        /* Align memmap to section boundary in the subsection case */
>>>>>        if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) &&
>>>>>                section_nr_to_pfn(section_nr) != start_pfn)
>>>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>>> +               memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>> 
>>>> Yes, this looks obviously correct. This might be tripping up
>>>> makedumpfile. Do you see any practical effects of this bug? The kernel
>>>> mostly avoids ->section_mem_map in the vmemmap case and in the
>>>> !vmemmap case section_nr_to_pfn(section_nr) should always equal
>>>> start_pfn.
>>> 
>>> I took another look into the code. Looks there is no practical effect after
>>> this. Because in the vmemmap case, we don't need ->section_mem_map to retrieve
>>> the real memmap.
>>> 
>>> But leave a inconsistent data in section_mem_map is not a good practice.
>> 
>> Yeah, it does has no pratical effect. I tried to create sub-section
>> alighed namespace, then trigger crash, makedumpfile isn't impacted.
>> Because pmem memory is only added, but not onlined. We don't report it
>> to kdump, makedumpfile will ignore it.
>> 
>> I think it's worth fixing it to encode a correct memmap address. We
>> don't know if in the future this will break anything.
>
>We can have system memory and devmem overlap within a section (which is still buggy and to be fixed in other regard - e.g., pfn_to_online_page() does not work correctly).
>
>E.g., 64 mb of (boot) system memory in a section. Then you can hot-add devmem that spans the remaining 64 mb of that section.
>
>So some of that memory will be kdumped - and should be fixed if broken.
>
>Cheers

Thanks for the explanation, I will add this in the changelog.

>
>
>> 
>> Thanks
>> Baoquan

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-02-10  0:36 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-06 23:16 [PATCH 0/3] Fixes "mm/sparsemem: support sub-section hotplug" Wei Yang
2020-02-06 23:16 ` [PATCH 1/3] mm/sparsemem: adjust memmap only for SPARSEMEM_VMEMMAP Wei Yang
2020-02-07  2:00   ` Dan Williams
2020-02-07 11:06     ` Wei Yang
2020-02-06 23:16 ` [PATCH 2/3] mm/sparsemem: get physical address to page struct instead of virtual address to pfn Wei Yang
2020-02-07  2:19   ` Dan Williams
2020-02-07  3:10     ` Baoquan He
2020-02-07  3:21       ` Dan Williams
2020-02-07  3:36         ` Baoquan He
2020-02-07  4:05           ` Dan Williams
2020-02-07 11:13           ` Wei Yang
2020-02-07 12:14         ` Wei Yang
2020-02-07 16:44           ` Dan Williams
2020-02-07 10:51     ` Wei Yang
2020-02-07 11:26     ` Wei Yang
2020-02-09 13:50       ` Baoquan He
2020-02-09 14:14         ` David Hildenbrand
2020-02-10  0:36           ` Wei Yang
2020-02-07  4:11   ` Baoquan He
2020-02-07 10:53     ` Wei Yang
2020-02-06 23:16 ` [PATCH 3/3] mm/sparsemem: avoid memmap overwrite for non-SPARSEMEM_VMEMMAP Wei Yang
2020-02-07  2:06   ` Dan Williams
2020-02-07  3:50     ` Baoquan He
2020-02-07 11:02     ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).