linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/sparse: Consistently do not zero memmap
@ 2019-10-30 13:11 Vincent Whitchurch
  2019-10-30 13:29 ` Michal Hocko
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Vincent Whitchurch @ 2019-10-30 13:11 UTC (permalink / raw)
  To: akpm
  Cc: osalvador, mhocko, pasha.tatashin, linux-mm, linux-kernel,
	Vincent Whitchurch

sparsemem without VMEMMAP has two allocation paths to allocate the
memory needed for its memmap (done in sparse_mem_map_populate()).

In one allocation path (sparse_buffer_alloc() succeeds), the memory is
not zeroed (since it was previously allocated with
memblock_alloc_try_nid_raw()).

In the other allocation path (sparse_buffer_alloc() fails and
sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
memory is zeroed.

AFAICS this difference does not appear to be on purpose.  If the code is
supposed to work with non-initialized memory (__init_single_page() takes
care of zeroing the struct pages which are actually used), we should
consistently not zero the memory, to avoid masking bugs.

(I noticed this because on my ARM64 platform, with 1 GiB of memory the
 first [and only] section is allocated from the zeroing path while with
 2 GiB of memory the first 1 GiB section is allocated from the
 non-zeroing path.)

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index f6891c1992b1..01e467adc219 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
 	if (map)
 		return map;
 
-	map = memblock_alloc_try_nid(size,
+	map = memblock_alloc_try_nid_raw(size,
 					  PAGE_SIZE, addr,
 					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	if (!map)
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread
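For readers following the discussion, the difference between the two allocation paths can be sketched with a simplified userspace simulation (illustrative C, not kernel code; `sparse_buffer_alloc_sim`, `populate_before` and `populate_after` are made-up names, with malloc/calloc standing in for the raw and zeroing memblock allocators):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for sparse_buffer_alloc(): hands out memory from a buffer
 * that was allocated with memblock_alloc_try_nid_raw(), i.e. NOT
 * zeroed.  It can fail, e.g. when alignment eats the remaining space. */
static void *sparse_buffer_alloc_sim(size_t size, int succeeds)
{
	return succeeds ? malloc(size) : NULL;
}

/* Before the patch: the fallback path used the zeroing allocator
 * (memblock_alloc_try_nid), so the two paths behaved differently. */
static void *populate_before(size_t size, int buf_ok)
{
	void *map = sparse_buffer_alloc_sim(size, buf_ok);
	if (map)
		return map;		/* uninitialized memory */
	return calloc(1, size);		/* zeroed memory */
}

/* After the patch: the fallback uses the raw allocator too
 * (memblock_alloc_try_nid_raw), so neither path zeroes the memmap;
 * __init_single_page() later initializes the struct pages actually used. */
static void *populate_after(size_t size, int buf_ok)
{
	void *map = sparse_buffer_alloc_sim(size, buf_ok);
	if (map)
		return map;
	return malloc(size);		/* uninitialized, like the buffer path */
}
```

Which path a section takes then no longer determines whether its memmap arrives zeroed, so bugs relying on accidental zeroing cannot be masked on some memory configurations only.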

* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 13:11 [PATCH] mm/sparse: Consistently do not zero memmap Vincent Whitchurch
@ 2019-10-30 13:29 ` Michal Hocko
  2019-10-30 14:02   ` Vincent Whitchurch
  2019-10-30 13:31 ` David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-30 13:29 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: akpm, osalvador, pasha.tatashin, linux-mm, linux-kernel,
	Vincent Whitchurch

On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.

You are right that this is not intentional.

> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)

Do I get it right that sparse_buffer_init couldn't allocate memmap for
the full node for some reason and so sparse_init_nid would have to
allocate one for each memory section?

> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

Anyway, the patch is OK. Even though this is not a bug strictly speaking,
it is certainly suboptimal behavior because zeroing takes time, so
I would flag this for a stable tree 4.19+. There is no clear Fixes tag
to apply (35fd1eb1e8212 would get closest, I guess).

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>  					  PAGE_SIZE, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
> -- 
> 2.20.0

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 13:11 [PATCH] mm/sparse: Consistently do not zero memmap Vincent Whitchurch
  2019-10-30 13:29 ` Michal Hocko
@ 2019-10-30 13:31 ` David Hildenbrand
  2019-10-30 13:38 ` Oscar Salvador
  2019-10-30 15:21 ` Pavel Tatashin
  3 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2019-10-30 13:31 UTC (permalink / raw)
  To: Vincent Whitchurch, akpm
  Cc: osalvador, mhocko, pasha.tatashin, linux-mm, linux-kernel,
	Vincent Whitchurch

On 30.10.19 14:11, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.

I agree

Acked-by: David Hildenbrand <david@redhat.com>

> 
> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>   first [and only] section is allocated from the zeroing path while with
>   2 GiB of memory the first 1 GiB section is allocated from the
>   non-zeroing path.)
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
> ---
>   mm/sparse.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>   	if (map)
>   		return map;
>   
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>   					  PAGE_SIZE, addr,
>   					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>   	if (!map)
> 


-- 

Thanks,

David / dhildenb



* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 13:11 [PATCH] mm/sparse: Consistently do not zero memmap Vincent Whitchurch
  2019-10-30 13:29 ` Michal Hocko
  2019-10-30 13:31 ` David Hildenbrand
@ 2019-10-30 13:38 ` Oscar Salvador
  2019-10-30 15:21 ` Pavel Tatashin
  3 siblings, 0 replies; 19+ messages in thread
From: Oscar Salvador @ 2019-10-30 13:38 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: akpm, mhocko, pasha.tatashin, linux-mm, linux-kernel, Vincent Whitchurch

On Wed, Oct 30, 2019 at 02:11:22PM +0100, Vincent Whitchurch wrote:
> sparsemem without VMEMMAP has two allocation paths to allocate the
> memory needed for its memmap (done in sparse_mem_map_populate()).
> 
> In one allocation path (sparse_buffer_alloc() succeeds), the memory is
> not zeroed (since it was previously allocated with
> memblock_alloc_try_nid_raw()).
> 
> In the other allocation path (sparse_buffer_alloc() fails and
> sparse_mem_map_populate() falls back to memblock_alloc_try_nid()), the
> memory is zeroed.
> 
> AFAICS this difference does not appear to be on purpose.  If the code is
> supposed to work with non-initialized memory (__init_single_page() takes
> care of zeroing the struct pages which are actually used), we should
> consistently not zero the memory, to avoid masking bugs.
> 
> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)
> 
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

Good catch

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index f6891c1992b1..01e467adc219 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -458,7 +458,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> +	map = memblock_alloc_try_nid_raw(size,
>  					  PAGE_SIZE, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
> -- 
> 2.20.0
> 

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 13:29 ` Michal Hocko
@ 2019-10-30 14:02   ` Vincent Whitchurch
  2019-10-30 14:12     ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Vincent Whitchurch @ 2019-10-30 14:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, osalvador, linux-mm, linux-kernel

On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> >  first [and only] section is allocated from the zeroing path while with
> >  2 GiB of memory the first 1 GiB section is allocated from the
> >  non-zeroing path.)
> 
> Do I get it right that sparse_buffer_init couldn't allocate memmap for
> the full node for some reason and so sparse_init_nid would have to
> allocate one for each memory section?

Not quite.  The sparsemap_buf is successfully allocated with the correct
size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
allocate the same size from it.

The reason it fails is that sparse_buffer_alloc() for some reason wants
to return a pointer which is aligned to the allocation size.  But the
sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
enough space to align it.

I don't understand the reason for this alignment requirement, since the
fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
alignment is for the VMEMMAP code, which also uses sparse_buffer_alloc()?

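The alignment failure Vincent describes can be reproduced with plain pointer arithmetic. A minimal sketch, under assumed example addresses (`ALIGN_UP` mirrors the kernel's ALIGN() rounding; `request_fits` is a hypothetical helper modelling the bounds check in sparse_buffer_alloc()):

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding the kernel's ALIGN() macro performs (align is a power of two). */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))

/* Does a request of 'size' bytes, aligned to 'align', fit in the buffer
 * [buf, buf + buflen)?  This models the check in sparse_buffer_alloc(). */
static int request_fits(uintptr_t buf, uintptr_t buflen,
			uintptr_t size, uintptr_t align)
{
	uintptr_t ptr = ALIGN_UP(buf, align);

	return ptr + size <= buf + buflen;
}
```

Under these assumptions, a buffer sized for exactly one section's memmap but only PAGE_SIZE-aligned cannot satisfy a size-aligned request, which would match the 1 GiB ARM64 observation above: the single section always falls back to memblock_alloc_try_nid().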

* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 14:02   ` Vincent Whitchurch
@ 2019-10-30 14:12     ` Michal Hocko
  2019-10-30 15:20       ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-30 14:12 UTC (permalink / raw)
  To: Vincent Whitchurch, Pavel Tatashin
  Cc: akpm, osalvador, linux-mm, linux-kernel

[Add Pavel - the email thread starts http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
 but it used your old email address]

On Wed 30-10-19 15:02:16, Vincent Whitchurch wrote:
> On Wed, Oct 30, 2019 at 02:29:58PM +0100, Michal Hocko wrote:
> > On Wed 30-10-19 14:11:22, Vincent Whitchurch wrote:
> > > (I noticed this because on my ARM64 platform, with 1 GiB of memory the
> > >  first [and only] section is allocated from the zeroing path while with
> > >  2 GiB of memory the first 1 GiB section is allocated from the
> > >  non-zeroing path.)
> > 
> > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > the full node for some reason and so sparse_init_nid would have to
> > allocate one for each memory section?
> 
> Not quite.  The sparsemap_buf is successfully allocated with the correct
> size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> allocate the same size from it.
> 
> The reason it fails is that sparse_buffer_alloc() for some reason wants
> to return a pointer which is aligned to the allocation size.  But the
> sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> enough space to align it.
> 
> I don't understand the reason for this alignment requirement since the
> fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> alignment is for the VMEMAP code which also uses sparse_buffer_alloc()?

I am not 100% sure TBH. Aligning makes some sense when mapping the
memmaps to page tables but that would suggest that sparse_buffer_init
is using a wrong alignment then. It is quite wasteful to allocate
a large misaligned block like that.

Your patch still makes sense but this is something to look into.

Pavel?

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 14:12     ` Michal Hocko
@ 2019-10-30 15:20       ` Pavel Tatashin
  2019-10-30 15:31         ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2019-10-30 15:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> [...]
> > >
> > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > the full node for some reason and so sparse_init_nid would have to
> > > allocate one for each memory section?
> >
> > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > allocate the same size from it.
> >
> > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > to return a pointer which is aligned to the allocation size.  But the
> > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > enough space to align it.
> >
> > I don't understand the reason for this alignment requirement since the
> > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > alignment is for the VMEMAP code which also uses sparse_buffer_alloc()?
>
> I am not 100% sure TBH. Aligning makes some sense when mapping the
> memmaps to page tables but that would suggest that sparse_buffer_init
> is using a wrong alignment then. It is quite wasteful to allocate
> alarge misaligned block like that.
>
> Your patch still makes sense but this is something to look into.
>
> Pavel?

I remember thinking about this large alignment, as it looked out of
place to me also.
It was there to keep memmap in single chunks on larger x86 machines.
Perhaps it can be revisited now.

The patch that introduced this alignment:
9bdac914240759457175ac0d6529a37d2820bc4d

vmemmap_alloc_block_buf
+       ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);

Pasha


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 13:11 [PATCH] mm/sparse: Consistently do not zero memmap Vincent Whitchurch
                   ` (2 preceding siblings ...)
  2019-10-30 13:38 ` Oscar Salvador
@ 2019-10-30 15:21 ` Pavel Tatashin
  3 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2019-10-30 15:21 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Andrew Morton, Oscar Salvador, Michal Hocko, Pavel Tatashin,
	linux-mm, LKML, Vincent Whitchurch

> (I noticed this because on my ARM64 platform, with 1 GiB of memory the
>  first [and only] section is allocated from the zeroing path while with
>  2 GiB of memory the first 1 GiB section is allocated from the
>  non-zeroing path.)
>
> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>

LGTM, Thank you!
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 15:20       ` Pavel Tatashin
@ 2019-10-30 15:31         ` Michal Hocko
  2019-10-30 16:53           ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-30 15:31 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [...]
> > > >
> > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > the full node for some reason and so sparse_init_nid would have to
> > > > allocate one for each memory section?
> > >
> > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > allocate the same size from it.
> > >
> > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > to return a pointer which is aligned to the allocation size.  But the
> > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > enough space to align it.
> > >
> > > I don't understand the reason for this alignment requirement since the
> > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > alignment is for the VMEMAP code which also uses sparse_buffer_alloc()?
> >
> > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > memmaps to page tables but that would suggest that sparse_buffer_init
> > is using a wrong alignment then. It is quite wasteful to allocate
> > alarge misaligned block like that.
> >
> > Your patch still makes sense but this is something to look into.
> >
> > Pavel?
> 
> I remember thinking about this large alignment, as it looked out of
> place to me also.
> It was there to keep memmap in single chunks on larger x86 machines.
> Perhaps it can be revisited now.

Don't we need 2MB aligned memmaps for their PMD mappings?

> The patch that introduced this alignment:
> 9bdac914240759457175ac0d6529a37d2820bc4d
> 
> vmemmap_alloc_block_buf
> +       ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
> 
> Pasha

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 15:31         ` Michal Hocko
@ 2019-10-30 16:53           ` Pavel Tatashin
  2019-10-30 17:31             ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2019-10-30 16:53 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed, Oct 30, 2019 at 11:31 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 11:20:44, Pavel Tatashin wrote:
> > On Wed, Oct 30, 2019 at 10:13 AM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > [...]
> > > > >
> > > > > Do I get it right that sparse_buffer_init couldn't allocate memmap for
> > > > > the full node for some reason and so sparse_init_nid would have to
> > > > > allocate one for each memory section?
> > > >
> > > > Not quite.  The sparsemap_buf is successfully allocated with the correct
> > > > size in sparse_buffer_init(), but sparse_buffer_alloc() fails to
> > > > allocate the same size from it.
> > > >
> > > > The reason it fails is that sparse_buffer_alloc() for some reason wants
> > > > to return a pointer which is aligned to the allocation size.  But the
> > > > sparsemap_buf was only allocated with PAGE_SIZE alignment so there's not
> > > > enough space to align it.
> > > >
> > > > I don't understand the reason for this alignment requirement since the
> > > > fallback path also allocates with PAGE_SIZE alignment.  I'm guessing the
> > > > alignment is for the VMEMAP code which also uses sparse_buffer_alloc()?
> > >
> > > I am not 100% sure TBH. Aligning makes some sense when mapping the
> > > memmaps to page tables but that would suggest that sparse_buffer_init
> > > is using a wrong alignment then. It is quite wasteful to allocate
> > > alarge misaligned block like that.
> > >
> > > Your patch still makes sense but this is something to look into.
> > >
> > > Pavel?
> >
> > I remember thinking about this large alignment, as it looked out of
> > place to me also.
> > It was there to keep memmap in single chunks on larger x86 machines.
> > Perhaps it can be revisited now.
>
> Don't we need 2MB aligned memmaps for their PMD mappings?

Yes, PMD_SIZE should be the alignment here. It just does not make
sense to align to size.

Pasha

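To make the waste argument concrete: the worst-case padding in front of an allocation grows with the requested alignment, so aligning to the allocation size can cost far more than the 2 MiB that PMD mappings actually need. A rough sketch with example numbers (`PMD_SIZE_EXAMPLE` and the 16 MiB memmap figure are illustrative assumptions for 4 KiB pages and 64-byte struct pages, not measured values):

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE_EXAMPLE	((uintptr_t)2 << 20)	/* 2 MiB with 4 KiB pages */

/* Worst-case bytes skipped in front of an allocation to reach the
 * requested alignment boundary (alignment is a power of two). */
static uintptr_t worst_case_padding(uintptr_t align)
{
	return align - 1;
}
```

Aligning to PMD_SIZE bounds the waste at just under 2 MiB regardless of how large the memmap request is, whereas aligning to the request size scales the waste with the request.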

* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 16:53           ` Pavel Tatashin
@ 2019-10-30 17:31             ` Michal Hocko
  2019-10-30 17:53               ` Pavel Tatashin
  2019-10-31  7:25               ` Michal Hocko
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2019-10-30 17:31 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 11:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [...]
> > > I remember thinking about this large alignment, as it looked out of
> > > place to me also.
> > > It was there to keep memmap in single chunks on larger x86 machines.
> > > Perhaps it can be revisited now.
> >
> > Don't we need 2MB aligned memmaps for their PMD mappings?
> 
> Yes, PMD_SIZE should be the alignment here. It just does not make
> sense to align to size.

What about this? It still aligns to the size, but now correctly at the
section size level.

diff --git a/mm/sparse.c b/mm/sparse.c
index 72f010d9bff5..ab1e6175ac9a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
 	if (map)
 		return map;
 
-	map = memblock_alloc_try_nid(size,
-					  PAGE_SIZE, addr,
+	map = memblock_alloc_try_nid(size, size, addr,
 					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	if (!map)
 		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
@@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
 {
 	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
 	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
+	/*
+	 * Pre-allocated buffer is mainly used by __populate_section_memmap
+	 * and we want it to be properly aligned to the section size - this is
+	 * especially the case for VMEMMAP which maps memmap to PMDs
+	 */
 	sparsemap_buf =
-		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
+		memblock_alloc_try_nid_raw(size, section_map_size(),
 						addr,
 						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 	sparsemap_buf_end = sparsemap_buf + size;

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 17:31             ` Michal Hocko
@ 2019-10-30 17:53               ` Pavel Tatashin
  2019-10-30 18:01                 ` Michal Hocko
  2019-10-31  7:25               ` Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Tatashin @ 2019-10-30 17:53 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> > [...]
> > > >
> > > > I remember thinking about this large alignment, as it looked out of
> > > > place to me also.
> > > > It was there to keep memmap in single chunks on larger x86 machines.
> > > > Perhaps it can be revisited now.
> > >
> > > Don't we need 2MB aligned memmaps for their PMD mappings?
> >
> > Yes, PMD_SIZE should be the alignment here. It just does not make
> > sense to align to size.
>
> What about this? It still aligns to the size but that should be
> correctly done to the section size level.
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 72f010d9bff5..ab1e6175ac9a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>         if (map)
>                 return map;
>
> -       map = memblock_alloc_try_nid(size,
> -                                         PAGE_SIZE, addr,
> +       map = memblock_alloc_try_nid(size, size, addr,
>                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>         if (!map)
>                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>  {
>         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
>         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> +       /*
> +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> +        * and we want it to be properly aligned to the section size - this is
> +        * especially the case for VMEMMAP which maps memmap to PMDs
> +        */
>         sparsemap_buf =
> -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> +               memblock_alloc_try_nid_raw(size, section_map_size(),
>                                                 addr,
>                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>         sparsemap_buf_end = sparsemap_buf + size;

This looks good. I think we should also change the alignment in the
fallback of vmemmap_alloc_block() to be section_map_size().

+++ b/mm/sparse-vmemmap.c
@@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
                        warned = true;
                }
                return NULL;
-       } else
-               return __earlyonly_bootmem_alloc(node, size, size,
+       } else {
+               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
                                __pa(MAX_DMA_ADDRESS));
+       }
 }

Pasha


>
> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 17:53               ` Pavel Tatashin
@ 2019-10-30 18:01                 ` Michal Hocko
  2019-10-30 18:23                   ` Pavel Tatashin
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-30 18:01 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed 30-10-19 13:53:52, Pavel Tatashin wrote:
> On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
[...]
> > > Yes, PMD_SIZE should be the alignment here. It just does not make
> > > sense to align to size.
> >
> > What about this? It still aligns to the size but that should be
> > correctly done to the section size level.
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 72f010d9bff5..ab1e6175ac9a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> >         if (map)
> >                 return map;
> >
> > -       map = memblock_alloc_try_nid(size,
> > -                                         PAGE_SIZE, addr,
> > +       map = memblock_alloc_try_nid(size, size, addr,
> >                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >         if (!map)
> >                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  {
> >         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> >         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> > +       /*
> > +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> > +        * and we want it to be properly aligned to the section size - this is
> > +        * especially the case for VMEMMAP which maps memmap to PMDs
> > +        */
> >         sparsemap_buf =
> > -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > +               memblock_alloc_try_nid_raw(size, section_map_size(),
> >                                                 addr,
> >                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >         sparsemap_buf_end = sparsemap_buf + size;
> 
> This looks good. I think we should also change the alignment in the fallback
> path of vmemmap_alloc_block() to section_map_size().
> 
> +++ b/mm/sparse-vmemmap.c
> @@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
>                         warned = true;
>                 }
>                 return NULL;
> -       } else
> -               return __earlyonly_bootmem_alloc(node, size, size,
> +       } else {
> +               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
>                                 __pa(MAX_DMA_ADDRESS));
> +       }
>  }

Are you sure? Doesn't this provide the proper alignment already? Most
callers use PAGE_SIZE, and vmemmap_populate_hugepages() uses PMD_SIZE, so
the resulting alignment looks good to me.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 18:01                 ` Michal Hocko
@ 2019-10-30 18:23                   ` Pavel Tatashin
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Tatashin @ 2019-10-30 18:23 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vincent Whitchurch, akpm, osalvador, linux-mm, linux-kernel

On Wed, Oct 30, 2019 at 2:01 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 30-10-19 13:53:52, Pavel Tatashin wrote:
> > On Wed, Oct 30, 2019 at 1:31 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Wed 30-10-19 12:53:41, Pavel Tatashin wrote:
> [...]
> > > > Yes, PMD_SIZE should be the alignment here. It just does not make
> > > > sense to align to size.
> > >
> > > What about this? It still aligns to the size but that should be
> > > correctly done to the section size level.
> > >
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > >         if (map)
> > >                 return map;
> > >
> > > -       map = memblock_alloc_try_nid(size,
> > > -                                         PAGE_SIZE, addr,
> > > +       map = memblock_alloc_try_nid(size, size, addr,
> > >                                           MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >         if (!map)
> > >                 panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > >  {
> > >         phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > >         WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
> > > +       /*
> > > +        * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > +        * and we want it to be properly aligned to the section size - this is
> > > +        * especially the case for VMEMMAP which maps memmap to PMDs
> > > +        */
> > >         sparsemap_buf =
> > > -               memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > +               memblock_alloc_try_nid_raw(size, section_map_size(),
> > >                                                 addr,
> > >                                                 MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >         sparsemap_buf_end = sparsemap_buf + size;
> >
> > This looks good. I think we should also change the alignment in the fallback
> > path of vmemmap_alloc_block() to section_map_size().
> >
> > +++ b/mm/sparse-vmemmap.c
> > @@ -65,9 +65,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> >                         warned = true;
> >                 }
> >                 return NULL;
> > -       } else
> > -               return __earlyonly_bootmem_alloc(node, size, size,
> > +       } else {
> > +               return __earlyonly_bootmem_alloc(node, size, section_map_size(),
> >                                 __pa(MAX_DMA_ADDRESS));
> > +       }
> >  }
>
> Are you sure? Doesn't this provide the proper alignment already? Most
> callers use PAGE_SIZE, and vmemmap_populate_hugepages() uses PMD_SIZE, so
> the resulting alignment looks good to me.

Never mind, you are right. I tracked only one path and forgot about
the normal PAGE_SIZE path.

Thank you,
Pasha

> --
> Michal Hocko
> SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-30 17:31             ` Michal Hocko
  2019-10-30 17:53               ` Pavel Tatashin
@ 2019-10-31  7:25               ` Michal Hocko
  2019-11-04 15:51                 ` Vincent Whitchurch
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-10-31  7:25 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Pavel Tatashin, akpm, osalvador, linux-mm, linux-kernel

Vincent, could you give this a try please? It would be even better if
you could add some debugging to measure the overhead. Let me know if you
need any help with a debugging patch.

Thanks!

On Wed 30-10-19 18:31:23, Michal Hocko wrote:
[...]
> What about this? It still aligns to the size but that should be
> correctly done to the section size level.
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 72f010d9bff5..ab1e6175ac9a 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
>  	if (map)
>  		return map;
>  
> -	map = memblock_alloc_try_nid(size,
> -					  PAGE_SIZE, addr,
> +	map = memblock_alloc_try_nid(size, size, addr,
>  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	if (!map)
>  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
>  {
>  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
>  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> +	/*
> +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> +	 * and we want it to be properly aligned to the section size - this is
> +	 * especially the case for VMEMMAP which maps memmap to PMDs
> +	 */
>  	sparsemap_buf =
> -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> +		memblock_alloc_try_nid_raw(size, section_map_size(),
>  						addr,
>  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  	sparsemap_buf_end = sparsemap_buf + size;
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-10-31  7:25               ` Michal Hocko
@ 2019-11-04 15:51                 ` Vincent Whitchurch
  2019-11-05  8:43                   ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Vincent Whitchurch @ 2019-11-04 15:51 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Pavel Tatashin, akpm, osalvador, linux-mm, linux-kernel

On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> [...]
> > What about this? It still aligns to the size but that should be
> > correctly done to the section size level.
> > 
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 72f010d9bff5..ab1e6175ac9a 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> >  	if (map)
> >  		return map;
> >  
> > -	map = memblock_alloc_try_nid(size,
> > -					  PAGE_SIZE, addr,
> > +	map = memblock_alloc_try_nid(size, size, addr,
> >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >  	if (!map)
> >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> >  {
> >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > +	/*
> > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > +	 * and we want it to be properly aligned to the section size - this is
> > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > +	 */
> >  	sparsemap_buf =
> > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> >  						addr,
> >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >  	sparsemap_buf_end = sparsemap_buf + size;
>
> Vincent, could you give this a try please? It would be even better if
> you could add some debugging to measure the overhead. Let me know if you
> need any help with a debugging patch.

I've tested this patch and it works on my platform:  The allocations
from sparse_buffer_alloc() now succeed and the fallback path is not
taken.

I'm not sure what kind of overhead you're interested in.  The memory
overhead of the initial allocations (as measured via the "Memory: XX/YY
available" prints and MemTotal), is identical with and without this
patch.  I don't see any difference in the time taken for the
initializations either.


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-11-04 15:51                 ` Vincent Whitchurch
@ 2019-11-05  8:43                   ` Michal Hocko
  2019-11-15 23:55                     ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-11-05  8:43 UTC (permalink / raw)
  To: Vincent Whitchurch
  Cc: Pavel Tatashin, akpm, osalvador, linux-mm, linux-kernel

On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > [...]
> > > What about this? It still aligns to the size but that should be
> > > correctly done to the section size level.
> > > 
> > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > --- a/mm/sparse.c
> > > +++ b/mm/sparse.c
> > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > >  	if (map)
> > >  		return map;
> > >  
> > > -	map = memblock_alloc_try_nid(size,
> > > -					  PAGE_SIZE, addr,
> > > +	map = memblock_alloc_try_nid(size, size, addr,
> > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >  	if (!map)
> > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > >  {
> > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > +	/*
> > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > +	 * and we want it to be properly aligned to the section size - this is
> > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > +	 */
> > >  	sparsemap_buf =
> > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > >  						addr,
> > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > >  	sparsemap_buf_end = sparsemap_buf + size;
> >
> > Vincent, could you give this a try please? It would be even better if
> > you could add some debugging to measure the overhead. Let me know if you
> > need any help with a debugging patch.
> 
> I've tested this patch and it works on my platform:  The allocations
> from sparse_buffer_alloc() now succeed and the fallback path is not
> taken.

Thanks a lot. I will try to prepare the full patch with a proper
changelog sometimes this week.

> I'm not sure what kind of overhead you're interested in.

The memory overhead when sparsemap_buf has only the PAGE_SIZE alignment in
place; in other words, (ptr - sparsemap_buf) in sparse_buffer_alloc(). If
sparsemap_buf were properly aligned, that diff would be 0.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-11-05  8:43                   ` Michal Hocko
@ 2019-11-15 23:55                     ` Andrew Morton
  2019-11-18 11:28                       ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2019-11-15 23:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vincent Whitchurch, Pavel Tatashin, osalvador, linux-mm, linux-kernel

On Tue, 5 Nov 2019 09:43:52 +0100 Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> > On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > > [...]
> > > > What about this? It still aligns to the size but that should be
> > > > correctly done to the section size level.
> > > > 
> > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > > --- a/mm/sparse.c
> > > > +++ b/mm/sparse.c
> > > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > > >  	if (map)
> > > >  		return map;
> > > >  
> > > > -	map = memblock_alloc_try_nid(size,
> > > > -					  PAGE_SIZE, addr,
> > > > +	map = memblock_alloc_try_nid(size, size, addr,
> > > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > >  	if (!map)
> > > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > > >  {
> > > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > > +	/*
> > > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > > +	 * and we want it to be properly aligned to the section size - this is
> > > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > > +	 */
> > > >  	sparsemap_buf =
> > > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > > >  						addr,
> > > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > >  	sparsemap_buf_end = sparsemap_buf + size;
> > >
> > > Vincent, could you give this a try please? It would be even better if
> > > you could add some debugging to measure the overhead. Let me know if you
> > > need any help with a debugging patch.
> > 
> > I've tested this patch and it works on my platform:  The allocations
> > from sparse_buffer_alloc() now succeed and the fallback path is not
> > taken.
> 
> Thanks a lot. I will try to prepare the full patch with a proper
> changelog sometimes this week.
> 

We're late in -rc7.  Should we run with Vincent's original for now?

And I'm wondering why this is -stable -material?  You said

: Anyway the patch is OK.  Even though this is not a bug strictly
: speaking it is certainly a suboptimal behavior because zeroing takes
: time so I would flag this for a stable tree 4.19+.  There is no clear
: Fixes tag to apply (35fd1eb1e8212 would get closest I guess).

I'm not seeing any description of any runtime effect of the bug at
present.  When would unzeroed sparsemem pageframes cause a problem? 
Could they be visible during deferred initialization or mem hotadd?




* Re: [PATCH] mm/sparse: Consistently do not zero memmap
  2019-11-15 23:55                     ` Andrew Morton
@ 2019-11-18 11:28                       ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-11-18 11:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vincent Whitchurch, Pavel Tatashin, osalvador, linux-mm, linux-kernel

On Fri 15-11-19 15:55:35, Andrew Morton wrote:
> On Tue, 5 Nov 2019 09:43:52 +0100 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Mon 04-11-19 16:51:26, Vincent Whitchurch wrote:
> > > On Thu, Oct 31, 2019 at 08:25:55AM +0100, Michal Hocko wrote:
> > > > On Wed 30-10-19 18:31:23, Michal Hocko wrote:
> > > > [...]
> > > > > What about this? It still aligns to the size but that should be
> > > > > correctly done to the section size level.
> > > > > 
> > > > > diff --git a/mm/sparse.c b/mm/sparse.c
> > > > > index 72f010d9bff5..ab1e6175ac9a 100644
> > > > > --- a/mm/sparse.c
> > > > > +++ b/mm/sparse.c
> > > > > @@ -456,8 +456,7 @@ struct page __init *__populate_section_memmap(unsigned long pfn,
> > > > >  	if (map)
> > > > >  		return map;
> > > > >  
> > > > > -	map = memblock_alloc_try_nid(size,
> > > > > -					  PAGE_SIZE, addr,
> > > > > +	map = memblock_alloc_try_nid(size, size, addr,
> > > > >  					  MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > > >  	if (!map)
> > > > >  		panic("%s: Failed to allocate %lu bytes align=0x%lx nid=%d from=%pa\n",
> > > > > @@ -474,8 +473,13 @@ static void __init sparse_buffer_init(unsigned long size, int nid)
> > > > >  {
> > > > >  	phys_addr_t addr = __pa(MAX_DMA_ADDRESS);
> > > > >  	WARN_ON(sparsemap_buf);	/* forgot to call sparse_buffer_fini()? */
> > > > > +	/*
> > > > > +	 * Pre-allocated buffer is mainly used by __populate_section_memmap
> > > > > +	 * and we want it to be properly aligned to the section size - this is
> > > > > +	 * especially the case for VMEMMAP which maps memmap to PMDs
> > > > > +	 */
> > > > >  	sparsemap_buf =
> > > > > -		memblock_alloc_try_nid_raw(size, PAGE_SIZE,
> > > > > +		memblock_alloc_try_nid_raw(size, section_map_size(),
> > > > >  						addr,
> > > > >  						MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> > > > >  	sparsemap_buf_end = sparsemap_buf + size;
> > > >
> > > > Vincent, could you give this a try please? It would be even better if
> > > > you could add some debugging to measure the overhead. Let me know if you
> > > > need any help with a debugging patch.
> > > 
> > > I've tested this patch and it works on my platform:  The allocations
> > > from sparse_buffer_alloc() now succeed and the fallback path is not
> > > taken.
> > 
> > Thanks a lot. I will try to prepare the full patch with a proper
> > changelog sometimes this week.
> > 
> 
> We're late in -rc7.  Should we run with Vincent's original for now?

Yes, that patch is correct on its own. I have still the follow up clean
up on my todo list. I will get to this hopefully soon.

> And I'm wondering why this is -stable -material?  You said
> 
> : Anyway the patch is OK.  Even though this is not a bug strictly
> : speaking it is certainly a suboptimal behavior because zeroing takes
> : time so I would flag this for a stable tree 4.19+.  There is no clear
> : Fixes tag to apply (35fd1eb1e8212 would get closest I guess).
> 
> I'm not seeing any description of any runtime effect of the bug at
> present.  When would unzeroed sparsemem pageframes cause a problem? 
> Could they be visible during deferred initialization or mem hotadd?

The main user-visible problem is memory wastage. The overall amount of
memory should be small. I wouldn't call it stable material.

-- 
Michal Hocko
SUSE Labs


end of thread, other threads:[~2019-11-18 11:28 UTC | newest]

Thread overview: 19+ messages
2019-10-30 13:11 [PATCH] mm/sparse: Consistently do not zero memmap Vincent Whitchurch
2019-10-30 13:29 ` Michal Hocko
2019-10-30 14:02   ` Vincent Whitchurch
2019-10-30 14:12     ` Michal Hocko
2019-10-30 15:20       ` Pavel Tatashin
2019-10-30 15:31         ` Michal Hocko
2019-10-30 16:53           ` Pavel Tatashin
2019-10-30 17:31             ` Michal Hocko
2019-10-30 17:53               ` Pavel Tatashin
2019-10-30 18:01                 ` Michal Hocko
2019-10-30 18:23                   ` Pavel Tatashin
2019-10-31  7:25               ` Michal Hocko
2019-11-04 15:51                 ` Vincent Whitchurch
2019-11-05  8:43                   ` Michal Hocko
2019-11-15 23:55                     ` Andrew Morton
2019-11-18 11:28                       ` Michal Hocko
2019-10-30 13:31 ` David Hildenbrand
2019-10-30 13:38 ` Oscar Salvador
2019-10-30 15:21 ` Pavel Tatashin
