linux-mm.kvack.org archive mirror
* [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
@ 2020-08-14 16:49 Doug Berger
  2020-08-19  3:18 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Doug Berger @ 2020-08-14 16:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jason Baron, David Rientjes, Kirill A. Shutemov, linux-mm,
	linux-kernel, Doug Berger

The lowmem_reserve arrays provide a means of applying pressure
against allocations from lower zones that were targeted at
higher zones. Their values are a function of the number of pages
managed by higher zones and are assigned by a call to the
setup_per_zone_lowmem_reserve() function.

The function is initially called at boot time by the function
init_per_zone_wmark_min() and may be called later by accesses
of the /proc/sys/vm/lowmem_reserve_ratio sysctl file.

The function init_per_zone_wmark_min() was moved up from a
module_init to a core_initcall to resolve a sequencing issue
with khugepaged. Unfortunately this created a sequencing issue
with CMA page accounting.

The CMA pages are added to the managed page count of a zone
when cma_init_reserved_areas() is called at boot, also as a
core_initcall. Whether the CMA pages are added to the managed
page counts of their zones before or after the call to
init_per_zone_wmark_min() therefore depends on link order. With
the current link order, the pages are added to the managed count
after the lowmem_reserve arrays are initialized at boot.

This means the lowmem_reserve values computed at boot may be
lower than the values used later if /proc/sys/vm/lowmem_reserve_ratio
is accessed, even though the ratio values themselves are unchanged.

In many cases the difference is not significant, but, for
example, an ARM platform with 1GB of memory and the following
memory layout:
[    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
[    0.000000]   Normal   empty
[    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]

would result in a lowmem_reserve of 0 for the DMA zone. This
would allow userspace to deplete the DMA zone easily. Funnily
enough,
$ cat /proc/sys/vm/lowmem_reserve_ratio
would fix up the situation because it forces a call to
setup_per_zone_lowmem_reserve() as a side effect.

This commit breaks the link order dependency by invoking
init_per_zone_wmark_min() as a postcore_initcall so that the
CMA pages have a chance to be properly accounted in their
zone(s), allowing the lowmem_reserve arrays to receive
consistent values.

Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b7d0ecf30b1..f3e340ec2b6b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7887,7 +7887,7 @@ int __meminit init_per_zone_wmark_min(void)
 
 	return 0;
 }
-core_initcall(init_per_zone_wmark_min)
+postcore_initcall(init_per_zone_wmark_min)
 
 /*
  * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
-- 
2.7.4




* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-14 16:49 [PATCH v2] mm: include CMA pages in lowmem_reserve at boot Doug Berger
@ 2020-08-19  3:18 ` Andrew Morton
  2020-08-19 17:15   ` Florian Fainelli
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2020-08-19  3:18 UTC (permalink / raw)
  To: Doug Berger
  Cc: Jason Baron, David Rientjes, Kirill A. Shutemov, linux-mm, linux-kernel

On Fri, 14 Aug 2020 09:49:26 -0700 Doug Berger <opendmb@gmail.com> wrote:

> The lowmem_reserve arrays provide a means of applying pressure
> against allocations from lower zones that were targeted at
> higher zones. Their values are a function of the number of pages
> managed by higher zones and are assigned by a call to the
> setup_per_zone_lowmem_reserve() function.
> 
> The function is initially called at boot time by the function
> init_per_zone_wmark_min() and may be called later by accesses
> of the /proc/sys/vm/lowmem_reserve_ratio sysctl file.
> 
> The function init_per_zone_wmark_min() was moved up from a
> module_init to a core_initcall to resolve a sequencing issue
> with khugepaged. Unfortunately this created a sequencing issue
> with CMA page accounting.
> 
> The CMA pages are added to the managed page count of a zone
> when cma_init_reserved_areas() is called at boot also as a
> core_initcall. This makes it uncertain whether the CMA pages
> will be added to the managed page counts of their zones before
> or after the call to init_per_zone_wmark_min() as it becomes
> dependent on link order. With the current link order the pages
> are added to the managed count after the lowmem_reserve arrays
> are initialized at boot.
> 
> This means the lowmem_reserve values at boot may be lower than
> the values used later if /proc/sys/vm/lowmem_reserve_ratio is
> accessed even if the ratio values are unchanged.
> 
> In many cases the difference is not significant, but for example
> an ARM platform with 1GB of memory and the following memory layout
> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
> [    0.000000]   Normal   empty
> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
> 
> would result in 0 lowmem_reserve for the DMA zone. This would allow
> userspace to deplete the DMA zone easily.

Sounds fairly serious for those machines.  Was a cc:stable considered?

> Funnily enough
> $ cat /proc/sys/vm/lowmem_reserve_ratio
> would fix up the situation because it forces
> setup_per_zone_lowmem_reserve as a side effect.
> 
> This commit breaks the link order dependency by invoking
> init_per_zone_wmark_min() as a postcore_initcall so that the
> CMA pages have the chance to be properly accounted in their
> zone(s) and allowing the lowmem_reserve arrays to receive
> consistent values.
> 




* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-19  3:18 ` Andrew Morton
@ 2020-08-19 17:15   ` Florian Fainelli
  2020-08-19 17:22     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2020-08-19 17:15 UTC (permalink / raw)
  To: Andrew Morton, Doug Berger
  Cc: Jason Baron, David Rientjes, Kirill A. Shutemov, linux-mm, linux-kernel

On 8/18/20 8:18 PM, Andrew Morton wrote:
> On Fri, 14 Aug 2020 09:49:26 -0700 Doug Berger <opendmb@gmail.com> wrote:
> 
>> The lowmem_reserve arrays provide a means of applying pressure
>> against allocations from lower zones that were targeted at
>> higher zones. Their values are a function of the number of pages
>> managed by higher zones and are assigned by a call to the
>> setup_per_zone_lowmem_reserve() function.
>>
>> The function is initially called at boot time by the function
>> init_per_zone_wmark_min() and may be called later by accesses
>> of the /proc/sys/vm/lowmem_reserve_ratio sysctl file.
>>
>> The function init_per_zone_wmark_min() was moved up from a
>> module_init to a core_initcall to resolve a sequencing issue
>> with khugepaged. Unfortunately this created a sequencing issue
>> with CMA page accounting.
>>
>> The CMA pages are added to the managed page count of a zone
>> when cma_init_reserved_areas() is called at boot also as a
>> core_initcall. This makes it uncertain whether the CMA pages
>> will be added to the managed page counts of their zones before
>> or after the call to init_per_zone_wmark_min() as it becomes
>> dependent on link order. With the current link order the pages
>> are added to the managed count after the lowmem_reserve arrays
>> are initialized at boot.
>>
>> This means the lowmem_reserve values at boot may be lower than
>> the values used later if /proc/sys/vm/lowmem_reserve_ratio is
>> accessed even if the ratio values are unchanged.
>>
>> In many cases the difference is not significant, but for example
>> an ARM platform with 1GB of memory and the following memory layout
>> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
>> [    0.000000]   Normal   empty
>> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
>>
>> would result in 0 lowmem_reserve for the DMA zone. This would allow
>> userspace to deplete the DMA zone easily.
> 
> Sounds fairly serious for those machines.  Was a cc:stable considered?

Since there is a Fixes: tag, it may have been assumed that, as
soon as the patch reaches Linus' tree, it would be picked up by
the stable selection.
-- 
Florian



* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-19 17:15   ` Florian Fainelli
@ 2020-08-19 17:22     ` Andrew Morton
  2020-08-19 17:30       ` Florian Fainelli
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2020-08-19 17:22 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Doug Berger, Jason Baron, David Rientjes, Kirill A. Shutemov,
	linux-mm, linux-kernel

On Wed, 19 Aug 2020 10:15:53 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:

> >> In many cases the difference is not significant, but for example
> >> an ARM platform with 1GB of memory and the following memory layout
> >> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
> >> [    0.000000]   Normal   empty
> >> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
> >>
> >> would result in 0 lowmem_reserve for the DMA zone. This would allow
> >> userspace to deplete the DMA zone easily.
> > 
> > Sounds fairly serious for those machines.  Was a cc:stable considered?
> 
> Since there is a Fixes: tag, it may have been assumed that the patch
> would be picked up and as soon as it reaches Linus' tree it would be
> picked up by the stable selection.

It doesn't work that way - sometimes a fix isn't considered important
enough to backport.  It could just fix a typo in a comment!






* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-19 17:22     ` Andrew Morton
@ 2020-08-19 17:30       ` Florian Fainelli
  2020-08-19 17:40         ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2020-08-19 17:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Doug Berger, Jason Baron, David Rientjes, Kirill A. Shutemov,
	linux-mm, linux-kernel

On 8/19/20 10:22 AM, Andrew Morton wrote:
> On Wed, 19 Aug 2020 10:15:53 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>>>> In many cases the difference is not significant, but for example
>>>> an ARM platform with 1GB of memory and the following memory layout
>>>> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
>>>> [    0.000000]   Normal   empty
>>>> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
>>>>
>>>> would result in 0 lowmem_reserve for the DMA zone. This would allow
>>>> userspace to deplete the DMA zone easily.
>>>
>>> Sounds fairly serious for those machines.  Was a cc:stable considered?
>>
>> Since there is a Fixes: tag, it may have been assumed that the patch
>> would be picked up and as soon as it reaches Linus' tree it would be
>> picked up by the stable selection.
> 
> It doesn't work that way - sometimes a fix isn't considered important
> enough to backport.  It could just fix a typo in a comment!

Then can this be applied ASAP and backported?
-- 
Florian



* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-19 17:30       ` Florian Fainelli
@ 2020-08-19 17:40         ` Andrew Morton
  2020-08-20  0:32           ` Doug Berger
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2020-08-19 17:40 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Doug Berger, Jason Baron, David Rientjes, Kirill A. Shutemov,
	linux-mm, linux-kernel

On Wed, 19 Aug 2020 10:30:25 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:

> On 8/19/20 10:22 AM, Andrew Morton wrote:
> > On Wed, 19 Aug 2020 10:15:53 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:
> > 
> >>>> In many cases the difference is not significant, but for example
> >>>> an ARM platform with 1GB of memory and the following memory layout
> >>>> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
> >>>> [    0.000000] Zone ranges:
> >>>> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
> >>>> [    0.000000]   Normal   empty
> >>>> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
> >>>>
> >>>> would result in 0 lowmem_reserve for the DMA zone. This would allow
> >>>> userspace to deplete the DMA zone easily.
> >>>
> >>> Sounds fairly serious for those machines.  Was a cc:stable considered?
> >>
> >> Since there is a Fixes: tag, it may have been assumed that the patch
> >> would be picked up and as soon as it reaches Linus' tree it would be
> >> picked up by the stable selection.
> > 
> > It doesn't work that way - sometimes a fix isn't considered important
> > enough to backport.  It could just fix a typo in a comment!
> 
> Then can this be applied ASAP and backported?

Sure.



* Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot
  2020-08-19 17:40         ` Andrew Morton
@ 2020-08-20  0:32           ` Doug Berger
  0 siblings, 0 replies; 7+ messages in thread
From: Doug Berger @ 2020-08-20  0:32 UTC (permalink / raw)
  To: Andrew Morton, Florian Fainelli
  Cc: Jason Baron, David Rientjes, Kirill A. Shutemov, linux-mm, linux-kernel

On 8/19/2020 10:40 AM, Andrew Morton wrote:
> On Wed, 19 Aug 2020 10:30:25 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> On 8/19/20 10:22 AM, Andrew Morton wrote:
>>> On Wed, 19 Aug 2020 10:15:53 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>
>>>>>> In many cases the difference is not significant, but for example
>>>>>> an ARM platform with 1GB of memory and the following memory layout
>>>>>> [    0.000000] cma: Reserved 256 MiB at 0x0000000030000000
>>>>>> [    0.000000] Zone ranges:
>>>>>> [    0.000000]   DMA      [mem 0x0000000000000000-0x000000002fffffff]
>>>>>> [    0.000000]   Normal   empty
>>>>>> [    0.000000]   HighMem  [mem 0x0000000030000000-0x000000003fffffff]
>>>>>>
>>>>>> would result in 0 lowmem_reserve for the DMA zone. This would allow
>>>>>> userspace to deplete the DMA zone easily.
>>>>>
>>>>> Sounds fairly serious for those machines.  Was a cc:stable considered?
>>>>
>>>> Since there is a Fixes: tag, it may have been assumed that the patch
>>>> would be picked up and as soon as it reaches Linus' tree it would be
>>>> picked up by the stable selection.
>>>
>>> It doesn't work that way - sometimes a fix isn't considered important
>>> enough to backport.  It could just fix a typo in a comment!
>>
>> Then can this be applied ASAP and backported?
> 
> Sure.
> 
As Florian guessed, I assumed the Fixes: tag was a sufficient clue since
I wouldn't normally apply a Fixes tag to a typo correction in a comment.
I suspect I have been spoiled by David Miller :).

Thanks for the quick turn-around and for applying the cc:stable to the mmotm,
    Doug


