linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] mm: don't miss the last page because of round-off error
@ 2018-08-17 23:18 Roman Gushchin
  2018-08-18  1:22 ` Matthew Wilcox
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2018-08-17 23:18 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, kernel-team, Roman Gushchin, Andrew Morton,
	Johannes Weiner, Michal Hocko, Tejun Heo, Rik van Riel,
	Konstantin Khlebnikov

I've noticed, that dying memory cgroups are  often pinned
in memory by a single pagecache page. Even under moderate
memory pressure they sometimes stayed in such state
for a long time. That looked strange.

My investigation showed that the problem is caused by
applying the LRU pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator,
if the initial scan size is 1, the result is always 0.

This means the last page is not scanned and has
no chances to be reclaimed.

Fix this by skipping the balancing logic if the initial
scan count is 1.

In practice this change significantly improves the speed
of dying cgroups reclaim.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
---
 mm/vmscan.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..f85c5ec01886 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2287,9 +2287,12 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			/*
 			 * Scan types proportional to swappiness and
 			 * their relative recent reclaim efficiency.
+			 * Make sure we don't miss the last page
+			 * because of a round-off error.
 			 */
-			scan = div64_u64(scan * fraction[file],
-					 denominator);
+			if (scan > 1)
+				scan = div64_u64(scan * fraction[file],
+						 denominator);
 			break;
 		case SCAN_FILE:
 		case SCAN_ANON:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-17 23:18 [PATCH RFC] mm: don't miss the last page because of round-off error Roman Gushchin
@ 2018-08-18  1:22 ` Matthew Wilcox
  2018-08-20 17:19   ` Roman Gushchin
  2018-08-21  5:11   ` Konstantin Khlebnikov
  0 siblings, 2 replies; 8+ messages in thread
From: Matthew Wilcox @ 2018-08-18  1:22 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, kernel-team, Andrew Morton,
	Johannes Weiner, Michal Hocko, Tejun Heo, Rik van Riel,
	Konstantin Khlebnikov

On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> -			scan = div64_u64(scan * fraction[file],
> -					 denominator);
> +			if (scan > 1)
> +				scan = div64_u64(scan * fraction[file],
> +						 denominator);

Wouldn't we be better off doing a div_round_up?  ie:

	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);

although i'd rather hide that in a new macro in math64.h than opencode it
here.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-18  1:22 ` Matthew Wilcox
@ 2018-08-20 17:19   ` Roman Gushchin
  2018-08-21  5:11   ` Konstantin Khlebnikov
  1 sibling, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2018-08-20 17:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, kernel-team, Andrew Morton,
	Johannes Weiner, Michal Hocko, Tejun Heo, Rik van Riel,
	Konstantin Khlebnikov

On Fri, Aug 17, 2018 at 06:22:13PM -0700, Matthew Wilcox wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> > -			scan = div64_u64(scan * fraction[file],
> > -					 denominator);
> > +			if (scan > 1)
> > +				scan = div64_u64(scan * fraction[file],
> > +						 denominator);
> 
> Wouldn't we be better off doing a div_round_up?  ie:
> 
> 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> 
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

Good idea! Will do in v2.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-18  1:22 ` Matthew Wilcox
  2018-08-20 17:19   ` Roman Gushchin
@ 2018-08-21  5:11   ` Konstantin Khlebnikov
  2018-08-21 13:35     ` Matthew Wilcox
  2018-08-21 17:15     ` Johannes Weiner
  1 sibling, 2 replies; 8+ messages in thread
From: Konstantin Khlebnikov @ 2018-08-21  5:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roman Gushchin, linux-mm, Linux Kernel Mailing List, kernel-team,
	Andrew Morton, Johannes Weiner, Michal Hocko, Tejun Heo,
	Rik van Riel

On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> -                     scan = div64_u64(scan * fraction[file],
>> -                                      denominator);
>> +                     if (scan > 1)
>> +                             scan = div64_u64(scan * fraction[file],
>> +                                              denominator);
>
> Wouldn't we be better off doing a div_round_up?  ie:
>
>         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

All numbers here should be up to nr_pages * 200 and fit into unsigned long.
I see no reason for u64. If they overflow then u64 wouldn't help either.

There is macro DIV_ROUND_UP in kernel.h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-21  5:11   ` Konstantin Khlebnikov
@ 2018-08-21 13:35     ` Matthew Wilcox
  2018-08-21 17:15     ` Johannes Weiner
  1 sibling, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2018-08-21 13:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Roman Gushchin, linux-mm, Linux Kernel Mailing List, kernel-team,
	Andrew Morton, Johannes Weiner, Michal Hocko, Tejun Heo,
	Rik van Riel, Shaohua Li

On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -                     scan = div64_u64(scan * fraction[file],
> >> -                                      denominator);
> >> +                     if (scan > 1)
> >> +                             scan = div64_u64(scan * fraction[file],
> >> +                                              denominator);
> >
> > Wouldn't we be better off doing a div_round_up?  ie:
> >
> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
> 
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

Shaohua added the div64 usage initially, adding him to the cc.

> There is macro DIV_ROUND_UP in kernel.h

Indeed there is.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-21  5:11   ` Konstantin Khlebnikov
  2018-08-21 13:35     ` Matthew Wilcox
@ 2018-08-21 17:15     ` Johannes Weiner
  2018-08-22  6:01       ` Konstantin Khlebnikov
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2018-08-21 17:15 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Matthew Wilcox, Roman Gushchin, linux-mm,
	Linux Kernel Mailing List, kernel-team, Andrew Morton,
	Michal Hocko, Tejun Heo, Rik van Riel

On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -                     scan = div64_u64(scan * fraction[file],
> >> -                                      denominator);
> >> +                     if (scan > 1)
> >> +                             scan = div64_u64(scan * fraction[file],
> >> +                                              denominator);
> >
> > Wouldn't we be better off doing a div_round_up?  ie:
> >
> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
> 
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
to four times of what's on the LRUs. That can overflow a u32 with even
small amounts of memory.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-21 17:15     ` Johannes Weiner
@ 2018-08-22  6:01       ` Konstantin Khlebnikov
  2018-08-22 17:50         ` Roman Gushchin
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Khlebnikov @ 2018-08-22  6:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Matthew Wilcox, Roman Gushchin, linux-mm,
	Linux Kernel Mailing List, kernel-team, Andrew Morton,
	Michal Hocko, Tejun Heo, Rik van Riel

On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
>> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> >> -                     scan = div64_u64(scan * fraction[file],
>> >> -                                      denominator);
>> >> +                     if (scan > 1)
>> >> +                             scan = div64_u64(scan * fraction[file],
>> >> +                                              denominator);
>> >
>> > Wouldn't we be better off doing a div_round_up?  ie:
>> >
>> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>> >
>> > although i'd rather hide that in a new macro in math64.h than opencode it
>> > here.
>>
>> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
>> I see no reason for u64. If they overflow then u64 wouldn't help either.
>
> It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> to four times of what's on the LRUs. That can overflow a u32 with even
> small amounts of memory.

Ah, this thing is inverted because it aims to proportional reactivation rate
rather than the proportional pressure to reclaimable pages.
That's not obvious. I suppose this should be in comment above it.

Well, at least denominator should fit into unsigned long. So full
64/64 division is redundant.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC] mm: don't miss the last page because of round-off error
  2018-08-22  6:01       ` Konstantin Khlebnikov
@ 2018-08-22 17:50         ` Roman Gushchin
  0 siblings, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2018-08-22 17:50 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Johannes Weiner, Matthew Wilcox, linux-mm,
	Linux Kernel Mailing List, kernel-team, Andrew Morton,
	Michal Hocko, Tejun Heo, Rik van Riel

On Wed, Aug 22, 2018 at 09:01:19AM +0300, Konstantin Khlebnikov wrote:
> On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> >> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> >> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> >> -                     scan = div64_u64(scan * fraction[file],
> >> >> -                                      denominator);
> >> >> +                     if (scan > 1)
> >> >> +                             scan = div64_u64(scan * fraction[file],
> >> >> +                                              denominator);
> >> >
> >> > Wouldn't we be better off doing a div_round_up?  ie:
> >> >
> >> >         scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >> >
> >> > although i'd rather hide that in a new macro in math64.h than opencode it
> >> > here.
> >>
> >> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> >> I see no reason for u64. If they overflow then u64 wouldn't help either.
> >
> > It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> > to four times of what's on the LRUs. That can overflow a u32 with even
> > small amounts of memory.
> 
> Ah, this thing is inverted because it aims to proportional reactivation rate
> rather than the proportional pressure to reclaimable pages.
> That's not obvious. I suppose this should be in comment above it.
> 
> Well, at least denominator should fit into unsigned long. So full
> 64/64 division is redundant.

In any case it's not related to the original issue and should be
treated separately. I'd like to keep the patch simple to make
backporting to stable easy.

All refactorings can be done separately, if necessarily.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-08-22 17:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-17 23:18 [PATCH RFC] mm: don't miss the last page because of round-off error Roman Gushchin
2018-08-18  1:22 ` Matthew Wilcox
2018-08-20 17:19   ` Roman Gushchin
2018-08-21  5:11   ` Konstantin Khlebnikov
2018-08-21 13:35     ` Matthew Wilcox
2018-08-21 17:15     ` Johannes Weiner
2018-08-22  6:01       ` Konstantin Khlebnikov
2018-08-22 17:50         ` Roman Gushchin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).