linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Wed, 4 Aug 2021 15:23:06 +0100	[thread overview]
Message-ID: <20210804142306.GE6464@techsingularity.net> (raw)
In-Reply-To: <91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz>

On Wed, Aug 04, 2021 at 03:42:25PM +0200, Vlastimil Babka wrote:
> > Each counter
> > would need to be audited and two question are asked
> > 
> >  o If this counter is inaccurate, does anything break?
> >  o If this counter is inaccurate, does it both increment and decrement
> >    allowing the possibility it goes negative?
> > 
> > The decision on that is looking at the counter and seeing if any
> > functional decision is made based on its value. So two examples;
> > 
> > 	NR_VMSCAN_IMMEDIATE is a node-based counter that only every
> > 	increments and is reported to userspace. No kernel code makes
> > 	any decision based on its value. Therefore it's likely safe to
> > 	move to numa_stat_item instead.
> > 
> > 	Action: move it
> > 
> > 	WORKINGSET_ACTIVATE_FILE is a node-based counter that is used to
> > 	determine if a mem cgroup is potentially congested by looking at
> > 	the ratio of cgroup to node refault rates as well as deciding if
> > 	LRU file pages should be deactivate.  If that value drifts, the
> > 	ratios are miscalculated and could lead to functional oddities
> > 	and therefore must be accurate.
> > 
> > 	Action: leave it alone
> > 
> > I guess it could be further split into state that must be accurate from
> > IRQ and non-IRQ contexts but that probably would be very fragile and
> > offer limited value.
> > 
> >> Brilliant stuff which prevents you to do any validation on this. Over
> >> the years there have been several issues where callers had to be fixed
> >> by analysing bug reports instead of having a proper instrumentation in
> >> that code which would have told the developer that he got it wrong.
> >> 
> > 
> > I'm not sure it could be validated at build-time but I'm just back from
> > holiday and may be lacking imagination.
> 
> The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
> whatnot), i.e.:
> 
> <sched_expert> what that code needs is switch(item) { case foo1: case foo2:
> lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
> lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
> something along those lines
> 

Ok, that would potentially work. It may not even need to split the stats
into different enums. Simply document which stats need protection from
IRQ or preemption and use PROVE_LOCKING to check if preemption or IRQs
are disabled depending on the kernel config. I don't think it gets rid
of preempt_disable_rt unless the API was completely reworked with entry
points that describe the locking requirements. That would be tricky
because the requirements differ between kernel configurations.

This would be independent of moving stats that do not need to be 100%
accurate to the inaccurate-but-fast API.

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2021-08-04 14:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23 10:00 [PATCH 0/2] Protect vmstats on PREEMPT_RT Mel Gorman
2021-07-23 10:00 ` [PATCH 1/2] preempt: Provide preempt_*_(no)rt variants Mel Gorman
2021-07-23 10:00 ` [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT Mel Gorman
2021-08-03 23:54   ` Thomas Gleixner
2021-08-04  9:54     ` Mel Gorman
2021-08-04 13:42       ` Vlastimil Babka
2021-08-04 14:23         ` Mel Gorman [this message]
2021-08-05 12:56           ` Thomas Gleixner
2021-08-05 14:04             ` Mel Gorman
2021-08-05 15:42               ` Thomas Gleixner
2021-08-05 18:47               ` Daniel Vacek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210804142306.GE6464@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --subject='Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).