linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Wed, 4 Aug 2021 15:42:25 +0200	[thread overview]
Message-ID: <91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz> (raw)
In-Reply-To: <20210804095425.GA6464@techsingularity.net>

On 8/4/21 11:54 AM, Mel Gorman wrote:
> On Wed, Aug 04, 2021 at 01:54:47AM +0200, Thomas Gleixner wrote:
>> 
>>  <tglx> so in vmstat.c there is this magic comment:
>>  <tglx>  * For use when we know that interrupts are disabled
>>  <tglx>  * or when we know that preemption is disabled and that
>>  <tglx>  * particular counter cannot be updated from interrupt context.
>>  <tglx> how can I know which counters need what?
>>  <mm_expert> I don't think there's a list, one would have to check on counter to counter basis :/ 
>>  <tglx> and of course there is nothing which validates that, right?
>>  <mm_expert> exactly
>> 
> 
> While I'm not "mm_expert", I agree with his/her statements.

Phew, since you do, I can now disclose it was me.

> Each counter
> would need to be audited and two question are asked
> 
>  o If this counter is inaccurate, does anything break?
>  o If this counter is inaccurate, does it both increment and decrement
>    allowing the possibility it goes negative?
> 
> The decision on that is looking at the counter and seeing if any
> functional decision is made based on its value. So two examples;
> 
> 	NR_VMSCAN_IMMEDIATE is a node-based counter that only every
> 	increments and is reported to userspace. No kernel code makes
> 	any decision based on its value. Therefore it's likely safe to
> 	move to numa_stat_item instead.
> 
> 	Action: move it
> 
> 	WORKINGSET_ACTIVATE_FILE is a node-based counter that is used to
> 	determine if a mem cgroup is potentially congested by looking at
> 	the ratio of cgroup to node refault rates as well as deciding if
> 	LRU file pages should be deactivate.  If that value drifts, the
> 	ratios are miscalculated and could lead to functional oddities
> 	and therefore must be accurate.
> 
> 	Action: leave it alone
> 
> I guess it could be further split into state that must be accurate from
> IRQ and non-IRQ contexts but that probably would be very fragile and
> offer limited value.
> 
>> Brilliant stuff which prevents you to do any validation on this. Over
>> the years there have been several issues where callers had to be fixed
>> by analysing bug reports instead of having a proper instrumentation in
>> that code which would have told the developer that he got it wrong.
>> 
> 
> I'm not sure it could be validated at build-time but I'm just back from
> holiday and may be lacking imagination.

The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
whatnot), i.e.:

<sched_expert> what that code needs is switch(item) { case foo1: case foo2:
lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
something along those lines

>> Of course on RT kernels the preempt_disable_rt() will serialize
>> everything correctly, but as we have learned over the years just
>> slapping _if_rt() or if_not_rt() variants of things around is most of
>> the time papering over the underlying problem of badly defined
>> protection scopes. Let's not proliferate that. As I said in the above
>> IRC conversation:
>> 
>>  <tglx> I fundamentally hate this preempt_disable_rt() muck
>> 
> 
> The issue is that even if this was properly audited and the inaccurate
> and accurate counters were in the proper enums using the correct APIs, it
> would still be necessary to protect the accurate counters from updates from
> IRQ context. Hence, as I write this, I don't think preempt_[dis|en]able_rt
> would go away and that is why I didn't continue with the series to break
> out "accurate" stats
> 


  reply	other threads:[~2021-08-04 13:42 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23 10:00 [PATCH 0/2] Protect vmstats on PREEMPT_RT Mel Gorman
2021-07-23 10:00 ` [PATCH 1/2] preempt: Provide preempt_*_(no)rt variants Mel Gorman
2021-07-23 10:00 ` [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT Mel Gorman
2021-08-03 23:54   ` Thomas Gleixner
2021-08-04  9:54     ` Mel Gorman
2021-08-04 13:42       ` Vlastimil Babka [this message]
2021-08-04 14:23         ` Mel Gorman
2021-08-05 12:56           ` Thomas Gleixner
2021-08-05 14:04             ` Mel Gorman
2021-08-05 15:42               ` Thomas Gleixner
2021-08-05 18:47               ` Daniel Vacek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --subject='Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).