All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Hugh Dickins <hughd@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Wed, 4 Aug 2021 15:42:25 +0200	[thread overview]
Message-ID: <91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz> (raw)
In-Reply-To: <20210804095425.GA6464@techsingularity.net>

On 8/4/21 11:54 AM, Mel Gorman wrote:
> On Wed, Aug 04, 2021 at 01:54:47AM +0200, Thomas Gleixner wrote:
>> 
>>  <tglx> so in vmstat.c there is this magic comment:
>>  <tglx>  * For use when we know that interrupts are disabled
>>  <tglx>  * or when we know that preemption is disabled and that
>>  <tglx>  * particular counter cannot be updated from interrupt context.
>>  <tglx> how can I know which counters need what?
>>  <mm_expert> I don't think there's a list, one would have to check on counter to counter basis :/ 
>>  <tglx> and of course there is nothing which validates that, right?
>>  <mm_expert> exactly
>> 
> 
> While I'm not "mm_expert", I agree with his/her statements.

Phew, since you do, I can now disclose it was me.

> Each counter
> would need to be audited and two question are asked
> 
>  o If this counter is inaccurate, does anything break?
>  o If this counter is inaccurate, does it both increment and decrement
>    allowing the possibility it goes negative?
> 
> The decision on that is looking at the counter and seeing if any
> functional decision is made based on its value. So two examples;
> 
> 	NR_VMSCAN_IMMEDIATE is a node-based counter that only every
> 	increments and is reported to userspace. No kernel code makes
> 	any decision based on its value. Therefore it's likely safe to
> 	move to numa_stat_item instead.
> 
> 	Action: move it
> 
> 	WORKINGSET_ACTIVATE_FILE is a node-based counter that is used to
> 	determine if a mem cgroup is potentially congested by looking at
> 	the ratio of cgroup to node refault rates as well as deciding if
> 	LRU file pages should be deactivate.  If that value drifts, the
> 	ratios are miscalculated and could lead to functional oddities
> 	and therefore must be accurate.
> 
> 	Action: leave it alone
> 
> I guess it could be further split into state that must be accurate from
> IRQ and non-IRQ contexts but that probably would be very fragile and
> offer limited value.
> 
>> Brilliant stuff which prevents you to do any validation on this. Over
>> the years there have been several issues where callers had to be fixed
>> by analysing bug reports instead of having a proper instrumentation in
>> that code which would have told the developer that he got it wrong.
>> 
> 
> I'm not sure it could be validated at build-time but I'm just back from
> holiday and may be lacking imagination.

The idea was not build-time, but runtime (hidden behind lockdep, VM_DEBUG or
whatnot), i.e.:

<sched_expert> what that code needs is switch(item) { case foo1: case foo2:
lockdep_assert_irqs_disabled(); break; case bar1: case bar2:
lockdep_assert_preempt_disabled(); lockdep_assert_no_in_irq(); break; } or
something along those lines

>> Of course on RT kernels the preempt_disable_rt() will serialize
>> everything correctly, but as we have learned over the years just
>> slapping _if_rt() or if_not_rt() variants of things around is most of
>> the time papering over the underlying problem of badly defined
>> protection scopes. Let's not proliferate that. As I said in the above
>> IRC conversation:
>> 
>>  <tglx> I fundamentally hate this preempt_disable_rt() muck
>> 
> 
> The issue is that even if this was properly audited and the inaccurate
> and accurate counters were in the proper enums using the correct APIs, it
> would still be necessary to protect the accurate counters from updates from
> IRQ context. Hence, as I write this, I don't think preempt_[dis|en]able_rt
> would go away and that is why I didn't continue with the series to break
> out "accurate" stats
> 


  reply	other threads:[~2021-08-04 13:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23 10:00 [PATCH 0/2] Protect vmstats on PREEMPT_RT Mel Gorman
2021-07-23 10:00 ` [PATCH 1/2] preempt: Provide preempt_*_(no)rt variants Mel Gorman
2021-07-23 10:00 ` [PATCH 2/2] mm/vmstat: Protect per cpu variables with preempt disable on RT Mel Gorman
2021-08-03 23:54   ` Thomas Gleixner
2021-08-03 23:54     ` Thomas Gleixner
2021-08-04  9:54     ` Mel Gorman
2021-08-04 13:42       ` Vlastimil Babka [this message]
2021-08-04 14:23         ` Mel Gorman
2021-08-05 12:56           ` Thomas Gleixner
2021-08-05 14:04             ` Mel Gorman
2021-08-05 15:42               ` Thomas Gleixner
2021-08-05 18:47               ` Daniel Vacek
2021-08-05 18:47                 ` Daniel Vacek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91b2f893-eb6a-d91d-3769-baba8601b0f6@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.