All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-mm@kvack.org, Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2] mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Wed, 21 Aug 2019 11:21:58 -0400	[thread overview]
Message-ID: <20190821152158.GA12901@cmpxchg.org> (raw)
In-Reply-To: <20190821112116.d4lejm6nai7uavcy@linutronix.de>

On Wed, Aug 21, 2019 at 01:21:16PM +0200, Sebastian Andrzej Siewior wrote:
> sorry, I somehow forgot about this…
> 
> On 2019-02-13 09:56:56 [-0500], Johannes Weiner wrote:
> > On Wed, Feb 13, 2019 at 10:27:54AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2019-02-11 16:02:08 [-0500], Johannes Weiner wrote:
> > > > > how do you define safe? I've been looking for dependencies of
> > > > > __mod_lruvec_state() but found only that the lock is held during the RMW
> > > > > operation with WORKINGSET_NODES idx.
> > > > 
> > > > These stat functions are not allowed to nest, and the executing thread
> > > > cannot migrate to another CPU during the operation, otherwise they
> > > > corrupt the state they're modifying.
> > > 
> > > If everyone is taking the same lock (like i_pages.xa_lock) then there
> > > will not be two instances updating the same stat. The owner of the
> > > (sleeping)-spinlock will not be migrated to another CPU.
> > 
> > This might be true for this particular stat item, but they are general
> > VM statistics. They're assuredly not all taking the xa_lock.
> 
> This one in particular does and my guess is that the interrupts are
> disabled here because of xa_lock. So the question is why should the
> interrupts be disabled? Is this due to the lock that should have been
> acquired (and as such disable interrupts) _or_ because of the
> *_lruvec_slab_state() operation.
> 
> > > > They are called from interrupt handlers, such as when NR_WRITEBACK is
> > > > decreased. Thus workingset_node_update() must exclude preemption from
> > > > irq handlers on the local CPU.
> > > 
> > > Do you have an example for a code path to check NR_WRITEBACK?
> > 
> > end_page_writeback()
> >  test_clear_page_writeback()
> >    dec_lruvec_state(lruvec, NR_WRITEBACK)
> 
> So with a warning in dec_lruvec_state() I found only a call path from
> softirq (like scsi_io_completion() / bio_endio()). Having lockdep
> annotation instead "just" preempt_disable() would have helped :)
> 
> > > > They rely on IRQ-disabling to also disable CPU migration.
> > > The spinlock disables CPU migration. 
> > > 
> > > > > >                                            I'm guessing it's because
> > > > > > preemption is disabled and irq handlers are punted to process context.
> > > > > preemption is enabled and IRQ are processed in forced-threaded mode.
> > > > 
> > > > That doesn't sound safe.
> > > 
> > > Do you have test-case or something I could throw at it and verify that
> > > this still works? So far nothing complains…
> > 
> > It's not easy to get the timing right on purpose, but we've seen in
> > production what happens when you don't protect these counter updates
> > from interrupts. See c3cc39118c36 ("mm: memcontrol: fix NR_WRITEBACK
> > leak in memcg and system stats").
> 
> Based on the looking code I'm looking at, it looks fine. Should I just
> resubmit the patch?

No, NAK to this patch and others like it for the mm code.

The serialization scheme for the vmstats facilty is that stats can be
modified from interrupt context, and so they rely on interrupts being
disabled. This check is correct.

If you want to comprehensively change the scheme, you're of course
welcome to propose that, and I won't be in your way. But that includes
review and update of *all* participants, from the mutation points that
disable irqs (mod_zone_page_state() and friends) to the execution
context of all callstacks, including the full block layer.

What we are NOT doing is eliminating checks that correctly verify the
current locking scheme. We've seen race conditions in this code that
took millions of machine hours to trigger when the rules were broken,
so we rely on explicit checks during code development. It's also not
surprising that they're the only thing that triggers in your testing.

Making this work correctly for RT needs a more thoughtful approach.


  reply	other threads:[~2019-08-21 15:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11  9:57 [PATCH] mm: workingset: replace IRQ-off check with a lockdep assert Sebastian Andrzej Siewior
2019-02-11 11:38 ` [PATCH v2] " Sebastian Andrzej Siewior
2019-02-11 18:53   ` Johannes Weiner
2019-02-11 19:13     ` Sebastian Andrzej Siewior
2019-02-11 19:17       ` Matthew Wilcox
2019-02-11 19:41         ` Sebastian Andrzej Siewior
2019-02-11 21:02       ` Johannes Weiner
2019-02-13  9:27         ` Sebastian Andrzej Siewior
2019-02-13 14:56           ` Johannes Weiner
2019-08-21 11:21             ` Sebastian Andrzej Siewior
2019-08-21 15:21               ` Johannes Weiner [this message]
2019-02-11 17:07 ` [PATCH] " kbuild test robot
2019-02-11 17:37 ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190821152158.GA12901@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.