All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes.
Date: Wed, 27 Jul 2022 13:49:00 +0200	[thread overview]
Message-ID: <20220727114904.130761-1-bigeasy@linutronix.de> (raw)

This is v2 of the series, v1 is available at
   https://https://lkml.kernel.org/r/.kernel.org/all/20220613140712.77932-1-bigeasy@linutronix.de/

v1…v2:
   - Make patch around Al's description of a bug in d_add_ci(). I took
     the liberty to make him Author and added his signed-off-by since I
     sinmply added a patch-body around his words.

   - The reasoning of why delaying the wakeup is reasonable has been
     replaced with Al's analysis of the code.

   - The split of wake up has been done differently (and I hope this is
     what Al meant). First the wake up has been pushed to the caller and
     then delayed to end_dir_add() after preemption has been enabled.

   - There is still __d_lookup_unhash(), __d_lookup_unhash_wake() and
     __d_lookup_done() is removed.
     __d_lookup_done() is removed because it is exported and the return
     value changes which will affect OOT users which are not aware of
     it.
     There is still d_lookup_done() which invokes
     __d_lookup_unhash_wake(). This can't remain in the header file due
     to cyclic depencies which in turn can't resolve wake_up_all()
     within the inline function.

The original cover letter:

PREEMPT_RT has two issues with the dcache code:

   1) i_dir_seq is a special cased sequence counter which uses the lowest
      bit as writer lock. This requires that preemption is disabled.

      On !RT kernels preemption is implictly disabled by spin_lock(), but
      that's not the case on RT kernels.

      Replacing i_dir_seq on RT with a seqlock_t comes with its own
      problems due to arbitrary lock nesting. Using a seqcount with an
      associated lock is not possible either because the locks held by the
      callers are not necessarily related.

      Explicitly disabling preemption on RT kernels across the i_dir_seq
      write held critical section is the obvious and simplest solution. The
      critical section is small, so latency is not really a concern.

   2) The wake up of dentry::d_wait waiters is in a preemption disabled
      section, which violates the RT constraints as wake_up_all() has
      to acquire the wait queue head lock which is a 'sleeping' spinlock
      on RT.

      There are two reasons for the non-preemtible section:

         A) The wake up happens under the hash list bit lock

         B) The wake up happens inside the i_dir_seq write side
            critical section

       #A is solvable by moving it outside of the hash list bit lock held                                                                                                                      
       section.

       Making #B preemptible on RT is hard or even impossible due to lock
       nesting constraints.

       A possible solution would be to replace the waitqueue by a simple
       waitqueue which can be woken up inside atomic sections on RT.

       But aside of Linus not being a fan of simple waitqueues, there is
       another observation vs. this wake up. It's likely for the woken up
       waiter to immediately contend on dentry::lock.

       It turns out that there is no reason to do the wake up within the
       i_dir_seq write held region. The only requirement is to do the wake
       up within the dentry::lock held region. Further details in the
       individual patches.

       That allows to move the wake up out of the non-preemptible section
       on RT, which also reduces the dentry::lock held time after wake up.

Thanks,

Sebastian



             reply	other threads:[~2022-07-27 11:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-27 11:49 Sebastian Andrzej Siewior [this message]
2022-07-27 11:49 ` [PATCH 1/4 v2] fs/dcache: d_add_ci() needs to complete parallel lookup Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 2/4 v2] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 3/4 v2] fs/dcache: Move the wakeup from __d_lookup_done() to the caller Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 4/4 v2] fs/dcache: Move wakeup out of i_seq_dir write held region Sebastian Andrzej Siewior
2022-07-30  4:41 ` [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220727114904.130761-1-bigeasy@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.