From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes.
Date: Wed, 27 Jul 2022 13:49:00 +0200 [thread overview]
Message-ID: <20220727114904.130761-1-bigeasy@linutronix.de> (raw)
This is v2 of the series, v1 is available at
https://https://lkml.kernel.org/r/.kernel.org/all/20220613140712.77932-1-bigeasy@linutronix.de/
v1…v2:
- Make patch around Al's description of a bug in d_add_ci(). I took
the liberty to make him Author and added his signed-off-by since I
sinmply added a patch-body around his words.
- The reasoning of why delaying the wakeup is reasonable has been
replaced with Al's analysis of the code.
- The split of wake up has been done differently (and I hope this is
what Al meant). First the wake up has been pushed to the caller and
then delayed to end_dir_add() after preemption has been enabled.
- There is still __d_lookup_unhash(), __d_lookup_unhash_wake() and
__d_lookup_done() is removed.
__d_lookup_done() is removed because it is exported and the return
value changes which will affect OOT users which are not aware of
it.
There is still d_lookup_done() which invokes
__d_lookup_unhash_wake(). This can't remain in the header file due
to cyclic depencies which in turn can't resolve wake_up_all()
within the inline function.
The original cover letter:
PREEMPT_RT has two issues with the dcache code:
1) i_dir_seq is a special cased sequence counter which uses the lowest
bit as writer lock. This requires that preemption is disabled.
On !RT kernels preemption is implictly disabled by spin_lock(), but
that's not the case on RT kernels.
Replacing i_dir_seq on RT with a seqlock_t comes with its own
problems due to arbitrary lock nesting. Using a seqcount with an
associated lock is not possible either because the locks held by the
callers are not necessarily related.
Explicitly disabling preemption on RT kernels across the i_dir_seq
write held critical section is the obvious and simplest solution. The
critical section is small, so latency is not really a concern.
2) The wake up of dentry::d_wait waiters is in a preemption disabled
section, which violates the RT constraints as wake_up_all() has
to acquire the wait queue head lock which is a 'sleeping' spinlock
on RT.
There are two reasons for the non-preemtible section:
A) The wake up happens under the hash list bit lock
B) The wake up happens inside the i_dir_seq write side
critical section
#A is solvable by moving it outside of the hash list bit lock held
section.
Making #B preemptible on RT is hard or even impossible due to lock
nesting constraints.
A possible solution would be to replace the waitqueue by a simple
waitqueue which can be woken up inside atomic sections on RT.
But aside of Linus not being a fan of simple waitqueues, there is
another observation vs. this wake up. It's likely for the woken up
waiter to immediately contend on dentry::lock.
It turns out that there is no reason to do the wake up within the
i_dir_seq write held region. The only requirement is to do the wake
up within the dentry::lock held region. Further details in the
individual patches.
That allows to move the wake up out of the non-preemptible section
on RT, which also reduces the dentry::lock held time after wake up.
Thanks,
Sebastian
next reply other threads:[~2022-07-27 11:49 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-27 11:49 Sebastian Andrzej Siewior [this message]
2022-07-27 11:49 ` [PATCH 1/4 v2] fs/dcache: d_add_ci() needs to complete parallel lookup Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 2/4 v2] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 3/4 v2] fs/dcache: Move the wakeup from __d_lookup_done() to the caller Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 4/4 v2] fs/dcache: Move wakeup out of i_seq_dir write held region Sebastian Andrzej Siewior
2022-07-30 4:41 ` [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220727114904.130761-1-bigeasy@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.