All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	John Ogness <john.ogness@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT.
Date: Tue, 26 Oct 2021 23:06:21 +0200	[thread overview]
Message-ID: <20211026210621.yeg56reluq2cqrqs@wittgenstein> (raw)
In-Reply-To: <20211025152218.opvcqfku2lhqvp4o@linutronix.de>

On Mon, Oct 25, 2021 at 05:22:18PM +0200, Sebastian Andrzej Siewior wrote:
> The MNT_WRITE_HOLD flag is used to hold back any new writers while the
> mount point is about to be made read-only. __mnt_want_write() then loops
> with disabled preemption until this flag disappears. Callers of
> mnt_hold_writers() (which sets the flag) hold the spinlock_t of
> mount_lock (seqlock_t) which disables preemption on !PREEMPT_RT and
> ensures the task is not scheduled away so that the spinning side spins
> for a long time.
> 
> On PREEMPT_RT the spinlock_t does not disable preemption and so it is
> possible that the task setting MNT_WRITE_HOLD is preempted by task with
> higher priority which then spins infinitely waiting for MNT_WRITE_HOLD
> to get removed.
> 
> Acquire mount_lock::lock which is held by setter of MNT_WRITE_HOLD. This
> will PI-boost the owner and wait until the lock is dropped and which
> means that MNT_WRITE_HOLD is cleared again.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

I'm not an expert in real-time locking but this solution is way less
intrusive and easier to explain and understand than the first version.
Based on how I understand priority inheritance this solution seems good.

The scenario that we seem to mostly worry is:
Let's assume a low-priority task A is holding lock_mount_hash() and
writes MNT_WRITE_HOLD to mnt->mnt_flags. Another high-priority task B is
spinning waiting for MNT_WRITE_HOLD to be cleared.
On rt kernels task A could be scheduled away causing the high-priority
task B to spin for a long time. However, if we force the high-priority
task B to grab lock_mount_hash() we thereby priority boost low-priorty
task A causing it to relinquish the lock quickly preventing task B from
spinning for a long time.

Under the assumption I got this right:
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

>  fs/namespace.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 659a8f39c61af..3ab45b47b2860 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -343,8 +343,24 @@ int __mnt_want_write(struct vfsmount *m)
>  	 * incremented count after it has set MNT_WRITE_HOLD.
>  	 */
>  	smp_mb();
> -	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD)
> -		cpu_relax();
> +	might_lock(&mount_lock.lock);
> +	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
> +		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +			cpu_relax();

IS_ENABLED will have the same effect as using ifdef, i.e. compiling the
irrelevant branch out, I hope.

> +		} else {
> +			/*
> +			 * This prevents priority inversion, if the task
> +			 * setting MNT_WRITE_HOLD got preempted on a remote
> +			 * CPU, and it prevents life lock if the task setting
> +			 * MNT_WRITE_HOLD has a lower priority and is bound to
> +			 * the same CPU as the task that is spinning here.
> +			 */
> +			preempt_enable();
> +			lock_mount_hash();
> +			unlock_mount_hash();
> +			preempt_disable();
> +		}
> +	}

  reply	other threads:[~2021-10-26 21:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-21 22:01 [RFC PATCH] fs/namespace: use percpu_rw_semaphore for writer holding Sebastian Andrzej Siewior
2021-10-25  9:15 ` Christian Brauner
2021-10-25  9:20   ` Sebastian Andrzej Siewior
2021-10-25 15:22   ` [PATCH] fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT Sebastian Andrzej Siewior
2021-10-26 21:06     ` Christian Brauner [this message]
2021-10-27  6:42       ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211026210621.yeg56reluq2cqrqs@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=bigeasy@linutronix.de \
    --cc=john.ogness@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.