From: Peter Zijlstra <peterz@infradead.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Tomas Glozar <tglozar@redhat.com>
Subject: Re: [PATCH] sched/fair: Make the BW replenish timer expire in hardirq context for PREEMPT_RT
Date: Tue, 31 Oct 2023 17:01:20 +0100
Message-ID: <20231031160120.GE15024@noisy.programming.kicks-ass.net>
In-Reply-To: <20231030145104.4107573-1-vschneid@redhat.com>

On Mon, Oct 30, 2023 at 03:51:04PM +0100, Valentin Schneider wrote:
> Consider the following scenario under PREEMPT_RT:
> o A CFS task p0 gets throttled while holding read_lock(&lock)
> o A task p1 blocks on write_lock(&lock), making further readers enter the
>   slowpath
> o A ktimers or ksoftirqd task blocks on read_lock(&lock)
> 
> If the cfs_bandwidth.period_timer to replenish p0's runtime is enqueued on
> the same CPU as one where ktimers/ksoftirqd is blocked on read_lock(&lock),
> this creates a circular dependency.
> 
> This has been observed to happen with:
> o fs/eventpoll.c::ep->lock
> o net/netlink/af_netlink.c::nl_table_lock (after hand-fixing the above)
> but can trigger with any rwlock that can be acquired in both process and
> softirq contexts.
> 
> The linux-rt tree has had
>   1ea50f9636f0 ("softirq: Use a dedicated thread for timer wakeups.")
> which helped this scenario for non-rwlock locks by ensuring the throttled
> task would get PI'd to FIFO1 (ktimers' default priority). Unfortunately,
> rwlocks cannot sanely do PI as they allow multiple readers.
> 
> Make the period_timer expire in hardirq context under PREEMPT_RT. The
> callback for this timer can end up doing a lot of work, but this is
> mitigated somewhat when using nohz_full / CPU isolation: the timers *are*
> pinned, but on the CPUs the taskgroups are created on, which is usually
> going to be HK CPUs.

Moo... so I think 'people' have been pushing towards changing the
bandwidth thing to only throttle on the return-to-user path. This solves
the kernel side of the lock holder 'preemption' issue.

I'm thinking working on that is saner than adding this O(n) cgroup loop
to hard-irq context. Hmm?
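
To make the difference concrete, here is a toy Python model (not kernel
code) of the two throttling policies. Every name in it (Task,
quota_exhausted, return_to_user, scenario) is hypothetical and exists only
for illustration; it assumes a task drops all kernel locks before it
returns to user space, and it does not reflect how kernel/sched/fair.c is
actually structured.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a CFS task; not a kernel structure."""
    name: str
    held_locks: set = field(default_factory=set)
    in_kernel: bool = False
    throttle_pending: bool = False
    throttled: bool = False


def quota_exhausted(task, defer_to_user_return):
    """The task's group ran out of runtime for this period."""
    if defer_to_user_return and task.in_kernel:
        # Only mark the task; it keeps running until it leaves the kernel.
        task.throttle_pending = True
    else:
        # Throttle on the spot, whatever the task currently holds.
        task.throttled = True


def return_to_user(task):
    """Model of the return-to-user path (assumes no kernel locks are held)."""
    assert not task.held_locks, "kernel locks must be dropped first"
    task.in_kernel = False
    if task.throttle_pending:
        task.throttle_pending = False
        task.throttled = True


def scenario(defer_to_user_return):
    """Run p0 through read_lock(), quota exhaustion and unlock.

    Returns (names of tasks throttled while holding a lock,
             whether p0 ends up throttled).
    """
    p0 = Task("p0", in_kernel=True)
    p0.held_locks.add("lock")                 # read_lock(&lock)
    quota_exhausted(p0, defer_to_user_return)
    stalled = [t.name for t in [p0] if t.throttled and t.held_locks]
    if not p0.throttled:
        p0.held_locks.discard("lock")         # read_unlock(&lock)
        return_to_user(p0)
    return stalled, p0.throttled
```

With immediate throttling, scenario(False) reports p0 as throttled while
still holding the lock, which is the state the quoted deadlock depends on.
With scenario(True), p0 is still throttled in the end, but only after it
has released the lock and left the kernel.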
