linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Vernet <void@manifault.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, joshdon@google.com,
	roman.gushchin@linux.dev, tj@kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
Date: Wed, 21 Jun 2023 15:34:45 -0500	[thread overview]
Message-ID: <20230621203445.GC15990@maniforge> (raw)
In-Reply-To: <20230621142020.GG2053369@hirez.programming.kicks-ass.net>

On Wed, Jun 21, 2023 at 04:20:20PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 12:20:04AM -0500, David Vernet wrote:
> > +struct swqueue {
> > +	struct list_head list;
> > +	spinlock_t lock;
> > +} ____cacheline_aligned;
> 
> I'm thinking you can shard this just fine, it makes that pop() needs to
> iterate all shards, but that shouldn't be a problem, and it would still
> only need to take a single lock.

Is the idea here to have multiple sharded queues per LLC, with a single
lock protecting them? Assuming so, is the idea that it would avoid
bouncing the list heads amongst all of the cores' cachelines? I could
see that being useful in some scenarios, but it also feels a bit
complicated for what it gives you. If you have tasks being pulled
quickly I don't think that will help much because the list heads will
just bounce to the pullers. Also, if the lock is already heavily
contended, it seems possible that the cores inside a single shard could
still bounce the head back and forth amongst them, or the cache line
bouncing will be a small-order overhead compared to the lock itself.

Or did I misunderstand your suggestion entirely?

> I'm thinking 4 or 8 shards should be plenty, even for Intel LLC.
> 
> >  #ifdef CONFIG_SMP
> 
> > +static struct task_struct *swqueue_pull_task(struct swqueue *swqueue)
> > +{
> > +	unsigned long flags;
> > +
> > +	struct task_struct *p;
> > +
> > +	spin_lock_irqsave(&swqueue->lock, flags);
> > +	p = list_first_entry_or_null(&swqueue->list, struct task_struct,
> > +				     swqueue_node);
> > +	if (p)
> > +		list_del_init(&p->swqueue_node);
> > +	spin_unlock_irqrestore(&swqueue->lock, flags);
> > +
> > +	return p;
> > +}
> 
> Would this not normally be called pop() or somesuch?

Yes, I'll improve the name in the next iteration. swqueue_dequeue() and
swqueue_enqueue() seem like the most canonical. Let me know if you have another
preference.

> 
> > +static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
> > +{
> > +	unsigned long flags;
> > +	struct swqueue *swqueue;
> > +	bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
> > +	bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
> > +
> > +	/*
> > +	 * Only enqueue the task in the shared wakequeue if:
> > +	 *
> > +	 * - SWQUEUE is enabled
> > +	 * - The task is on the wakeup path
> > +	 * - The task wasn't purposefully migrated to the current rq by
> > +	 *   select_task_rq()
> > +	 * - The task isn't pinned to a specific CPU
> > +	 */
> > +	if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
> > +		return;
> 
> Elsewhere you mentioned heuristics, this smells like them. This and the
> is_cpus_allowed() thing makes you loose plenty of opportunities.

Yeah fair enough, these certainly are heuristics as well.

I thought it best to try and avoid swqueue getting in the way of
select_task_rq_fair() (at least to start out with), but we could always
remove that and run other experiments to see how it does.

> > +	swqueue = rq_swqueue(rq);
> > +	spin_lock_irqsave(&swqueue->lock, flags);
> > +	list_add_tail(&p->swqueue_node, &swqueue->list);
> > +	spin_unlock_irqrestore(&swqueue->lock, flags);
> > +}
> > +
> >  static int swqueue_pick_next_task(struct rq *rq, struct rq_flags *rf)
> >  {
> > -	return 0;
> > +	struct swqueue *swqueue;
> > +	struct task_struct *p = NULL;
> > +	struct rq *src_rq;
> > +	struct rq_flags src_rf;
> > +	int ret;
> > +
> > +	swqueue = rq_swqueue(rq);
> > +	if (!list_empty(&swqueue->list))
> > +		p = swqueue_pull_task(swqueue);
> > +
> > +	if (!p)
> > +		return 0;
> 
> At this point you can do the whole is_cpu_allowed() and avoid the whole
> lock dance if not.

Good idea, will incorporate into the next iteration.

> > +
> > +	rq_unpin_lock(rq, rf);
> > +	raw_spin_rq_unlock(rq);
> > +
> > +	src_rq = task_rq_lock(p, &src_rf);
> > +
> > +	if (task_on_rq_queued(p) && !task_on_cpu(rq, p))
> > +		src_rq = migrate_task_to(src_rq, &src_rf, p, cpu_of(rq));
> 
> And then this becomes move_queued_task().

Yep, will make this change per your suggestion in [0].

[0]: https://lore.kernel.org/all/20230621130439.GF2053369@hirez.programming.kicks-ass.net/

> > +	if (src_rq->cpu != rq->cpu)
> > +		ret = 1;
> > +	else
> > +		ret = -1;
> > +
> > +	task_rq_unlock(src_rq, p, &src_rf);
> > +
> > +	raw_spin_rq_lock(rq);
> > +	rq_repin_lock(rq, rf);
> > +
> > +	return ret;
> >  }
> >  
> >  static void swqueue_remove_task(struct task_struct *p)
> > +{
> > +	unsigned long flags;
> > +	struct swqueue *swqueue;
> > +
> > +	if (!list_empty(&p->swqueue_node)) {
> > +		swqueue = rq_swqueue(task_rq(p));
> > +		spin_lock_irqsave(&swqueue->lock, flags);
> > +		list_del_init(&p->swqueue_node);
> > +		spin_unlock_irqrestore(&swqueue->lock, flags);
> > +	}
> > +}
> 
> dequeue()

Ack

  reply	other threads:[~2023-06-21 20:35 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-13  5:20 [RFC PATCH 0/3] sched: Implement shared wakequeue in CFS David Vernet
2023-06-13  5:20 ` [RFC PATCH 1/3] sched: Make migrate_task_to() take any task David Vernet
2023-06-21 13:04   ` Peter Zijlstra
2023-06-22  2:07     ` David Vernet
2023-06-13  5:20 ` [RFC PATCH 2/3] sched/fair: Add SWQUEUE sched feature and skeleton calls David Vernet
2023-06-21 12:49   ` Peter Zijlstra
2023-06-22 14:53     ` David Vernet
2023-06-13  5:20 ` [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS David Vernet
2023-06-13  8:32   ` Peter Zijlstra
2023-06-14  4:35     ` Aaron Lu
2023-06-14  9:27       ` Peter Zijlstra
2023-06-15  0:01       ` David Vernet
2023-06-15  4:49         ` Aaron Lu
2023-06-15  7:31           ` Aaron Lu
2023-06-15 23:26             ` David Vernet
2023-06-16  0:53               ` Aaron Lu
2023-06-20 17:36                 ` David Vernet
2023-06-21  2:35                   ` Aaron Lu
2023-06-21  2:43                     ` David Vernet
2023-06-21  4:54                       ` Aaron Lu
2023-06-21  5:43                         ` David Vernet
2023-06-21  6:03                           ` Aaron Lu
2023-06-22 15:57                             ` Chris Mason
2023-06-13  8:41   ` Peter Zijlstra
2023-06-14 20:26     ` David Vernet
2023-06-16  8:08   ` Vincent Guittot
2023-06-20 19:54     ` David Vernet
2023-06-20 21:37       ` Roman Gushchin
2023-06-21 14:22       ` Peter Zijlstra
2023-06-19  6:13   ` Gautham R. Shenoy
2023-06-20 20:08     ` David Vernet
2023-06-21  8:17       ` Gautham R. Shenoy
2023-06-22  1:43         ` David Vernet
2023-06-22  9:11           ` Gautham R. Shenoy
2023-06-22 10:29             ` Peter Zijlstra
2023-06-23  9:50               ` Gautham R. Shenoy
2023-06-26  6:04                 ` Gautham R. Shenoy
2023-06-27  3:17                   ` David Vernet
2023-06-27 16:31                     ` Chris Mason
2023-06-21 14:20   ` Peter Zijlstra
2023-06-21 20:34     ` David Vernet [this message]
2023-06-22 10:58       ` Peter Zijlstra
2023-06-22 14:43         ` David Vernet
2023-07-10 11:57 ` [RFC PATCH 0/3] " K Prateek Nayak
2023-07-11  4:43   ` David Vernet
2023-07-11  5:06     ` K Prateek Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230621203445.GC15990@maniforge \
    --to=void@manifault.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).