From: Dario Faggioli <dfaggioli@suse.com>
To: "sstabellini@kernel.org" <sstabellini@kernel.org>
Cc: "George.Dunlap@eu.citrix.com" <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"julien.grall@arm.com" <julien.grall@arm.com>,
"jgross@suse.de" <jgross@suse.de>
Subject: Re: [Xen-devel] dom0less + sched=null => broken in staging
Date: Mon, 28 Oct 2019 05:35:52 +0000
Message-ID: <114c301a92c942208c63daa5046db4534b95da4a.camel@suse.com>
In-Reply-To: <alpine.DEB.2.21.1908231722430.26226@sstabellini-ThinkPad-T480s>
On Fri, 2019-08-23 at 18:16 -0700, Stefano Stabellini wrote:
> On Wed, 21 Aug 2019, Dario Faggioli wrote:
> > Hey, Stefano, Julien,
> >
> > Here's another patch.
> >
> > Rather than a debug patch, this is rather an actual "proposed
> > solution".
> >
> > Can you give it a go? If it works, I'll spin it as a proper patch.
>
> Yes, this seems to solve the problem, thank you!
>
Hey,
Sorry this is taking a while. Could any of you please test the
attached patch on top of current staging?
I have rebased the patch from my last email on top of it, and I'd
like to know whether it still works now that core scheduling is in.
If it does, the only thing missing is a proper changelog, and I'll
write one quickly, I promise :-)
Regards,
Dario
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
[-- Attachment #1.1.2: xen-sched-null-vcpu-onoff-coresched.patch --]
[-- Type: text/x-patch, Size: 6949 bytes --]
commit 403339e2da498491573b8db539fe0307643264ee
Author: Dario Faggioli <dfaggioli@suse.com>
Date: Sat Oct 26 00:21:29 2019 +0200
TBD: Fix for online issue
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 2525464a7c..af1cf5e37e 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -568,50 +568,52 @@ static void null_unit_wake(const struct scheduler *ops,
else
SCHED_STAT_CRANK(unit_wake_not_runnable);
+ if ( likely(per_cpu(npc, cpu).unit == unit) )
+ {
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+ return;
+ }
+
/*
* If a unit is neither on a pCPU nor in the waitqueue, it means it was
- * offline, and that it is now coming back being online.
+ * offline, and that it is now coming back being online. If we're lucky,
+ * and its previous resource is free (and affinities match), we can just
+ * assign the unit to it (we own the proper lock already) and be done.
*/
- if ( unlikely(per_cpu(npc, cpu).unit != unit && list_empty(&nvc->waitq_elem)) )
+ if ( per_cpu(npc, cpu).unit == NULL &&
+ unit_check_affinity(unit, cpu, BALANCE_HARD_AFFINITY) )
{
- spin_lock(&prv->waitq_lock);
- list_add_tail(&nvc->waitq_elem, &prv->waitq);
- spin_unlock(&prv->waitq_lock);
-
- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
- cpupool_domain_master_cpumask(unit->domain));
-
- if ( !cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
+ if ( !has_soft_affinity(unit) ||
+ unit_check_affinity(unit, cpu, BALANCE_SOFT_AFFINITY) )
{
- dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
- unit->domain->domain_id, unit->unit_id);
+ unit_assign(prv, unit, cpu);
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
return;
}
+ }
- /*
- * Now we would want to assign the unit to cpu, but we can't, because
- * we don't have the lock. So, let's do the following:
- * - try to remove cpu from the list of free cpus, to avoid races with
- * other onlining, inserting or migrating operations;
- * - tickle the cpu, which will pickup work from the waitqueue, and
- * assign it to itself;
- * - if we're racing already, and if there still are free cpus, try
- * again.
- */
- while ( cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
- {
- unsigned int new_cpu = pick_res(prv, unit)->master_cpu;
+ /*
+ * If the resource is not free (or affinities do not match) we need
+ * to assign unit to some other one, but we can't do it here, as:
+ * - we don't own the proper lock,
+ * - we can't change v->processor under vcpu_wake()'s feet.
+ * So we add it to the waitqueue, and tickle all the free CPUs (if any)
+ * on which unit can run. The first one that schedules will pick it up.
+ */
+ spin_lock(&prv->waitq_lock);
+ list_add_tail(&nvc->waitq_elem, &prv->waitq);
+ spin_unlock(&prv->waitq_lock);
- if ( test_and_clear_bit(new_cpu, &prv->cpus_free) )
- {
- cpu_raise_softirq(new_cpu, SCHEDULE_SOFTIRQ);
- return;
- }
- }
- }
+ cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+ cpupool_domain_master_cpumask(unit->domain));
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+ &prv->cpus_free);
- /* Note that we get here only for units assigned to a pCPU */
- cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
+ if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+ dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
+ unit->domain->domain_id, unit->unit_id);
+ else
+ cpumask_raise_softirq(cpumask_scratch_cpu(cpu), SCHEDULE_SOFTIRQ);
}
static void null_unit_sleep(const struct scheduler *ops,
@@ -827,6 +829,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
*/
if ( unlikely(prev->next_task == NULL) )
{
+ bool unit_found;
+
spin_lock(&prv->waitq_lock);
if ( list_empty(&prv->waitq) )
@@ -839,6 +843,7 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
* it only in cases where a pcpu has no unit associated (e.g., as
* said above, the cpu has just joined a cpupool).
*/
+ unit_found = false;
for_each_affinity_balance_step( bs )
{
list_for_each_entry( wvc, &prv->waitq, waitq_elem )
@@ -849,13 +854,45 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
if ( unit_check_affinity(wvc->unit, sched_cpu, bs) )
{
- unit_assign(prv, wvc->unit, sched_cpu);
- list_del_init(&wvc->waitq_elem);
- prev->next_task = wvc->unit;
- goto unlock;
+ spinlock_t *lock;
+
+ unit_found = true;
+
+ /*
+ * If the unit in the waitqueue has just come up online,
+ * we risk racing with vcpu_wake(). To avoid this, sync
+ * on the spinlock that vcpu_wake() holds, but only with
+ * trylock, to avoid deadlocks.
+ */
+ lock = pcpu_schedule_trylock(sched_unit_master(wvc->unit));
+
+ /*
+ * We know the vcpu's lock is not this resource's lock. In
+ * fact, if it were, since this cpu is free, vcpu_wake()
+ * would have assigned the unit to here directly.
+ */
+ ASSERT(lock != get_sched_res(sched_cpu)->schedule_lock);
+
+ if ( lock ) {
+ unit_assign(prv, wvc->unit, sched_cpu);
+ list_del_init(&wvc->waitq_elem);
+ prev->next_task = wvc->unit;
+ spin_unlock(lock);
+ goto unlock;
+ }
}
}
}
+ /*
+ * If we did find a unit with suitable affinity in the waitqueue, but
+ * we could not pick it up (due to lock contention), and hence we are
+ * still free, plan for another try. In fact, we don't want such unit
+ * to be stuck in the waitqueue, when there are free cpus where it
+ * could run.
+ */
+ if ( unlikely(unit_found && prev->next_task == NULL &&
+ !list_empty(&prv->waitq)) )
+ cpu_raise_softirq(cur_cpu, SCHEDULE_SOFTIRQ);
unlock:
spin_unlock(&prv->waitq_lock);
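For reviewers, a minimal user-space model of the two decisions the patch introduces may help. It is only a sketch under explicit assumptions, not Xen code: cpumasks are modelled as 64-bit integers, the soft/hard affinity balance steps are collapsed into single checks, and cpumask_model_t, struct wake_input, null_wake_model(), toy_spinlock_t, struct waiter and null_pickup_model() are all hypothetical names. null_wake_model() mirrors the outcomes of the patched null_unit_wake() (tickle the unit's own pCPU; assign the unit to that pCPU if it is free and affinities match; otherwise queue it and tickle the free CPUs in hard affinity AND cpupool AND cpus_free, or warn if there are none). null_pickup_model() mirrors the trylock-and-retry scan added to null_schedule().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: bit N set means pCPU N is in the mask. */
typedef uint64_t cpumask_model_t;

static bool has_cpu(cpumask_model_t m, int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return (m >> cpu) & 1u;
}

/* Possible outcomes of the (modelled) patched null_unit_wake(). */
enum wake_action {
    WAKE_TICKLE_OWN_CPU,  /* unit already assigned to its pCPU: just tickle it */
    WAKE_ASSIGN_OWN_CPU,  /* that pCPU is free and affinities match: assign, tickle */
    WAKE_ENQUEUE_TICKLE,  /* put in the waitqueue, tickle the CPUs in *tickle */
    WAKE_ENQUEUE_WARN,    /* put in the waitqueue, no free CPU can run it: warn */
};

struct wake_input {
    int cpu;                   /* pCPU the waking unit currently points to */
    int cpu_owner;             /* id of the unit assigned to that pCPU, -1 if free */
    int unit_id;               /* id of the waking unit */
    cpumask_model_t hard_aff;  /* hard affinity */
    cpumask_model_t soft_aff;  /* soft affinity; 0 models "no soft affinity" */
    cpumask_model_t pool_cpus; /* CPUs of the unit's cpupool */
    cpumask_model_t cpus_free; /* models prv->cpus_free */
};

static enum wake_action null_wake_model(const struct wake_input *in,
                                        cpumask_model_t *tickle)
{
    cpumask_model_t hard = in->hard_aff & in->pool_cpus;

    *tickle = 0;
    if (in->cpu_owner == in->unit_id)
        return WAKE_TICKLE_OWN_CPU;

    if (in->cpu_owner < 0 && has_cpu(hard, in->cpu) &&
        (in->soft_aff == 0 || has_cpu(in->soft_aff, in->cpu)))
        return WAKE_ASSIGN_OWN_CPU;

    /* Cannot reassign here: queue it and let a free, suitable CPU pick it up. */
    *tickle = hard & in->cpus_free;
    return *tickle ? WAKE_ENQUEUE_TICKLE : WAKE_ENQUEUE_WARN;
}

/* Toy spinlock offering only the trylock/unlock operations used below. */
typedef struct { atomic_flag held; } toy_spinlock_t;

static bool toy_trylock(toy_spinlock_t *l) { return !atomic_flag_test_and_set(&l->held); }
static void toy_unlock(toy_spinlock_t *l) { atomic_flag_clear(&l->held); }

struct waiter {
    bool fits_this_cpu;    /* affinity check result, balance steps collapsed */
    toy_spinlock_t *lock;  /* scheduler lock of the unit's current resource */
};

/* Returns the index of the waiter picked up by this CPU, or -1. *retry is set
 * when a suitable waiter was skipped only because its lock was contended,
 * i.e. when the patched null_schedule() raises SCHEDULE_SOFTIRQ on itself. */
static int null_pickup_model(const struct waiter *wq, int n, bool *retry)
{
    bool found = false;

    *retry = false;
    for (int i = 0; i < n; i++) {
        if (!wq[i].fits_this_cpu)
            continue;
        found = true;
        if (toy_trylock(wq[i].lock)) {
            /* unit_assign(), list_del_init() and setting next_task go here. */
            toy_unlock(wq[i].lock);
            return i;
        }
    }
    *retry = found;
    return -1;
}
```

For example, a waking unit whose pCPU is busy, with hard affinity 0x3, cpupool 0xF and free CPUs 0xC, ends up in WAKE_ENQUEUE_WARN, which is the case where the patch prints the "not assigned to any CPU" warning. Using trylock rather than a blocking lock in the pickup path is what avoids deadlocking against a concurrent vcpu_wake(); the price is the self-tickle when the lock is contended.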
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel