From: Dario Faggioli <dario.faggioli@citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"jtweaver@hawaii.edu" <jtweaver@hawaii.edu>,
	George Dunlap <George.Dunlap@citrix.com>,
	"henric@hawaii.edu" <henric@hawaii.edu>
Subject: Re: [PATCH v2] sched: credit2: respect per-vcpu hard affinity
Date: Fri, 6 Mar 2015 17:02:20 +0000
Message-ID: <1425661338.16932.52.camel@citrix.com>
In-Reply-To: <54F9C55F.5070608@eu.citrix.com>


On Fri, 2015-03-06 at 15:18 +0000, George Dunlap wrote:
> On 03/03/2015 03:15 AM, Dario Faggioli wrote:
> > On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:

> >>  /*
> >> + * Use this to avoid having too many cpumask_t structs on the stack
> >> + */
> >> +static cpumask_t **cpumask = NULL;
> >>
> > Not just 'cpumask', please... It's too generic a name. Let's pick
> > something that makes its purpose easier to understand.
> > 
> > I'm not really sure right now what something like that could be...
> > Credit has balance_mask, but we're not going as far as introducing
> > multiple step load balancing here (not with this patch, at least).
> > 
> > Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
> > something else when introducing soft affinity, if needed).
> 
> Well I think it's just being used as scratch space, right?  Why not
> scratch_mask or something like that?
> 
Fine with me.
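
Just to be concrete, the declaration would then become something like
this (a trivial sketch of the agreed rename, nothing more):

    /*
     * Scratch space, to avoid having too many cpumask_t structs on the
     * stack.
     */
    static cpumask_t **scratch_mask = NULL;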

> >> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> >> +{
> >>
> > I also don't like the name... __choose_cpu(), maybe?
> 
> I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
> descriptive than just "__choose_cpu".
>
I don't like the "_safe_" part, but that is not a big deal; I can
certainly live with it.

> >> +    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, &svc->rqd->active);
> >> +    if ( unlikely(cpumask_empty(csched2_cpumask)) )
> >> +        cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> >> +            cpupool_online_cpumask(svc->vcpu->domain->cpupool));
> >>
> > VCPU2ONLINE(svc->vcpu) would make the line shorter.
> > 
> > Also, I know I'm the one that suggested this form for the code, but
> > thinking again about it, I'm not sure the first if is worth:
> > 
> >     cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
> >     cpumask_and(csched2_cpumask, csched2_cpumask, svc->vcpu->cpu_hard_affinity);
> 
> Actually, I was going to say is there any reason not to start with:
> 
> if ( likely(cpumask_test_cpu(svc->vcpu->processor,
>                              svc->vcpu->cpu_hard_affinity)) )
>     return svc->vcpu->processor;
> 
> And then go through doing the unions and what-not once we've determined
> that the current processor isn't suitable?
> 
I like the idea, and I think it actually makes sense. I'm trying to
think of a scenario where this could be bogus, and where we'd actually
need to do the filtering against rqd->active and online up front, but I
can't imagine one.

I think the most likely source of trouble is affinity being changed
but, even then, if we were running on vcpu->processor, and that is still
in our hard affinity mask, it ought to be fine to stay there (and hence
George's suggestion should work)... Justin, what do you think?

> > Oh, and, with either yours or my variant... can csched2_cpumask end up
> > empty after the two &&-s? That's important or, below, cpumask_any will
> > return garbage! :-/
> > 
> > From a quick inspection, it does not seem it can, in which case, I'd put
> > down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
> > ways of preventing that from happening before getting here... It seems easier
> > and more correct to do that, rather than trying to figure out what to do
> > here if the mask is empty. :-O
> 
> Yes, we need to go through the code and make sure that we understand
> what our assumptions are, so that people can't crash Xen by doing
> irrational things like making the hard affinity not intersect with the
> cpupool cpus.
> 
True.

Something like that can be figured out and either forbidden or, more
generally, addressed in other places, rather than requiring us to care
about it here. In fact, this seems to me to be what's happening already
(see below).

> If we make this an "unusual slow path", I don't see any reason not to
> make the code a bit more resilient by handling cases we don't expect
> to happen.
>
Well, again, true, generally speaking. However --perhaps because I'm
not sure I'm getting what "unusual slow path" means in this context-- if
we say that we support hard affinity, then vCPUs being found on pCPUs
outside of their hard affinity is a very bad thing.

For instance, people may rely on this for various reasons, e.g., to
achieve a high level of isolation between domains by partitioning the
hard affinity of their vCPUs accordingly, and this isolation not being
enforced can screw things up arbitrarily badly for them, I think.

Whether this is worth exploding over is probably debatable (and
workload and use case dependent), but it definitely falls in the "I want
to catch this during testing" (so ==> ASSERT) category, IMO.

> It would be good to try to make sure we don't allow a situation
> where union(hard_affinity, domain->cpupool) is empty, but I'd rather the
> result be something not too bizarre (e.g., picking a random cpu in the
> cpupool) than having the host crash or something.
> 
Yeah, I like robustness... but only up to a certain extent, I think, or
you end up having to deal with all sorts of nonsensical situations
pretty much everywhere! :-)

AFAICR, and having another quick look around, I think
intersection(hard_affinity,domain->cpupool) (because with union(), you
really meant intersection(), right? ;-P) can't be empty, and that's The
Right Thing (TM).

This is because, when a domain is moved to a cpupool, its hard and soft
affinities are just reset to "all" (see sched_move_domain() in
schedule.c). Also, when the set of pCPUs of a cpupool is altered, or on
CPU hotplug, if that alteration would make the intersection empty, the
hard affinity is again reset to "all" (see cpu_disable_scheduler()).
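
I.e., the invariant is maintained by logic along these lines (a
from-memory paraphrase of cpu_disable_scheduler(); check schedule.c for
the authoritative version, the names may differ):

    cpumask_and(&online_affinity, v->cpu_hard_affinity, c->cpu_valid);
    if ( cpumask_empty(&online_affinity) &&
         cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
    {
        printk("Breaking affinity for d%dv%d\n",
               v->domain->domain_id, v->vcpu_id);
        cpumask_setall(v->cpu_hard_affinity);
    }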

So, dealing with this here with anything other than ASSERTs would, at a
minimum, look confusing to me. It would make me think that it is indeed
something possible and legitimate, and hence make me question why the
code cited above does what it does, etc.

So, to summarize: provided the analysis above is verified, I wouldn't
BUG_ON, nor would I try to handle it; I'd go for an ASSERT.
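
Concretely, at the end of the fallback path in get_safe_pcpu(), I am
thinking of something like this (just a sketch):

    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
                VCPU2ONLINE(svc->vcpu));
    /* Can't be empty: schedule.c resets the hard affinity to "all"
     * whenever a cpupool change would make the intersection empty. */
    ASSERT(!cpumask_empty(csched2_cpumask));
    return cpumask_any(csched2_cpumask);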

> >> -    /* FIXME: Pay attention to cpu affinity */
> >> -
> 
> Nice to see those disappear. :-)
> 
:-)

> >> +    /*
> >> +     * Assign new_cpu to vc->processor here to get a call to sched_move_irqs
> >> +     * in schedule.c in case there was a hard affinity change within the same
> >> +     * run queue. vc will not be able to run in certain situations without
> >> +     * this call.
> >> +     */
> >> +    vc->processor = new_cpu;
> >>
> > Oh, and this is what was causing you troubles, in case source and
> > destination runqueue were the same... Help me understand, which call to
> > sched_move_irqs() in schedule.c were we missing? I'd say it is the one
> > in vcpu_migrate(), but that does not seem to care about vc->processor
> > (much rather about new_cpu)... what am I missing?
> > 
> > However, if they are not the same, the call to migrate() will override
> > this right away, won't it? What I mean to say is, if this is required
> > only in case trqd and svc->rqd are equal, can we tweak this part of
> > csched2_vcpu_migrate() as follows?
> > 
> >     if ( trqd != svc->rqd )
> >         migrate(ops, svc, trqd, NOW());
> >     else
> >         vc->processor = new_cpu;
> 
> Looking at it now, it does seem a bit weird to me that, when the
> generic scheduler does vcpu_migrate(), we go through all the hassle of
> calling pick_cpu, and then (if the runqueue happens to change) we end
> up picking *yet another* random cpu within that runqueue.
> 
Not sure I'm getting 100% of what you mean here... Are you complaining
about the fact that, independently of Justin's patch, inside migrate()
we call cpumask_any() instead of using new_cpu from here? If so, that
makes sense, and it should not be too hard to fix...
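
For the record, what I have in mind is something like the following
(hypothetical sketch, with an extra parameter added to migrate(); I
have not actually tried this):

    static void migrate(const struct scheduler *ops,
                        struct csched2_vcpu *svc,
                        struct csched2_runqueue_data *trqd,
                        unsigned int new_cpu, s_time_t now)
    {
        /* ... body unchanged, except that instead of:
         *     svc->vcpu->processor = cpumask_any(&trqd->active);
         * we would honour the pCPU that pick_cpu already chose: */
        svc->vcpu->processor = new_cpu;
        /* ... */
    }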

> But that's a fix for another time, I think.
>
Agreed (again, if I got what you meant :-) ).

Regards,
Dario
