From: Julien Grall <julien@xen.org>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Jürgen Groß" <jgross@suse.com>,
xen-devel@lists.xenproject.org,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"George Dunlap" <george.dunlap@citrix.com>,
"Ian Jackson" <iwj@xenproject.org>,
"Stefano Stabellini" <sstabellini@kernel.org>,
"Wei Liu" <wl@xen.org>
Subject: Re: [PATCH v2 1/2] xen/events: access last_priority and last_vcpu_id together
Date: Tue, 20 Oct 2020 10:25:07 +0100
Message-ID: <2eb42b0e-f31e-2c1e-28bf-32c366fb1688@xen.org>
In-Reply-To: <4eb073bb-67ca-5376-bae1-e555d3c5fb30@suse.com>

Hi Jan,

On 16/10/2020 13:09, Jan Beulich wrote:
> On 16.10.2020 11:36, Julien Grall wrote:
>> On 15/10/2020 13:07, Jan Beulich wrote:
>>> On 14.10.2020 13:40, Julien Grall wrote:
>>>> On 13/10/2020 15:26, Jan Beulich wrote:
>>>>> On 13.10.2020 16:20, Jürgen Groß wrote:
>>>>>> Especially Julien was rather worried by the current situation. In
>>>>>> case you can convince him the current handling is fine, we can
>>>>>> easily drop this patch.
>>>>>
>>>>> Julien, in the light of the above - can you clarify the specific
>>>>> concerns you (still) have?
>>>>
>>>> Let me start with the assumption that evtchn->lock is not held when
>>>> evtchn_fifo_set_pending() is called. If it is held, then my comment is moot.
>>>
>>> But this isn't interesting - we know there are paths where it is
>>> held, and ones (interdomain sending) where it's the remote port's
>>> lock instead which is held. What's important here is that a
>>> _consistent_ lock be held (but it doesn't need to be evtchn's).
>>
>> Yes, a _consistent_ lock *should* be sufficient. But it is better to use
>> the same lock everywhere so it is easier to reason about (see more below).
>
> But that's already not the case, due to the way interdomain channels
> have events sent. You did suggest acquiring both locks, but as
> indicated at the time I think this goes too far. As far as the doc
> aspect - we can improve the situation. Iirc it was you who made me
> add the respective comment ahead of struct evtchn_port_ops.
>
>>>> From my understanding, the goal of lock_old_queue() is to return the
>>>> old queue used.
>>>>
>>>> last_priority and last_vcpu_id may be updated separately and I could not
>>>> convince myself that it would not be possible to return a queue that is
>>>> neither the current one nor the old one.
>>>>
>>>> The following could happen if evtchn->priority and
>>>> evtchn->notify_vcpu_id keep changing between calls.
>>>>
>>>> pCPU0                              | pCPU1
>>>>                                    |
>>>> evtchn_fifo_set_pending(v0,...)    |
>>>>                                    | evtchn_fifo_set_pending(v1, ...)
>>>>   [...]                            |
>>>>   /* Queue has changed */          |
>>>>   evtchn->last_vcpu_id = v0        |
>>>>                                    | -> evtchn_old_queue()
>>>>                                    |   v = d->vcpu[evtchn->last_vcpu_id];
>>>>                                    |   old_q = ...
>>>>                                    |   spin_lock(old_q->...)
>>>>                                    |   v = ...
>>>>                                    |   q = ...
>>>>                                    |   /* q and old_q would be the same */
>>>>                                    |
>>>>   evtchn->last_priority = priority |
>>>>
>>>> If my diagram is correct, then pCPU1 would return a queue that is
>>>> neither the current nor old one.
>>>
>>> I think I agree.
>>>
>>>> In which case, I think it would at least be possible to corrupt the
>>>> queue. From evtchn_fifo_set_pending():
>>>>
>>>>         /*
>>>>          * If this event was a tail, the old queue is now empty and
>>>>          * its tail must be invalidated to prevent adding an event to
>>>>          * the old queue from corrupting the new queue.
>>>>          */
>>>>         if ( old_q->tail == port )
>>>>             old_q->tail = 0;
>>>>
>>>> Did I miss anything?
>>>
>>> I don't think you did. The important point though is that a consistent
>>> lock is being held whenever we come here, so two racing set_pending()
>>> aren't possible for one and the same evtchn. As a result I don't think
>>> the patch here is actually needed.
>>
>> I haven't yet read the rest of the patches in full detail to say
>> whether this is necessary or not. However, at first glance, I think
>> it is not sane to rely on different locks to protect us. And don't
>> get me started on the lack of documentation...
>>
>> Furthermore, the implementation of lock_old_queue() suggests that the
>> code was planned to be lockless. Why would you need the loop otherwise?
>
> The lock-less aspect of this affects multiple accesses to e.g.
> the same queue, I think.

I don't think we are talking about the same thing. What I was referring
to is the following code:

static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
                                                struct evtchn *evtchn,
                                                unsigned long *flags)
{
    struct vcpu *v;
    struct evtchn_fifo_queue *q, *old_q;
    unsigned int try;

    for ( try = 0; try < 3; try++ )
    {
        v = d->vcpu[evtchn->last_vcpu_id];
        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];

        spin_lock_irqsave(&old_q->lock, *flags);

        v = d->vcpu[evtchn->last_vcpu_id];
        q = &v->evtchn_fifo->queue[evtchn->last_priority];

        if ( old_q == q )
            return old_q;

        spin_unlock_irqrestore(&old_q->lock, *flags);
    }

    gprintk(XENLOG_WARNING,
            "dom%d port %d lost event (too many queue changes)\n",
            d->domain_id, evtchn->port);

    return NULL;
}

Given that evtchn->last_vcpu_id and evtchn->last_priority can only be
modified in evtchn_fifo_set_pending(), this suggests that the function
is expected to be called concurrently multiple times on the same event
channel.

> I'm unconvinced it was really considered
> whether racing sending on the same channel is also safe this way.

How would you explain the three tries in lock_old_queue() then?
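
To make the point concrete, here is a minimal standalone sketch of the
same lock-then-recheck pattern. It is not the Xen code: queue_of(),
lock_old_queue_sketch(), the interfere hook and flip_priority() are
made-up names, and the lock is modelled as a plain flag in a single
thread, so it only shows the control flow. If nothing could change
last_vcpu_id/last_priority between the first read and taking the lock,
the re-check could never fail and the retry would be dead code:

```c
#include <assert.h>
#include <stddef.h>

#define NR_VCPUS  2
#define NR_PRIOS  2
#define MAX_TRIES 3

struct queue   { int locked; };   /* flag standing in for a spinlock */
struct channel { unsigned int last_vcpu_id, last_priority; };

static struct queue queues[NR_VCPUS][NR_PRIOS];

/* Hypothetical hook simulating a concurrent sender; NULL means none. */
static void (*interfere)(struct channel *);

static struct queue *queue_of(const struct channel *c)
{
    assert(c->last_vcpu_id < NR_VCPUS && c->last_priority < NR_PRIOS);
    return &queues[c->last_vcpu_id][c->last_priority];
}

static struct queue *lock_old_queue_sketch(struct channel *c)
{
    unsigned int attempt;

    for ( attempt = 0; attempt < MAX_TRIES; attempt++ )
    {
        struct queue *old_q = queue_of(c);

        /* Window in which another sender may move the event. */
        if ( interfere )
            interfere(c);

        old_q->locked = 1;            /* stands in for spin_lock_irqsave() */

        if ( old_q == queue_of(c) )   /* re-check with the lock held */
            return old_q;

        old_q->locked = 0;            /* wrong queue: unlock and retry */
    }

    return NULL;                      /* too many queue changes */
}

/* Hypothetical interferer: toggles the priority on every call. */
static void flip_priority(struct channel *c)
{
    c->last_priority ^= 1;
}
```

With interfere left NULL the first attempt always succeeds. With
interfere = flip_priority every attempt sees a different queue after
taking the lock, so the function gives up and returns NULL, which is
the "lost event" path above.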
>
>> Therefore, regardless of the rest of the discussion, I think this patch
>> would be useful to have for our peace of mind.
>
> That's a fair position to take. My counterargument is mainly
> that readability (and hence maintainability) suffers with those
> changes.

We surely have different opinions... I don't particularly care about the
approach as long as it is *properly* documented.
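
For reference, one way to make it impossible to observe the two fields
half-updated is to keep them in a single word that is read and written
in one go. The following is only a sketch under my own assumptions: it
uses C11 atomics and made-up names (struct lastq, channel_set_lastq(),
channel_get_lastq(), lastq_example()) rather than the helpers used in
Xen, and it is not the code from the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: vCPU id in bits 16-31, priority in bits 0-7. */
struct lastq {
    uint16_t vcpu_id;
    uint8_t  priority;
};

static inline uint32_t lastq_pack(struct lastq l)
{
    return ((uint32_t)l.vcpu_id << 16) | l.priority;
}

static inline struct lastq lastq_unpack(uint32_t raw)
{
    struct lastq l = {
        .vcpu_id  = (uint16_t)(raw >> 16),
        .priority = (uint8_t)(raw & 0xff),
    };

    return l;
}

struct channel {
    _Atomic uint32_t lastq;   /* both fields, updated with a single store */
};

static inline void channel_set_lastq(struct channel *c,
                                     uint16_t vcpu_id, uint8_t priority)
{
    struct lastq l = { .vcpu_id = vcpu_id, .priority = priority };

    atomic_store_explicit(&c->lastq, lastq_pack(l), memory_order_relaxed);
}

static inline struct lastq channel_get_lastq(struct channel *c)
{
    /* A single load: the pair returned was written together. */
    return lastq_unpack(atomic_load_explicit(&c->lastq,
                                             memory_order_relaxed));
}

/* Usage: publish both fields at once, then read back a matching pair. */
static void lastq_example(void)
{
    struct channel c;
    struct lastq l;

    atomic_init(&c.lastq, 0);
    channel_set_lastq(&c, 3, 7);
    l = channel_get_lastq(&c);
    assert(l.vcpu_id == 3 && l.priority == 7);
}
```

This only addresses tearing between the two fields. Which lock is
expected to be held around the update would still need to be written
down.
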
Cheers,
--
Julien Grall