backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?

All of lore.kernel.org
 help / color / mirror / Atom feed

* backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
@ 2017-02-16  9:20 Jan Beulich
  2017-02-16 10:13 ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2017-02-16  9:20 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

Paul,

as it looks to be quite non-trivial an operation, did you happen to
have to backport commit 480b83162a to 4.4 or older, without
backporting all the ioreq server stuff at the same time? It looks to
me as if the issue predates the addition of ioreq servers, and us
having had customer reports here would seem to make this a
candidate fix (perhaps with at least 125833f5f1 ["x86: fix
ioreq-server event channel vulnerability"] also backported, which
also appears to address a pre-existing issue).

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16  9:20 backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ? Jan Beulich
@ 2017-02-16 10:13 ` Paul Durrant
  2017-02-16 10:22   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2017-02-16 10:13 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 February 2017 09:21
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Subject: backport of "x86/hvm: don't rely on shared ioreq state for
> completion handling" ?
> 
> Paul,
> 
> as it looks to be quite non-trivial an operation, did you happen to
> have to backport commit 480b83162a to 4.4 or older, without
> backporting all the ioreq server stuff at the same time? It looks to
> me as if the issue predates the addition of ioreq servers, and us
> having had customer reports here would seem to make this a
> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
> ioreq-server event channel vulnerability"] also backported, which
> also appears to address a pre-existing issue).
> 

Hi Jan,

Sorry, no I don't have a back-port. Agreed that the issue existed prior to ioreq servers but the checking was probably sufficiently lax that it never resulted in a domain_crash(), just bad data coming back from an emulation request.

  Paul

> Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 10:13 ` Paul Durrant
@ 2017-02-16 10:22   ` Jan Beulich
  2017-02-16 10:36     ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2017-02-16 10:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 16 February 2017 09:21
>> 
>> as it looks to be quite non-trivial an operation, did you happen to
>> have to backport commit 480b83162a to 4.4 or older, without
>> backporting all the ioreq server stuff at the same time? It looks to
>> me as if the issue predates the addition of ioreq servers, and us
>> having had customer reports here would seem to make this a
>> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
>> ioreq-server event channel vulnerability"] also backported, which
>> also appears to address a pre-existing issue).
> 
> Sorry, no I don't have a back-port. Agreed that the issue existed prior to 
> ioreq servers but the checking was probably sufficiently lax that it never 
> resulted in a domain_crash(), just bad data coming back from an emulation 
> request.

Well, according to the reports we've got, maybe it was less likely
to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
get confirmation that the issue was only seen during domain
shutdown, which aiui was (leaving aside a guest fiddling with the
shared structure, in which case it deserves being crashed) the
only condition triggering that domain_crash().

Once I have aforementioned confirmation, would you mind if I
asked you to look over the backports, should I manage to create
them in the first place?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 10:22   ` Jan Beulich
@ 2017-02-16 10:36     ` Paul Durrant
  2017-02-16 10:46       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2017-02-16 10:36 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 February 2017 10:23
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> completion handling" ?
> 
> >>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 16 February 2017 09:21
> >>
> >> as it looks to be quite non-trivial an operation, did you happen to
> >> have to backport commit 480b83162a to 4.4 or older, without
> >> backporting all the ioreq server stuff at the same time? It looks to
> >> me as if the issue predates the addition of ioreq servers, and us
> >> having had customer reports here would seem to make this a
> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
> >> ioreq-server event channel vulnerability"] also backported, which
> >> also appears to address a pre-existing issue).
> >
> > Sorry, no I don't have a back-port. Agreed that the issue existed prior to
> > ioreq servers but the checking was probably sufficiently lax that it never
> > resulted in a domain_crash(), just bad data coming back from an emulation
> > request.
> 
> Well, according to the reports we've got, maybe it was less likely
> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
> get confirmation that the issue was only seen during domain
> shutdown, which aiui was (leaving aside a guest fiddling with the
> shared structure, in which case it deserves being crashed) the
> only condition triggering that domain_crash().

If it is only on shutdown then that's probably just a toolstack race (since QEMU should really by dying cleanly when the guest goes to S5) unless we're talking about a forced shutdown.

> 
> Once I have aforementioned confirmation, would you mind if I
> asked you to look over the backports, should I manage to create
> them in the first place?
> 

Sure. No problem.

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 10:36     ` Paul Durrant
@ 2017-02-16 10:46       ` Jan Beulich
  2017-02-16 10:53         ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2017-02-16 10:46 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 16.02.17 at 11:36, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 16 February 2017 10:23
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
>> completion handling" ?
>> 
>> >>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 16 February 2017 09:21
>> >>
>> >> as it looks to be quite non-trivial an operation, did you happen to
>> >> have to backport commit 480b83162a to 4.4 or older, without
>> >> backporting all the ioreq server stuff at the same time? It looks to
>> >> me as if the issue predates the addition of ioreq servers, and us
>> >> having had customer reports here would seem to make this a
>> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
>> >> ioreq-server event channel vulnerability"] also backported, which
>> >> also appears to address a pre-existing issue).
>> >
>> > Sorry, no I don't have a back-port. Agreed that the issue existed prior to
>> > ioreq servers but the checking was probably sufficiently lax that it never
>> > resulted in a domain_crash(), just bad data coming back from an emulation
>> > request.
>> 
>> Well, according to the reports we've got, maybe it was less likely
>> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
>> get confirmation that the issue was only seen during domain
>> shutdown, which aiui was (leaving aside a guest fiddling with the
>> shared structure, in which case it deserves being crashed) the
>> only condition triggering that domain_crash().
> 
> If it is only on shutdown then that's probably just a toolstack race (since 
> QEMU should really by dying cleanly when the guest goes to S5) unless we're 
> talking about a forced shutdown.

Then I may have misunderstood the original mail thread: Under
what other conditions did this trigger for the original reporters
(Sander and Roger)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 10:46       ` Jan Beulich
@ 2017-02-16 10:53         ` Paul Durrant
  2017-02-16 11:00           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2017-02-16 10:53 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 February 2017 10:46
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> completion handling" ?
> 
> >>> On 16.02.17 at 11:36, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 16 February 2017 10:23
> >> To: Paul Durrant <Paul.Durrant@citrix.com>
> >> Cc: xen-devel <xen-devel@lists.xenproject.org>
> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> >> completion handling" ?
> >>
> >> >>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: 16 February 2017 09:21
> >> >>
> >> >> as it looks to be quite non-trivial an operation, did you happen to
> >> >> have to backport commit 480b83162a to 4.4 or older, without
> >> >> backporting all the ioreq server stuff at the same time? It looks to
> >> >> me as if the issue predates the addition of ioreq servers, and us
> >> >> having had customer reports here would seem to make this a
> >> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
> >> >> ioreq-server event channel vulnerability"] also backported, which
> >> >> also appears to address a pre-existing issue).
> >> >
> >> > Sorry, no I don't have a back-port. Agreed that the issue existed prior to
> >> > ioreq servers but the checking was probably sufficiently lax that it never
> >> > resulted in a domain_crash(), just bad data coming back from an
> emulation
> >> > request.
> >>
> >> Well, according to the reports we've got, maybe it was less likely
> >> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
> >> get confirmation that the issue was only seen during domain
> >> shutdown, which aiui was (leaving aside a guest fiddling with the
> >> shared structure, in which case it deserves being crashed) the
> >> only condition triggering that domain_crash().
> >
> > If it is only on shutdown then that's probably just a toolstack race (since
> > QEMU should really by dying cleanly when the guest goes to S5) unless
> we're
> > talking about a forced shutdown.
> 
> Then I may have misunderstood the original mail thread: Under
> what other conditions did this trigger for the original reporters
> (Sander and Roger)?

Now you're asking... I'll have to see if I can find the original mail threads. It's possible it was stubdom related... but I could be thinking of something else.

  Paul

> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 10:53         ` Paul Durrant
@ 2017-02-16 11:00           ` Jan Beulich
  2017-02-16 11:17             ` Paul Durrant
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2017-02-16 11:00 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 16.02.17 at 11:53, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 16 February 2017 10:46
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: xen-devel <xen-devel@lists.xenproject.org>
>> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
>> completion handling" ?
>> 
>> >>> On 16.02.17 at 11:36, <Paul.Durrant@citrix.com> wrote:
>> >>  -----Original Message-----
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: 16 February 2017 10:23
>> >> To: Paul Durrant <Paul.Durrant@citrix.com>
>> >> Cc: xen-devel <xen-devel@lists.xenproject.org>
>> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
>> >> completion handling" ?
>> >>
>> >> >>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
>> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> Sent: 16 February 2017 09:21
>> >> >>
>> >> >> as it looks to be quite non-trivial an operation, did you happen to
>> >> >> have to backport commit 480b83162a to 4.4 or older, without
>> >> >> backporting all the ioreq server stuff at the same time? It looks to
>> >> >> me as if the issue predates the addition of ioreq servers, and us
>> >> >> having had customer reports here would seem to make this a
>> >> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
>> >> >> ioreq-server event channel vulnerability"] also backported, which
>> >> >> also appears to address a pre-existing issue).
>> >> >
>> >> > Sorry, no I don't have a back-port. Agreed that the issue existed prior to
>> >> > ioreq servers but the checking was probably sufficiently lax that it never
>> >> > resulted in a domain_crash(), just bad data coming back from an
>> emulation
>> >> > request.
>> >>
>> >> Well, according to the reports we've got, maybe it was less likely
>> >> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
>> >> get confirmation that the issue was only seen during domain
>> >> shutdown, which aiui was (leaving aside a guest fiddling with the
>> >> shared structure, in which case it deserves being crashed) the
>> >> only condition triggering that domain_crash().
>> >
>> > If it is only on shutdown then that's probably just a toolstack race (since
>> > QEMU should really by dying cleanly when the guest goes to S5) unless
>> we're
>> > talking about a forced shutdown.
>> 
>> Then I may have misunderstood the original mail thread: Under
>> what other conditions did this trigger for the original reporters
>> (Sander and Roger)?
> 
> Now you're asking... I'll have to see if I can find the original mail 
> threads. It's possible it was stubdom related... but I could be thinking of 
> something else.

https://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05210.html

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 11:00           ` Jan Beulich
@ 2017-02-16 11:17             ` Paul Durrant
  2017-02-16 11:22               ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Durrant @ 2017-02-16 11:17 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 February 2017 11:00
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel <xen-devel@lists.xenproject.org>
> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> completion handling" ?
> 
> >>> On 16.02.17 at 11:53, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 16 February 2017 10:46
> >> To: Paul Durrant <Paul.Durrant@citrix.com>
> >> Cc: xen-devel <xen-devel@lists.xenproject.org>
> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> >> completion handling" ?
> >>
> >> >>> On 16.02.17 at 11:36, <Paul.Durrant@citrix.com> wrote:
> >> >>  -----Original Message-----
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: 16 February 2017 10:23
> >> >> To: Paul Durrant <Paul.Durrant@citrix.com>
> >> >> Cc: xen-devel <xen-devel@lists.xenproject.org>
> >> >> Subject: RE: backport of "x86/hvm: don't rely on shared ioreq state for
> >> >> completion handling" ?
> >> >>
> >> >> >>> On 16.02.17 at 11:13, <Paul.Durrant@citrix.com> wrote:
> >> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> >> Sent: 16 February 2017 09:21
> >> >> >>
> >> >> >> as it looks to be quite non-trivial an operation, did you happen to
> >> >> >> have to backport commit 480b83162a to 4.4 or older, without
> >> >> >> backporting all the ioreq server stuff at the same time? It looks to
> >> >> >> me as if the issue predates the addition of ioreq servers, and us
> >> >> >> having had customer reports here would seem to make this a
> >> >> >> candidate fix (perhaps with at least 125833f5f1 ["x86: fix
> >> >> >> ioreq-server event channel vulnerability"] also backported, which
> >> >> >> also appears to address a pre-existing issue).
> >> >> >
> >> >> > Sorry, no I don't have a back-port. Agreed that the issue existed prior
> to
> >> >> > ioreq servers but the checking was probably sufficiently lax that it
> never
> >> >> > resulted in a domain_crash(), just bad data coming back from an
> >> emulation
> >> >> > request.
> >> >>
> >> >> Well, according to the reports we've got, maybe it was less likely
> >> >> to trigger, but it looks like it wasn't lax enough. Albeit I'm yet to
> >> >> get confirmation that the issue was only seen during domain
> >> >> shutdown, which aiui was (leaving aside a guest fiddling with the
> >> >> shared structure, in which case it deserves being crashed) the
> >> >> only condition triggering that domain_crash().
> >> >
> >> > If it is only on shutdown then that's probably just a toolstack race (since
> >> > QEMU should really by dying cleanly when the guest goes to S5) unless
> >> we're
> >> > talking about a forced shutdown.
> >>
> >> Then I may have misunderstood the original mail thread: Under
> >> what other conditions did this trigger for the original reporters
> >> (Sander and Roger)?
> >
> > Now you're asking... I'll have to see if I can find the original mail
> > threads. It's possible it was stubdom related... but I could be thinking of
> > something else.
> 
> https://lists.xenproject.org/archives/html/xen-devel/2015-
> 07/msg05210.html
> 

Thanks.

So, looking at my message https://lists.xenproject.org/archives/html/xen-devel/2015-07/msg05506.html, the problem with the emulator/toolstack was never diagnosed. I wonder whether it is a problem with running a PV aware guest in an HVM container, and using a PV shutdown mechanism causing the toolstack to kill the emulator rather than it shutting down gracefully? Prior to the ioreq series going in, the shared ioreq pages were never removed from the P2M, and so there was no concept of zeroing them before re-insertion (resulting in the ioreq state magically going straight to 'none' rather than 'resp ready'). Hence, even if the emulator were killed, you wouldn't hit the same sort of crash... more likely you'd end up with a stuck emulation and a wedged vcpu.

Roger's repro was with FreeBSD which is quite PV aware AFAIK. All my prior testing was done with Windows. So, maybe this points at a problem with libxl's behaviour when a guest is shutting down?

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ?
  2017-02-16 11:17             ` Paul Durrant
@ 2017-02-16 11:22               ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2017-02-16 11:22 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel

>>> On 16.02.17 at 12:17, <Paul.Durrant@citrix.com> wrote:
> Roger's repro was with FreeBSD which is quite PV aware AFAIK. All my prior 
> testing was done with Windows. So, maybe this points at a problem with 
> libxl's behaviour when a guest is shutting down?

Not impossible, but with what you say I will want to be even more
certain that the reports we have are for guests being shut down,
so I'll definitely want to wait with any backporting attempt. Thanks
for the additional insight in any event!

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-02-16 11:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-16  9:20 backport of "x86/hvm: don't rely on shared ioreq state for completion handling" ? Jan Beulich
2017-02-16 10:13 ` Paul Durrant
2017-02-16 10:22   ` Jan Beulich
2017-02-16 10:36     ` Paul Durrant
2017-02-16 10:46       ` Jan Beulich
2017-02-16 10:53         ` Paul Durrant
2017-02-16 11:00           ` Jan Beulich
2017-02-16 11:17             ` Paul Durrant
2017-02-16 11:22               ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.