xen-devel.lists.xenproject.org archive mirror
* panic("queue invalidate wait descriptor was not executed\n")
@ 2016-05-11 13:51 Zytaruk, Kelly
  2016-05-12  9:49 ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Zytaruk, Kelly @ 2016-05-11 13:51 UTC (permalink / raw)
  To: xen-devel

During Xen boot I am seeing the panic in the subject line from .../xen/drivers/passthrough/vtd/qinval.c.

From the Fault Status Register (= 0x40, ITE) I can see a hardware timeout on the invalidate.

Disabling queued invalidation is not an option.  I need to find out why the operation is timing out and fix it.  

I found two timeouts: one in software and one in hardware.
After the invalidate is submitted, a wait packet is submitted and the boot software polls in a loop, with a software timeout, for the wait packet to complete.  When the software timeout expires it issues the panic.  I can increase the software timeout, but that does not solve the problem.  Just before the panic I dump the value of the Fault Status Register and I see that the hardware has already timed out (FSTS_REG = 0x40 = ITE = "Invalidation Timeout Error").  As a first step in solving this I would like to increase the hardware timeout value.
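
A minimal sketch of the two checks described above, not Xen's actual code. The function names `qi_fault_name` and `wait_for_status` are hypothetical, and the bit positions assume the VT-d Fault Status Register layout in which bit 6 is ITE, which is consistent with the 0x40 value reported here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Queued-invalidation error bits in FSTS_REG (assumed VT-d layout). */
#define FSTS_IQE (1u << 4) /* Invalidation Queue Error */
#define FSTS_ICE (1u << 5) /* Invalidation Completion Error */
#define FSTS_ITE (1u << 6) /* Invalidation Time-out Error */

/* Return a short name for the first queued-invalidation error set in fsts. */
static const char *qi_fault_name(uint32_t fsts)
{
    if (fsts & FSTS_ITE)
        return "ITE";
    if (fsts & FSTS_ICE)
        return "ICE";
    if (fsts & FSTS_IQE)
        return "IQE";
    return "none";
}

/*
 * Hypothetical software wait loop: poll *status until it equals done_value
 * or max_polls iterations have elapsed. Returns true on completion and
 * false on a software timeout.
 */
static bool wait_for_status(const volatile uint32_t *status,
                            uint32_t done_value, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (*status == done_value)
            return true;
    }
    return false;
}
```

With these definitions, `qi_fault_name(0x40)` yields "ITE", matching the value dumped before the panic. Raising `max_polls` only lengthens the software wait; it cannot help once the hardware has already flagged ITE.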

I have the Intel spec and I was reading from the spec...

" Hardware starts an invalidation completion timer for this ITag, and issues the invalidation request message to the specified endpoint. If the invalidation command from software is for a first-level mapping, the invalidation request message is generated with the appropriate PASID prefix to identify the target PASID. The invalidation completion time-out value is recommended to be sufficiently larger than the PCI-Express read completion time-outs. "

The above leads me to believe that there should be some way of setting the invalidation completion time-out value.  Unfortunately I couldn't find anything in the Intel spec that tells me how to set the "invalidation completion time-out".  Can someone point me in the right direction for setting the completion timer?

Thanks,
Kelly

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-11 13:51 panic("queue invalidate wait descriptor was not executed\n") Zytaruk, Kelly
@ 2016-05-12  9:49 ` Jan Beulich
  2016-05-12 12:36   ` Zytaruk, Kelly
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2016-05-12  9:49 UTC (permalink / raw)
  To: Kelly Zytaruk; +Cc: Kevin Tian, Feng Wu, xen-devel

>>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
> During Xen boot I am seeing the panic in the subject line from
> .../xen/drivers/passthrough/vtd/qinval.c

And this is with current staging, or some much older version of Xen?
(ISTR some issue with the invalidation request getting sent to the
wrong IOMMU, leading to a timeout.)

> From the Fault Status Register (= 0x40 (ITE)). I am seeing a hardware 
> timeout on the invalidate
> 
> Disabling queued invalidation is not an option.  I need to find out why the 
> operation is timing out and fix it.  
> 
> I found two timeouts; one in software and one in hardware. 
> After the invalidate is submitted there is a wait packet submitted and the 
> boot software waits for the wait packet to complete in a loop with a software 
> timeout.  At the end of the software timeout it issues the panic.  I can 
> increase the software timeout but it still doesn't solve the problem.  Just 
> before the panic I dump the value of the Fault Status Register and I see that 
> the hardware has already timed out (FSTS_REG = 0x40 = ITE = "Invalidation 
> Timeout Error").  As a first step in solving this I would like to increase 
> the hardware timeout value.
> 
> I have the Intel spec and I was reading from the spec...
> 
> " Hardware starts an invalidation completion timer for this ITag, and issues 
> the invalidation request message to the specified endpoint. If the 
> invalidation command from software is for a first-level mapping, the 
> invalidation request message is generated with the appropriate PASID prefix 
> to identify the target PASID. The invalidation completion time-out value is 
> recommended to be sufficiently larger than the PCI-Express read completion 
> time-outs. "
> 
> The above leads me to believe that there should be some way of setting the 
> invalidation completion time-out value.  Unfortunately I couldn't find 
> anything in the Intel spec that tells me how to set the "invalidation 
> completion time-out".   Can someone point me in the right direction to 
> setting the completion timer?

For this I guess you should have Cc-ed the VT-d maintainers, which
I have now done.

Jan



* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-12  9:49 ` Jan Beulich
@ 2016-05-12 12:36   ` Zytaruk, Kelly
  2016-05-12 13:50     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Zytaruk, Kelly @ 2016-05-12 12:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Feng Wu, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, May 12, 2016 5:49 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
> > During Xen boot I am seeing the panic in the subject line from
> > .../xen/drivers/passthrough/vtd/qinval.c
> 
> And this is with current staging, or some much older version of Xen?
> (ISTR some issue with the invalidation request getting sent to the wrong
> IOMMU, leading to a timeout.)

No this is not current Xen, it is with 4.2.

Can you tell me more about the invalidation request getting sent to the wrong IOMMU problem and approximately when it was fixed?  If you could identify the patch I could back port it into my copy of Xen for testing.

This is a NUMA system with 2 IOMMUs.
I have 4 devices on 2 PCIe cards (2 per card).
They reside at the following locations: 3:0.0, 5:0.0, 81:0.0 and 83:0.0.
From what I understand about NUMA, based on the BDFs, 2 devices should be on one IOMMU and the other 2 should be on the other IOMMU.

I put in some more print statements last night and discovered that during boot Xen attaches all 4 devices to the same IOMMU structure. Xen sends out a flush to all 4 devices on the first IOMMU and then follows it with a Wait invalidation packet to the same IOMMU.  Below is what I am seeing:

(XEN) IOMMU LIST - List of defined IOMMU structures
(XEN) iommu[00] @ ffff83103fffa5c0, Q=2060c04002, HEAD=90, TAIL=90
(XEN)     Seq Num = 0, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x0000000000f020df, domid_bitmap = 1, domid_map=0x0
(XEN) iommu[01] @ ffff83103fffa790, Q=103ffec002, HEAD=bd0, TAIL=bd0
(XEN)     Seq Num = 1, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x0000000000f020df, domid_bitmap = 1, domid_map=0x0

(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7ffffffffffff001 0x0000830000000003 for 83:00.0 (index = 9), iommu = ffff83103fffa5c0, fault = 0x00000000
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7ffffffffffff001 0x0000810000000003 for 81:00.0 (index = 10), iommu = ffff83103fffa5c0, fault = 0x00000000
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7ffffffffffff001 0x0000050000000003 for 05:00.0 (index = 11), iommu = ffff83103fffa5c0, fault = 0x00000000
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7ffffffffffff001 0x0000030000000003 for 03:00.0 (index = 12), iommu = ffff83103fffa5c0, fault = 0x00000000
(XEN) queue_invalidate_wait (iommu = ffff83103fffa5c0)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) ****************************************

Is it a bug to have all 4 devices on the same IOMMU?  Is this why the Wait Invalidation is failing?
Actually, I am not sure whether Xen is attaching all 4 devices to the same IOMMU or whether it is generating the dev iotlb descriptors incorrectly.
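
One way to cross-check the descriptors in the log above is to decode them by hand. The following is a sketch under stated assumptions: the second value printed on each line is taken to be the low 64 bits of the descriptor, with the descriptor type in bits 3:0 and the source-id (bus/device/function) in bits 47:32, as in the VT-d Device-TLB invalidate descriptor layout. The names `desc_type`, `desc_source_id` and `struct bdf` are hypothetical and not taken from Xen.

```c
#include <stdint.h>

/* Descriptor type value for a Device-TLB invalidate (assumed VT-d encoding). */
#define DESC_TYPE_DEV_IOTLB 0x3u

/* PCI bus/device/function triple. */
struct bdf {
    uint8_t bus;
    uint8_t dev;
    uint8_t fn;
};

/* Extract the descriptor type from bits 3:0 of the low qword. */
static unsigned int desc_type(uint64_t lo)
{
    return (unsigned int)(lo & 0xf);
}

/* Extract the 16-bit source-id from bits 47:32 and split it into a BDF. */
static struct bdf desc_source_id(uint64_t lo)
{
    uint16_t sid = (uint16_t)(lo >> 32);
    struct bdf b;

    b.bus = (uint8_t)(sid >> 8);         /* bits 15:8 of the source-id */
    b.dev = (uint8_t)((sid >> 3) & 0x1f); /* bits 7:3 */
    b.fn  = (uint8_t)(sid & 0x7);         /* bits 2:0 */
    return b;
}
```

Applied to 0x0000830000000003 this gives type 3 and 83:00.0, which agrees with the BDF printed on the same log line. The decode only identifies the target device; which IOMMU's queue received the descriptor is shown by the `iommu =` pointer, which is iommu[00] (ffff83103fffa5c0) for all four devices in this log.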




* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-12 12:36   ` Zytaruk, Kelly
@ 2016-05-12 13:50     ` Jan Beulich
  2016-05-12 14:21       ` Zytaruk, Kelly
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2016-05-12 13:50 UTC (permalink / raw)
  To: Kelly Zytaruk; +Cc: Kevin Tian, Feng Wu, xen-devel

>>> On 12.05.16 at 14:36, <Kelly.Zytaruk@amd.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, May 12, 2016 5:49 AM
>> >>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
>> > During Xen boot I am seeing the panic in the subject line from
>> > .../xen/drivers/passthrough/vtd/qinval.c
>> 
>> And this is with current staging, or some much older version of Xen?
>> (ISTR some issue with the invalidation request getting sent to the wrong
>> IOMMU, leading to a timeout.)
> 
> No this is not current Xen, it is with 4.2.
> 
> Can you tell me more about the invalidation request getting sent to the 
> wrong IOMMU problem and approximately when it was fixed?  If you could 
> identify the patch I could back port it into my copy of Xen for testing.

Note that 4.2.5 has said change, and also note that you could have
done exactly what I have done now - go through the list of commits
altering files in the vtd/ subtree. This is what I've been remembering:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67

Jan



* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-12 13:50     ` Jan Beulich
@ 2016-05-12 14:21       ` Zytaruk, Kelly
  2016-05-13  7:11         ` Wu, Feng
  0 siblings, 1 reply; 8+ messages in thread
From: Zytaruk, Kelly @ 2016-05-12 14:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Feng Wu, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, May 12, 2016 9:51 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 12.05.16 at 14:36, <Kelly.Zytaruk@amd.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Thursday, May 12, 2016 5:49 AM
> >> >>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
> >> > During Xen boot I am seeing the panic in the subject line from
> >> > .../xen/drivers/passthrough/vtd/qinval.c
> >>
> >> And this is with current staging, or some much older version of Xen?
> >> (ISTR some issue with the invalidation request getting sent to the
> >> wrong IOMMU, leading to a timeout.)
> >
> > No this is not current Xen, it is with 4.2.
> >
> > Can you tell me more about the invalidation request getting sent to
> > the wrong IOMMU problem and approximately when it was fixed?  If you
> > could identify the patch I could back port it into my copy of Xen for testing.
> 
> Note that 4.2.5 has said change, and also note that you could have done exactly
> what I have done now - go through the list of commits altering files in the vtd/
> subtree. 

Unfortunately GIT is not my strong suit :( I am still learning to navigate with it. I guess part of my problem with GIT is that I don't yet know what I don't know.

> This is what I've been remembering:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67

The comment on this link describes exactly what I am experiencing.
Thanks so much.

> 
> Jan



* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-12 14:21       ` Zytaruk, Kelly
@ 2016-05-13  7:11         ` Wu, Feng
  2016-05-13 12:22           ` Zytaruk, Kelly
  0 siblings, 1 reply; 8+ messages in thread
From: Wu, Feng @ 2016-05-13  7:11 UTC (permalink / raw)
  To: Zytaruk, Kelly, Jan Beulich; +Cc: Tian, Kevin, Wu, Feng, xen-devel



> -----Original Message-----
> From: Zytaruk, Kelly [mailto:Kelly.Zytaruk@amd.com]
> Sent: Thursday, May 12, 2016 10:21 PM
> To: Jan Beulich <JBeulich@suse.com>
> Cc: Wu, Feng <feng.wu@intel.com>; Tian, Kevin <kevin.tian@intel.com>;
> xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Thursday, May 12, 2016 9:51 AM
> > To: Zytaruk, Kelly
> > Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > executed\n")
> >
> > >>> On 12.05.16 at 14:36, <Kelly.Zytaruk@amd.com> wrote:
> > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Thursday, May 12, 2016 5:49 AM
> > >> >>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
> > >> > During Xen boot I am seeing the panic in the subject line from
> > >> > .../xen/drivers/passthrough/vtd/qinval.c
> > >>
> > >> And this is with current staging, or some much older version of Xen?
> > >> (ISTR some issue with the invalidation request getting sent to the
> > >> wrong IOMMU, leading to a timeout.)
> > >
> > > No this is not current Xen, it is with 4.2.
> > >
> > > Can you tell me more about the invalidation request getting sent to
> > > the wrong IOMMU problem and approximately when it was fixed?  If you
> > > could identify the patch I could back port it into my copy of Xen for testing.
> >
> > Note that 4.2.5 has said change, and also note that you could have done
> > exactly what I have done now - go through the list of commits altering
> > files in the vtd/ subtree.
> 
> Unfortunately GIT is not my strong suit :( I am still learning to navigate with it. I
> guess part of my problem with GIT is that I don't yet know what I don't know.
> 
> > This is what I've been remembering:
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> 
> The comment on this link describes exactly what I am experiencing.
> Thanks so much.

Thanks Jan for providing the information above. Kelly, if you still see the same
issue after applying the patches, let us know; maybe I can consult some hardware
expert internally.

Thanks,
Feng

> 
> >
> > Jan



* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-13  7:11         ` Wu, Feng
@ 2016-05-13 12:22           ` Zytaruk, Kelly
  2016-05-13 12:29             ` Wu, Feng
  0 siblings, 1 reply; 8+ messages in thread
From: Zytaruk, Kelly @ 2016-05-13 12:22 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich; +Cc: Tian, Kevin, xen-devel



> -----Original Message-----
> From: Wu, Feng [mailto:feng.wu@intel.com]
> Sent: Friday, May 13, 2016 3:11 AM
> To: Zytaruk, Kelly; Jan Beulich
> Cc: Tian, Kevin; xen-devel@lists.xen.org; Wu, Feng
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> 
> 
> > -----Original Message-----
> > From: Zytaruk, Kelly [mailto:Kelly.Zytaruk@amd.com]
> > Sent: Thursday, May 12, 2016 10:21 PM
> > To: Jan Beulich <JBeulich@suse.com>
> > Cc: Wu, Feng <feng.wu@intel.com>; Tian, Kevin <kevin.tian@intel.com>;
> > xen-devel@lists.xen.org
> > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > executed\n")
> >
> >
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > Sent: Thursday, May 12, 2016 9:51 AM
> > > To: Zytaruk, Kelly
> > > Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> > > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > > executed\n")
> > >
> > > >>> On 12.05.16 at 14:36, <Kelly.Zytaruk@amd.com> wrote:
> > > >> From: Jan Beulich [mailto:JBeulich@suse.com]
> > > >> Sent: Thursday, May 12, 2016 5:49 AM
> > > >> >>> On 11.05.16 at 15:51, <Kelly.Zytaruk@amd.com> wrote:
> > > >> > During Xen boot I am seeing the panic in the subject line from
> > > >> > .../xen/drivers/passthrough/vtd/qinval.c
> > > >>
> > > >> And this is with current staging, or some much older version of Xen?
> > > >> (ISTR some issue with the invalidation request getting sent to
> > > >> the wrong IOMMU, leading to a timeout.)
> > > >
> > > > No this is not current Xen, it is with 4.2.
> > > >
> > > > Can you tell me more about the invalidation request getting sent
> > > > to the wrong IOMMU problem and approximately when it was fixed?
> > > > If you could identify the patch I could back port it into my copy
> > > > of Xen for testing.
> > >
> > > Note that 4.2.5 has said change, and also note that you could have
> > > done exactly what I have done now - go through the list of commits
> > > altering files in the vtd/ subtree.
> >
> > Unfortunately GIT is not my strong suit :( I am still learning to
> > navigate with it. I guess part of my problem with GIT is that I don't
> > yet know what I don't know.
> >
> > > This is what I've been remembering:
> > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> >
> > The comment on this link describes exactly what I am experiencing.
> > Thanks so much.
> 
> Thanks Jan for providing the information above. Kelly, if you still see the same
> issue after applying the patches, let us know; maybe I can consult some
> hardware expert internally.

Turns out this was exactly my problem.  The description matched my symptoms, and when I applied the patch the problem went away.
Thanks,
Kelly

> 
> Thanks,
> Feng
> 
> >
> > >
> > > Jan



* Re: panic("queue invalidate wait descriptor was not executed\n")
  2016-05-13 12:22           ` Zytaruk, Kelly
@ 2016-05-13 12:29             ` Wu, Feng
  0 siblings, 0 replies; 8+ messages in thread
From: Wu, Feng @ 2016-05-13 12:29 UTC (permalink / raw)
  To: Zytaruk, Kelly, Jan Beulich; +Cc: Tian, Kevin, Wu, Feng, xen-devel

> > > > This is what I've been remembering:
> > > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> > >
> > > The comment on this link describes exactly what I am experiencing.
> > > Thanks so much.
> >
> > Thanks Jan for providing the information above. Kelly, if you still see the same
> > issue after applying the patches, let us know; maybe I can consult some
> > hardware expert internally.
> 
> Turns out this was exactly my problem.  The description matched my symptoms,
> and when I applied the patch the problem went away.
> Thanks,
> Kelly
> 

Good to hear this! :)

Thanks,
Feng

