* [PATCH for-4.10] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-19 11:26 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Kevin Tian, Julien Grall, Jan Beulich

DMA-ing to the stack is generally considered bad practice.  In this case, if a
timeout occurs because of a sluggish device which is processing the request,
the completion notification will corrupt the stack of a subsequent deeper call
tree.

Place the poll_slot in a percpu area and DMA to that instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Julien Grall <julien.grall@arm.com>

Julien: This wants backporting to all releases, and therefore should be
considered for 4.10 at this point.
---
 xen/drivers/passthrough/vtd/qinval.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index e95dc54..0ddda00 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -147,7 +147,8 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
                                               u8 iflag, u8 sw, u8 fn,
                                               bool_t flush_dev_iotlb)
 {
-    volatile u32 poll_slot = QINVAL_STAT_INIT;
+    static DEFINE_PER_CPU(u32, poll_slot);
+    volatile u32 *this_poll_slot = &this_cpu(poll_slot);
     unsigned int index;
     unsigned long flags;
     u64 entry_base;
@@ -167,7 +168,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
     qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
     qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
     qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
-    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
+    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;
 
     unmap_vtd_domain_page(qinval_entries);
     qinval_update_qtail(iommu, index);
@@ -182,7 +183,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
         timeout = NOW() + MILLISECS(flush_dev_iotlb ?
                                     iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
 
-        while ( poll_slot != QINVAL_STAT_DONE )
+        while ( *this_poll_slot != QINVAL_STAT_DONE )
         {
             if ( NOW() > timeout )
             {
-- 
2.1.4



* Re: [PATCH for-4.10] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Jan Beulich @ 2017-10-19 12:11 UTC
  To: Andrew Cooper; +Cc: Julien Grall, Kevin Tian, Xen-devel

>>> On 19.10.17 at 13:26, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/drivers/passthrough/vtd/qinval.c
> +++ b/xen/drivers/passthrough/vtd/qinval.c
> @@ -147,7 +147,8 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>                                                u8 iflag, u8 sw, u8 fn,
>                                                bool_t flush_dev_iotlb)
>  {
> -    volatile u32 poll_slot = QINVAL_STAT_INIT;

You've lost the initializer.

> +    static DEFINE_PER_CPU(u32, poll_slot);

volatile u32

> @@ -182,7 +183,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>          timeout = NOW() + MILLISECS(flush_dev_iotlb ?
>                                      iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
>  
> -        while ( poll_slot != QINVAL_STAT_DONE )
> +        while ( *this_poll_slot != QINVAL_STAT_DONE )
>          {
>              if ( NOW() > timeout )
>              {

Okay, you indeed improve the situation. But is that improvement
enough? I.e. what if the write of a first (timed out) request happens
while waiting for a subsequent one? Don't you need distinct addresses
for every possible slot? Or alternatively isn't it high time for the
interrupt approach to be made to work (perhaps not by you, but rather
by Intel folks)?

Jan



* Re: [PATCH for-4.10] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-19 12:54 UTC
  To: Jan Beulich; +Cc: Julien Grall, Kevin Tian, Xen-devel

On 19/10/17 13:11, Jan Beulich wrote:
>>>> On 19.10.17 at 13:26, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/qinval.c
>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>> @@ -147,7 +147,8 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>                                                u8 iflag, u8 sw, u8 fn,
>>                                                bool_t flush_dev_iotlb)
>>  {
>> -    volatile u32 poll_slot = QINVAL_STAT_INIT;
> You've lost the initializer.

Deliberately so.

>
>> +    static DEFINE_PER_CPU(u32, poll_slot);
> volatile u32

You've clipped out the bit declaring the pointer as volatile, which
suffices to retain the previous properties.
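
As a minimal standalone sketch of the property being relied on here
(plain C, hypothetical names):

    #include <stdint.h>

    static uint32_t slot;                /* the object itself isn't volatile */

    static void wait_for(uint32_t done)
    {
        volatile uint32_t *p = &slot;    /* the qualifier lives on the access */

        /*
         * Each read of *p goes through a volatile-qualified lvalue, so the
         * compiler must reload it on every iteration rather than cache it
         * in a register - the same guarantee the on-stack volatile gave.
         */
        while ( *p != done )
            ;
    }

Volatility is a property of the lvalue used for the access, not of the
underlying object, so the pointer declaration alone preserves the polling
semantics.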

>
>> @@ -182,7 +183,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>          timeout = NOW() + MILLISECS(flush_dev_iotlb ?
>>                                      iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
>>  
>> -        while ( poll_slot != QINVAL_STAT_DONE )
>> +        while ( *this_poll_slot != QINVAL_STAT_DONE )
>>          {
>>              if ( NOW() > timeout )
>>              {
> Okay, you indeed improve the situation. But is that improvement
> enough?

For not corrupting the stack, yes.

> I.e. what if the write of a first (timed out) request happens
> while waiting for a subsequent one? Don't you need distinct addresses
> for every possible slot?

Certainly everything which is currently pending.

> Or alternatively isn't it high time for the
> interrupt approach to be made work (perhaps not by you, but rather
> by Intel folks)?

I'm not going to pretend that the current implementation is great, but I
really don't have time to address the other remaining swamps here.

~Andrew


* Re: [PATCH for-4.10] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Jan Beulich @ 2017-10-19 13:25 UTC
  To: Andrew Cooper; +Cc: Julien Grall, Kevin Tian, Xen-devel

>>> On 19.10.17 at 14:54, <andrew.cooper3@citrix.com> wrote:
> On 19/10/17 13:11, Jan Beulich wrote:
>>>>> On 19.10.17 at 13:26, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/qinval.c
>>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>>> @@ -147,7 +147,8 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>>                                                u8 iflag, u8 sw, u8 fn,
>>>                                                bool_t flush_dev_iotlb)
>>>  {
>>> -    volatile u32 poll_slot = QINVAL_STAT_INIT;
>> You've lost the initializer.
> 
> Deliberately so.

I don't understand: By never writing QINVAL_STAT_INIT, how can
multiple waits work? Afaict you'll find the variable set to
QINVAL_STAT_DONE the 2nd time you come here, and hence you
won't wait at all.

>>> +    static DEFINE_PER_CPU(u32, poll_slot);
>> volatile u32
> 
> You've clipped out the bit declaring the pointer as volatile, which
> suffices to retain the previous properties.

Still, the variable itself would better also be declared volatile.

>> Or alternatively isn't it high time for the
>> interrupt approach to be made to work (perhaps not by you, but rather
>> by Intel folks)?
> 
> I'm not going to pretend that the current implementation is great, but I
> really don't have time to address the other remaining swamps here.

Right, hence my hint at this really being something the maintainer(s)
should look after.

Jan



* Re: [PATCH for-4.10] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-19 13:31 UTC
  To: Jan Beulich; +Cc: Julien Grall, Kevin Tian, Xen-devel

On 19/10/17 14:25, Jan Beulich wrote:
>>>> On 19.10.17 at 14:54, <andrew.cooper3@citrix.com> wrote:
>> On 19/10/17 13:11, Jan Beulich wrote:
>>>>>> On 19.10.17 at 13:26, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/qinval.c
>>>> +++ b/xen/drivers/passthrough/vtd/qinval.c
>>>> @@ -147,7 +147,8 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>>>                                                u8 iflag, u8 sw, u8 fn,
>>>>                                                bool_t flush_dev_iotlb)
>>>>  {
>>>> -    volatile u32 poll_slot = QINVAL_STAT_INIT;
>>> You've lost the initializer.
>> Deliberately so.
> I don't understand: By never writing QINVAL_STAT_INIT, how can
> multiple waits work? Afaict you'll find the variable set to
> QINVAL_STAT_DONE the 2nd time you come here, and hence you
> won't wait at all.

Oh yes - you are quite correct.

My test box, which does end up doing frequent queued invalidations,
doesn't appear to suffer any bad side effects from this
quite-clearly-wrong code.

~Andrew


* [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-19 16:22 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Kevin Tian, Julien Grall, Jan Beulich

DMA-ing to the stack is generally considered bad practice.  In this case, if a
timeout occurs because of a sluggish device which is processing the request,
the completion notification will corrupt the stack of a subsequent deeper call
tree.

Place the poll_slot in a percpu area and DMA to that instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Julien Grall <julien.grall@arm.com>

Julien: This wants backporting to all releases, and therefore should be
considered for 4.10 at this point.

v2:
 * Retain volatile declaration for poll_slot.
 * Initialise poll_slot to QINVAL_STAT_INIT on each call.
---
 xen/drivers/passthrough/vtd/qinval.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index e95dc54..51aef37 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -147,13 +147,15 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
                                               u8 iflag, u8 sw, u8 fn,
                                               bool_t flush_dev_iotlb)
 {
-    volatile u32 poll_slot = QINVAL_STAT_INIT;
+    static DEFINE_PER_CPU(volatile u32, poll_slot);
     unsigned int index;
     unsigned long flags;
     u64 entry_base;
     struct qinval_entry *qinval_entry, *qinval_entries;
+    volatile u32 *this_poll_slot = &this_cpu(poll_slot);
 
     spin_lock_irqsave(&iommu->register_lock, flags);
+    *this_poll_slot = QINVAL_STAT_INIT;
     index = qinval_next_index(iommu);
     entry_base = iommu_qi_ctrl(iommu)->qinval_maddr +
                  ((index >> QINVAL_ENTRY_ORDER) << PAGE_SHIFT);
@@ -167,7 +169,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
     qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
     qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
     qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
-    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
+    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;
 
     unmap_vtd_domain_page(qinval_entries);
     qinval_update_qtail(iommu, index);
@@ -182,7 +184,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
         timeout = NOW() + MILLISECS(flush_dev_iotlb ?
                                     iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
 
-        while ( poll_slot != QINVAL_STAT_DONE )
+        while ( *this_poll_slot != QINVAL_STAT_DONE )
         {
             if ( NOW() > timeout )
             {
-- 
2.1.4



* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Jan Beulich @ 2017-10-20  7:12 UTC
  To: Andrew Cooper; +Cc: Julien Grall, Kevin Tian, Xen-devel

>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
> DMA-ing to the stack is generally considered bad practice.  In this case, if a
> timeout occurs because of a sluggish device which is processing the request,
> the completion notification will corrupt the stack of a subsequent deeper call
> tree.
> 
> Place the poll_slot in a percpu area and DMA to that instead.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Please could you extend the commit message to state the issue
remaining with using a single per-CPU slot? With that
Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> @@ -167,7 +169,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>      qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
>      qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
>      qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
> -    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
> +    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;

... this one is still a literal number rather than something allowing
one to associate back to where that value comes from (but since you're
not introducing it here, I also won't insist on you changing it in this
patch).

Jan



* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-20 17:55 UTC
  To: Jan Beulich; +Cc: Julien Grall, Kevin Tian, Xen-devel

On 20/10/17 08:12, Jan Beulich wrote:
>>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
>> DMA-ing to the stack is generally considered bad practice.  In this case, if a
>> timeout occurs because of a sluggish device which is processing the request,
>> the completion notification will corrupt the stack of a subsequent deeper call
>> tree.
>>
>> Place the poll_slot in a percpu area and DMA to that instead.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Please could you extend the commit message to state the issue
> remaining with using a single per-CPU slot? With that
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

How about this?

Note: This change does not address other issues with the current
implementation, such as once a timeout has been suffered, subsequent
completions can't be correlated with their requests.

> albeit ...
>
>> @@ -167,7 +169,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>      qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
>>      qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
>>      qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
>> -    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
>> +    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;
> ... this one is still a literal number rather than something allowing
> one to associate back to where that value comes from (but since you're
> not introducing it here, I also won't insist on you changing it in this
> patch).

I don't understand.  What is still a literal number?

The sdata field is software-chosen, so one option would be to count the
number of wait descriptors we've requested of the hardware, and have it
echo that back.  That would allow us to detect if it is out of date.
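
A rough sketch of that option, reusing this patch's variables (untested,
and purely illustrative):

    static DEFINE_PER_CPU(volatile u32, poll_slot);
    static DEFINE_PER_CPU(u32, poll_seq);

    /* When queueing: stamp the wait descriptor with a fresh token. */
    u32 seq = ++this_cpu(poll_seq);

    qinval_entry->q.inv_wait_dsc.lo.sdata = seq;
    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;

    /*
     * When polling: a late completion of an earlier, timed-out request
     * writes its stale token into the slot, so keep waiting for ours.
     */
    while ( *this_poll_slot != seq )
    {
        /* existing timeout handling */
    }

The token only distinguishes the most recent request on each CPU, but that
is enough to tell a stale completion apart from the one currently being
waited on.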

~Andrew


* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Tian, Kevin @ 2017-10-23  7:05 UTC
  To: Andrew Cooper, Jan Beulich; +Cc: Julien Grall, Xen-devel

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Saturday, October 21, 2017 1:55 AM
> 
> On 20/10/17 08:12, Jan Beulich wrote:
> >>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
> >> DMA-ing to the stack is generally considered bad practice.  In this
> >> case, if a timeout occurs because of a sluggish device which is
> >> processing the request, the completion notification will corrupt the
> >> stack of a subsequent deeper call tree.
> >>
> >> Place the poll_slot in a percpu area and DMA to that instead.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Please could you extend the commit message to state the issue
> > remaining with using a single per-CPU slot? With that
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> How about this?
> 
> Note: This change does not address other issues with the current
> implementation, such as once a timeout has been suffered, subsequent
> completions can't be correlated with their requests.
> 

Can you increment the poll data by 1 every time a new wait
comes in?

Thanks
Kevin

* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Tian, Kevin @ 2017-10-23  7:06 UTC
  To: 'Andrew Cooper', Jan Beulich; +Cc: Julien Grall, Xen-devel

> From: Tian, Kevin
> Sent: Monday, October 23, 2017 3:05 PM
> 
> > From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> > Sent: Saturday, October 21, 2017 1:55 AM
> >
> > On 20/10/17 08:12, Jan Beulich wrote:
> > >>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
> > >> DMA-ing to the stack is generally considered bad practice.  In this
> > >> case, if a timeout occurs because of a sluggish device which is
> > >> processing the request, the completion notification will corrupt the
> > >> stack of a subsequent deeper call tree.
> > >>
> > >> Place the poll_slot in a percpu area and DMA to that instead.
> > >>
> > >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Please could you extend the commit message to state the issue
> > > remaining with using a single per-CPU slot? With that
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > How about this?
> >
> > Note: This change does not address other issues with the current
> > implementation, such as once a timeout has been suffered, subsequent
> > completions can't be correlated with their requests.
> >
> 
> Can you increment the poll data by 1 every time a new wait
> comes in?
> 

btw, I'm curious whether you observed a real issue or just spotted
the problem by inspection?

* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Jan Beulich @ 2017-10-23  7:18 UTC
  To: Andrew Cooper; +Cc: Julien Grall, Kevin Tian, Xen-devel

>>> On 20.10.17 at 19:55, <andrew.cooper3@citrix.com> wrote:
> On 20/10/17 08:12, Jan Beulich wrote:
>>>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
>>> DMA-ing to the stack is generally considered bad practice.  In this
>>> case, if a timeout occurs because of a sluggish device which is
>>> processing the request, the completion notification will corrupt the
>>> stack of a subsequent deeper call tree.
>>>
>>> Place the poll_slot in a percpu area and DMA to that instead.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Please could you extend the commit message to state the issue
>> remaining with using a single per-CPU slot? With that
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> How about this?
> 
> Note: This change does not address other issues with the current
> implementation, such as once a timeout has been suffered, subsequent
> completions can't be correlated with their requests.

Sounds good.

>> albeit ...
>>
>>> @@ -167,7 +169,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>>      qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
>>>      qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
>>>      qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
>>> -    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
>>> +    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;
>> ... this one is still a literal number rather than something allowing
>> one to associate back to where that value comes from (but since you're
>> not introducing it here, I also won't insist on you changing it in this
>> patch).
> 
> I don't understand.  What is still a literal number?

There's still that literal 2 there as the shift count.

Jan



* Re: [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
From: Andrew Cooper @ 2017-10-23  8:06 UTC
  To: Jan Beulich; +Cc: Julien Grall, Kevin Tian, Xen-devel

On 23/10/2017 08:18, Jan Beulich wrote:
>>>> On 20.10.17 at 19:55, <andrew.cooper3@citrix.com> wrote:
>> On 20/10/17 08:12, Jan Beulich wrote:
>>>>>> On 19.10.17 at 18:22, <andrew.cooper3@citrix.com> wrote:
>>>> DMA-ing to the stack is generally considered bad practice.  In this
>>>> case, if a timeout occurs because of a sluggish device which is
>>>> processing the request, the completion notification will corrupt the
>>>> stack of a subsequent deeper call tree.
>>>>
>>>> Place the poll_slot in a percpu area and DMA to that instead.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Please could you extend the commit message to state the issue
>>> remaining with using a single per-CPU slot? With that
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> How about this?
>>
>> Note: This change does not address other issues with the current
>> implementation, such as once a timeout has been suffered, subsequent
>> completions can't be correlated with their requests.
> Sounds good.
>
>>> albeit ...
>>>
>>>> @@ -167,7 +169,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
>>>>      qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
>>>>      qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
>>>>      qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
>>>> -    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
>>>> +    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;
>>> ... this one is still a literal number rather than something allowing
>>> one to associate back to where that value comes from (but since you're
>>> not introducing it here, I also won't insist on you changing it in this
>>> patch).
>> I don't understand.  What is still a literal number?
> There's still that literal 2 there as the shift count.

Ah - that is because of the bitfield definition of saddr.  It is a 32-bit
field, but the bottom two bits are reserved, to cause an aligned dword
write.

It would probably be cleaner to not declare saddr as a bitfield, and
rely on the alignment of u32 to keep the bottom two bits clear.
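
For comparison, a sketch of the two shapes (the field widths here are my
reading of the header, so treat them as illustrative):

    /* Bitfield form: saddr excludes the two reserved low bits, so the
     * address has to be pre-shifted when stored. */
    struct { u64 res_1:2, saddr:62; } hi;
    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot) >> 2;

    /* Plain form: store the whole machine address; the natural alignment
     * of the u32 poll slot already keeps the two low bits clear. */
    struct { u64 saddr; } hi;
    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot);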

~Andrew

