* iommu=dom0-passthrough behavior
@ 2012-11-05 14:30 Jan Beulich
  2012-11-13  0:11 ` Zhang, Yang Z
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-05 14:30 UTC (permalink / raw)
  To: wei.huang2, weiwang.dd, xiantao.zhang; +Cc: xen-devel

All,

so far it was my understanding that this option is intended to get
the DMA behavior that Dom0 observes as close as possible to how
it would be without IOMMU.

However, we're now dealing with a customer report where a
single function device is observed to initiate DMA operations
appearing to originate from function 1, which makes obvious that
the option above is not making things as transparent as I would
have expected them to be: Without IOMMU, such requests get
processed fine, while with IOMMU (due to there not being a
context entry for the bogus device) the device fails to initialize
(causing DMA faults, the presence of which I had to convince
myself of separately, as for whatever reason at least the VT-d
code doesn't issue any log message in that case).

So I'm now seeking alternative workaround suggestions that
we could pass to that customer (less intrusive than "iommu=off").

Thanks, Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-05 14:30 iommu=dom0-passthrough behavior Jan Beulich
@ 2012-11-13  0:11 ` Zhang, Yang Z
  2012-11-13  8:07   ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Yang Z @ 2012-11-13  0:11 UTC (permalink / raw)
  To: Jan Beulich, wei.huang2, weiwang.dd, Zhang, Xiantao; +Cc: xen-devel

Jan Beulich wrote on 2012-11-05:
> All,
> 
> so far it was my understanding that this option is intended to get
> the DMA behavior that Dom0 observes as close as possible to how
> it would be without IOMMU.

Correct. There is a bit in the context entry that controls whether DMA requests from the device walk the IOMMU page table or not.
As we know, walking the page table adds extra cost, so we use this parameter to make sure that devices owned by dom0 do not walk the IOMMU page table when a DMA request arrives.
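
To make the mechanism concrete, here is a minimal sketch of the translation-type field in a legacy-mode VT-d context entry. It is a toy model, not Xen code: the bit positions (present bit 0, translation type in bits 3:2, pass-through encoded as 10b) are an assumption based on the VT-d specification and should be checked against it, and every name below is hypothetical.

```python
# Toy model of the low 64 bits of a legacy-mode VT-d context entry.
# Bit positions are an assumption taken from the VT-d specification;
# all names here are hypothetical and not Xen identifiers.
CONTEXT_PRESENT = 1 << 0   # bit 0: entry is valid
TT_SHIFT = 2               # bits 3:2: translation type
TT_MASK = 0b11 << TT_SHIFT
TT_MULTI_LEVEL = 0b00      # untranslated requests walk the page tables
TT_PASS_THRU = 0b10        # untranslated requests bypass the page tables


def make_context_lo(translation_type, pgtable_root=0):
    """Build the low word of a present context entry."""
    if translation_type not in (TT_MULTI_LEVEL, TT_PASS_THRU):
        raise ValueError("unsupported translation type")
    if pgtable_root & 0xFFF:
        raise ValueError("page table root must be 4 KiB aligned")
    return pgtable_root | (translation_type << TT_SHIFT) | CONTEXT_PRESENT


def walks_page_table(context_lo):
    """Return True if a DMA request hitting this entry is translated."""
    if not context_lo & CONTEXT_PRESENT:
        # No context entry at all: the request faults instead of
        # being passed through.
        raise LookupError("context entry not present")
    return (context_lo & TT_MASK) >> TT_SHIFT != TT_PASS_THRU


dom0_entry = make_context_lo(TT_PASS_THRU)
guest_entry = make_context_lo(TT_MULTI_LEVEL, pgtable_root=0x1000)
print(walks_page_table(dom0_entry), walks_page_table(guest_entry))
```

Note that pass-through is a property of an existing context entry. A request whose source has no entry at all never reaches that check, which is why the sketch raises instead of returning a value in that case.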
 
> However, we're now dealing with a customer report where a
> single function device is observed to initiate DMA operations
> appearing to originate from function 1, which makes obvious that
> the option above is not making things as transparent as I would
> have expected them to be: Without IOMMU, such requests get
> processed fine, while with IOMMU (due to there not being a
> context entry for the bogus device) the device fails to initialize
> (causing DMA faults, the presence of which I had to convince
> myself of separately, as for whatever reason at least the VT-d
> code doesn't issue any log message in that case).

Sorry, I don't understand the problem. Is there a bug in the current VT-d code?

> So I'm now seeking for alternative workaround suggestions that
> we could pass to that customer (less intrusive than "iommu=off").



Best regards,
Yang


* Re: iommu=dom0-passthrough behavior
  2012-11-13  0:11 ` Zhang, Yang Z
@ 2012-11-13  8:07   ` Jan Beulich
  2012-11-13  8:50     ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-13  8:07 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 13.11.12 at 01:11, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2012-11-05:
>> so far it was my understanding that this option is intended to get
>> the DMA behavior that Dom0 observes as close as possible to how
>> it would be without IOMMU.
> 
> Correct. There is a bit in context entry which controlling the DMA 
> request(from this device) to walk or not walk the iommu page table.
> As we known, walking page table introduced extra cost, so we use this 
> parameter to make sure the device which owned by dom0 not to walking iommu 
> page table when DMA request is arrived.

Okay, so that would be slightly different from the meaning I give
to the option (as described above).

>> However, we're now dealing with a customer report where a
>> single function device is observed to initiate DMA operations
>> appearing to originate from function 1, which makes obvious that
>> the option above is not making things as transparent as I would
>> have expected them to be: Without IOMMU, such requests get
>> processed fine, while with IOMMU (due to there not being a
>> context entry for the bogus device) the device fails to initialize
>> (causing DMA faults, the presence of which I had to convince
>> myself of separately, as for whatever reason at least the VT-d
>> code doesn't issue any log message in that case).
> 
> Sorry, I cannot understand your problem. Is there any bug in current VT-d 
> code?

We need to settle on the concept here first: What specifically is
said option intended to do?

Only then can we talk about bugs, and if there is one I suspect it's
not only in VT-d code, but equally much in AMD IOMMU's.

The thing here is that a device functioning properly without IOMMU
(with "properly" not necessarily meaning it being implemented
correctly as per specification, albeit I also didn't check whether the
spec would allow for the observed behavior) doesn't once DMA
translation is enabled (even if suppressed for Dom0 via above
option).

The problem being that while device enumeration only finds a single
device at function zero of the respective (seg,bus,dev) tuple, DMA
requests - as seen by the IOMMU - originate from non-zero
functions under the same tuple. Since a non-discovered device
doesn't get a context entry inserted, this results in an IOMMU fault,
rendering the device non-functional.
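
The failure mode can be sketched as a lookup keyed by the 16-bit PCI requester ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0). This is an illustrative model only: the class and method names are hypothetical, and the bus number 03 is made up for the example.

```python
def encode_rid(bus, dev, func):
    """Pack (bus, device, function) into a 16-bit PCI requester ID."""
    return (bus << 8) | (dev << 3) | func


def decode_rid(rid):
    """Split a 16-bit PCI requester ID into (bus, device, function)."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


class ContextTable:
    """Hypothetical stand-in for the IOMMU's per-function context entries."""

    def __init__(self):
        self.entries = {}  # requester ID -> True if pass-through

    def add(self, bus, dev, func, passthrough):
        self.entries[encode_rid(bus, dev, func)] = passthrough

    def handle_dma(self, rid):
        # A request whose requester ID has no entry faults, no matter
        # how the entries of its sibling functions are configured.
        if rid not in self.entries:
            return "fault"
        return "passthrough" if self.entries[rid] else "translated"


table = ContextTable()
table.add(0x03, 0x00, 0, passthrough=True)   # the only enumerated function

print(table.handle_dma(encode_rid(0x03, 0x00, 0)))  # enumerated function 0
print(table.handle_dma(encode_rid(0x03, 0x00, 1)))  # unexpected function 1
```

The second request is rejected even though function 0 of the same device is set up as pass-through, which matches the behavior reported here.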

The data from the system I have so far doesn't tell me whether the
device incorrectly claims itself as single function (with the functions
other than func 0 simply not being discovered during device
enumeration, as single function devices don't get their non-zero
functions scanned) or whether the config space for functions 1-7
indeed is unpopulated, with the device issuing requests with non-
zero function number for other, unexplained reasons.

Bottom line - I'm seeking advice as to whether working around
this problem in the IOMMU code is desirable/necessary, or
whether this is a design flaw on the device's side that just
cannot be tolerated with an IOMMU in the picture (which would
need good reasoning, so that a customer expecting such a
device to work regardless of IOMMU usage can understand that
this cannot reasonably be made to work).

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-13  8:07   ` Jan Beulich
@ 2012-11-13  8:50     ` Zhang, Xiantao
  2012-11-13  9:41       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-13  8:50 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, November 13, 2012 4:08 PM
> To: Zhang, Xiantao; Zhang, Yang Z
> Cc: wei.huang2@amd.com; weiwang.dd@gmail.com; xen-devel
> Subject: RE: [Xen-devel] iommu=dom0-passthrough behavior
> 
> >>> On 13.11.12 at 01:11, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > Jan Beulich wrote on 2012-11-05:
> >> so far it was my understanding that this option is intended to get
> >> the DMA behavior that Dom0 observes as close as possible to how it
> >> would be without IOMMU.
> >
> > Correct. There is a bit in context entry which controlling the DMA
> > request(from this device) to walk or not walk the iommu page table.
> > As we known, walking page table introduced extra cost, so we use this
> > parameter to make sure the device which owned by dom0 not to walking
> > iommu page table when DMA request is arrived.
> 
> Okay, so that would be slightly different from the meaning I give to the
> option (as described above).
> 
> >> However, we're now dealing with a customer report where a single
> >> function device is observed to initiate DMA operations appearing to
> >> originate from function 1, which makes obvious that the option above
> >> is not making things as transparent as I would have expected them to
> >> be: Without IOMMU, such requests get processed fine, while with
> IOMMU
> >> (due to there not being a context entry for the bogus device) the
> >> device fails to initialize (causing DMA faults, the presence of which
> >> I had to convince myself of separately, as for whatever reason at
> >> least the VT-d code doesn't issue any log message in that case).
> >
> > Sorry, I cannot understand your problem. Is there any bug in current
> > VT-d code?
> 
> We need to settle on the concept here first: What specifically is said option
> intended to do? 

Basically, this option just exempts transactions from dom0's devices from the VT-d engine. It is not meant to fix anything; it just allows users to isolate dom0 from VT-d issues.
As far as I know, VT-d was not that stable in the early days: if dom0's devices were controlled by VT-d, strange issues could be triggered during system boot, so this option was used to disable VT-d for Dom0.

> Only then we can talk about bugs, and if there is one I suspect it's not only in
> VT-d code, but equally much in AMD IOMMU's.
> 
> The thing here is that a device functioning properly without IOMMU (with
> "properly" not necessarily meaning it being implemented correctly as per
> specification, albeit I also didn't check whether the spec would allow for the
> observed behavior) doesn't once DMA translation is enabled (even if
> suppressed for Dom0 via above option).
> 
> The problem being that while device enumeration only finds a single device
> at function zero of the respective (seg,bus,dev) tuple, DMA requests - as
> seen by the IOMMU - originate from non-zero functions under the same
> tuple. Since a non-discovered device doesn't get a context entry inserted,
> this result in an IOMMU fault, rendering the device non-functional.
> 
> The data from the system I have so far doesn't tell me whether the device
> incorrectly claims itself as single function (with the functions other than func
> 0 simply not being discovered during device enumeration, as single function
> devices don't get their non-zero functions scanned) or whether the config
> space for functions 1-7 indeed is unpopulated, with the device issuing
> requests with non- zero function number for other, unexplained reasons.


> Bottom line - I'm seeking advice as to whether working around this problem
> in the IOMMU code is desirable/necessary, or whether this is a design flaw
> on the device's side that just cannot be tolerated with an IOMMU in the
> picture (which would need good reasoning, so that a customer expecting
> such a device to work regardless of IOMMU usage can understand that this
> cannot reasonably be made work).

The issue is why the non-zero functions don't claim themselves during the PCI bus scan. From a security point of view, VT-d shouldn't allow transactions from unknown devices.
Xiantao


* Re: iommu=dom0-passthrough behavior
  2012-11-13  8:50     ` Zhang, Xiantao
@ 2012-11-13  9:41       ` Jan Beulich
  2012-11-13 11:13         ` Zhang, Yang Z
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-13  9:41 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 13.11.12 at 09:50, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> We need to settle on the concept here first: What specifically is said 
>> option intended to do? 
> 
> Basically,  this options just allows the transactions from dom0's devices 
> not subject to VT-d engine.  Actually,  It is not targeted to fix something, 
> but just allows users isolating VT-d issues from dom0. 
> As I know, in early days, VT-d is not that stable,   if dom0's devices are 
> controlled by VT-d,  some strange issues may trigger in system's boot stage, 
> so use this options to disable VT-d for Dom0. 

Which it doesn't, as the specific case here shows.

>> Bottom line - I'm seeking advice as to whether working around this problem
>> in the IOMMU code is desirable/necessary, or whether this is a design flaw
>> on the device's side that just cannot be tolerated with an IOMMU in the
>> picture (which would need good reasoning, so that a customer expecting
>> such a device to work regardless of IOMMU usage can understand that this
>> cannot reasonably be made work).
> 
> The issue is why the non-zero functions don't claim themselves during PCI bus 
> scan.

As said - I can't tell whether there is a secondary function in the
first place (and I didn't try to find out because it doesn't really
matter for the purpose of finding a solution/workaround).

>   From security point of view,  VT-d shouldn't allow transactions from 
> the unknown devices. 

That's inconsistent with what you say above: Either there is a way
to suppress IOMMU involvement in Dom0 operation (which is
inherently insecure), or there is not. If there is, there's no point in
claiming security for one aspect but not another.

I think it is obvious that I'm not suggesting to allow pass through of
such a device at this point (albeit even that would seem possible and,
from a customer's pov, desirable), and as long as users are aware of
the security implications when using the option under discussion here,
I don't see why that option shouldn't be made to work fully. It should be
left to the admins of individual systems to decide between security
and functionality; we should provide them with ways to implement
their choice.

Otoh I'm unaware of a similar option for native Linux, yet it is
suffering the same problem on that system when DMA translation
gets turned on.

But the direction you give to the discussion doesn't lead us towards
a solution for the customer (or a profound explanation why none
is possible), so if we could please focus on that aspect again.

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-13  9:41       ` Jan Beulich
@ 2012-11-13 11:13         ` Zhang, Yang Z
  2012-11-13 11:24           ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Yang Z @ 2012-11-13 11:13 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Xiantao; +Cc: wei.huang2, weiwang.dd, xen-devel

Jan Beulich wrote on 2012-11-13:
>>>> On 13.11.12 at 09:50, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> We need to settle on the concept here first: What specifically is said
>>> option intended to do?
>> 
>> Basically,  this options just allows the transactions from dom0's devices
>> not subject to VT-d engine.  Actually,  It is not targeted to fix something,
>> but just allows users isolating VT-d issues from dom0.
>> As I know, in early days, VT-d is not that stable,   if dom0's devices are
>> controlled by VT-d,  some strange issues may trigger in system's boot stage,
>> so use this options to disable VT-d for Dom0.
> 
> Which it doesn't, as the specific case here shows.
> 
>>> Bottom line - I'm seeking advice as to whether working around this problem
>>> in the IOMMU code is desirable/necessary, or whether this is a design flaw
>>> on the device's side that just cannot be tolerated with an IOMMU in the
>>> picture (which would need good reasoning, so that a customer expecting
>>> such a device to work regardless of IOMMU usage can understand that this
>>> cannot reasonably be made work).

Why not just disable the IOMMU in this case?

>> The issue is why the non-zero functions don't claim themselves during PCI bus
>> scan.
> 
> As said - I can't tell whether there is a secondary function in the
> first place (and I didn't try to find out because it doesn't really
> matter for the purpose of finding a solution/workaround).

If software cannot see it, how can it be used? If there is still a way to detect it, then Xen can do that too and set up the context entry as pass-through.

>>   From security point of view,  VT-d shouldn't allow transactions from
>> the unknown devices.
> 
> That's inconsistent with what you say above: Either there is a way
> to suppress IOMMU involvement in Dom0 operation (which is
> inherently insecure), or there is not. If there is, there's no point in
> claiming security for one aspect but not another.
> 
> I think it is obvious that I'm not suggesting to allow pass through of
> such a device at this point (albeit even that would seem possible and,
> from a customer's pov, desirable), and as long as users are aware of
> the security implications when using the option under discussion here,
> I don't see why that option shouldn't be made fully work. It should be
> left to the admins of individual systems to decide between security
> and functionality, we should provide them with ways to implement
> their choice.
> 
> Otoh I'm unaware of a similar option for native Linux, yet it is
> suffering the same problem on that system when DMA translation
> gets turned on.
> 
> But the direction you give to the discussion doesn't lead us towards
> a solution for the customer (or a profound explanation why none
> is possible), so if we could please focus on that aspect again.
> 
> Jan


Best regards,
Yang


* Re: iommu=dom0-passthrough behavior
  2012-11-13 11:13         ` Zhang, Yang Z
@ 2012-11-13 11:24           ` Jan Beulich
  2012-11-13 15:02             ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-13 11:24 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 13.11.12 at 12:13, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> Jan Beulich wrote on 2012-11-13:
>>>>> On 13.11.12 at 09:50, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Bottom line - I'm seeking advice as to whether working around this problem
>>>> in the IOMMU code is desirable/necessary, or whether this is a design flaw
>>>> on the device's side that just cannot be tolerated with an IOMMU in the
>>>> picture (which would need good reasoning, so that a customer expecting
>>>> such a device to work regardless of IOMMU usage can understand that this
>>>> cannot reasonably be made work).
> 
> Why not just disable the IOMMU in this case ?

Because that disables (secure) pass through of other devices.

>>> The issue is why the non-zero functions don't claim themselves during PCI bus
>>> scan.
>> 
>> As said - I can't tell whether there is a secondary function in the
>> first place (and I didn't try to find out because it doesn't really
>> matter for the purpose of finding a solution/workaround).
> 
> If software cannot see it, then how to use it? If there still have an 
> approach to detect it, then xen can do it too and setup the context entry as 
> passthrough.

We see it at the latest when the fault occurs. So there are
multiple options:
a) if the device is "real" as in having a valid config space despite
    func 0 not advertising itself as multi-function, we have ways to
    discover the device (at boot time)
b) we could insert the context entry in the fault handler, assuming
    the device is able to recover
c) we could provide a command line option to allow fake devices to
    be created
d) we could create context entries for all BDFs, whether or not a
    device exists there
Do any of these have obvious downsides?
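
To compare the reach of options c) and d), here is a rough sketch of the set of functions that would end up with a context entry under each. The function names and the example bus number are hypothetical; the only fixed facts used are that a PCI bus has 32 device slots with 8 functions each.

```python
DEVS_PER_BUS = 32
FUNCS_PER_DEV = 8


def entries_option_c(discovered, fake):
    """Option c: enumerated functions plus explicitly listed fake ones."""
    return set(discovered) | set(fake)


def entries_option_d(buses):
    """Option d: every (bus, dev, func) on the given buses, present or not."""
    return {
        (bus, dev, func)
        for bus in buses
        for dev in range(DEVS_PER_BUS)
        for func in range(FUNCS_PER_DEV)
    }


discovered = {(0x03, 0x00, 0)}          # what the bus scan found
fake = {(0x03, 0x00, 1)}                # what the admin would list by hand

option_c = entries_option_c(discovered, fake)
option_d = entries_option_d([0x03])

print(len(option_c), len(option_d))
print((0x03, 0x1F, 7) in option_c, (0x03, 0x1F, 7) in option_d)
```

Both variants cover the unexpected function 1, but option d) also admits requests from every other requester ID on the bus, while option c) only adds what was explicitly named.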

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-13 11:24           ` Jan Beulich
@ 2012-11-13 15:02             ` Zhang, Xiantao
  2012-11-13 15:29               ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-13 15:02 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, November 13, 2012 7:25 PM
> To: Zhang, Xiantao; Zhang, Yang Z
> Cc: wei.huang2@amd.com; weiwang.dd@gmail.com; xen-devel
> Subject: RE: [Xen-devel] iommu=dom0-passthrough behavior
> 
> >>> On 13.11.12 at 12:13, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > Jan Beulich wrote on 2012-11-13:
> >>>>> On 13.11.12 at 09:50, "Zhang, Xiantao" <xiantao.zhang@intel.com>
> wrote:
> >>>> From: Jan Beulich [mailto:JBeulich@suse.com] Bottom line - I'm
> >>>> seeking advice as to whether working around this problem in the
> >>>> IOMMU code is desirable/necessary, or whether this is a design flaw
> >>>> on the device's side that just cannot be tolerated with an IOMMU in
> >>>> the picture (which would need good reasoning, so that a customer
> >>>> expecting such a device to work regardless of IOMMU usage can
> >>>> understand that this cannot reasonably be made work).
> >
> > Why not just disable the IOMMU in this case ?
> 
> Because that disables (secure) pass through of other devices.
> 
> >>> The issue is why the non-zero functions don't claim themselves
> >>> during PCI bus scan.
> >>
> >> As said - I can't tell whether there is a secondary function in the
> >> first place (and I didn't try to find out because it doesn't really
> >> matter for the purpose of finding a solution/workaround).
> >
> > If software cannot see it, then how to use it? If there still have an
> > approach to detect it, then xen can do it too and setup the context
> > entry as passthrough.
> 
> We see it the latest at the point the fault occurs. So there are multiple
> options:
> a) if the device is "real" as in having a valid config space despite
>     func 0 not advertising itself as multi-function, we have ways to
>     discover the device (at boot time)

I think the current Xen logic also covers the multi-function device case. When Xen boots, it scans all functions of the devices on each bus by reading their vendor IDs. If the vendor ID is valid, Xen deems a real device/function to exist at that BDF.


> b) we could insert the context entry in the fault handler, assuming
>     the device is able to recover
At least current VT-d doesn't support fault recovery, so every triggered fault is fatal.

> c) we could provide a command line option to allow fake devices to
>     be created

Agreed, this may be a feasible solution, as far as I can tell so far.

> d) we could create context entries for all BDFs, whether or not a
>     device exists there

As I said, this may bring security issues. Even the iommu-passthrough option is not recommended if security matters.

> Does any of these have obvious downsides?
> 
> Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-13 15:02             ` Zhang, Xiantao
@ 2012-11-13 15:29               ` Jan Beulich
  2012-11-14  0:37                 ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-13 15:29 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 13.11.12 at 16:02, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> > If software cannot see it, then how to use it? If there still have an
>> > approach to detect it, then xen can do it too and setup the context
>> > entry as passthrough.
>> 
>> We see it the latest at the point the fault occurs. So there are multiple
>> options:
>> a) if the device is "real" as in having a valid config space despite
>>     func 0 not advertising itself as multi-function, we have ways to
>>     discover the device (at boot time)
> 
> I think current Xen logic also covers multi-function devices case.  When Xen 
> boots up, it will scan all functions of devices on each bus, through reads 
> their vendor ID.  If the vendor ID is valid, Xen deems this BDF has a real 
> device/function existed there

Not anymore - we're now honoring the multi-function flag in
_scan_pci_devices() (as Linux always did in its bus scan logic);
see c/s 22337:7afd8dd1d6cb (with a subsequent adjustment in
c/s 25869:59b3663316db).
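
As a simplified model of the difference (not the actual _scan_pci_devices() code), the sketch below scans a simulated config space with and without honoring the multi-function flag, i.e. bit 7 of the header type register. The dictionary-based config space, the vendor ID value, and the function name are assumptions made for illustration.

```python
MULTI_FUNCTION = 0x80    # bit 7 of the PCI header type register
NO_DEVICE = 0xFFFF       # vendor ID read back when nothing responds


def scan_bus(config, bus, honor_mf=True):
    """Return the (bus, dev, func) tuples a scan of one bus would find."""
    found = []
    for dev in range(32):
        for func in range(8):
            fn = config.get((bus, dev, func))
            if fn is None or fn["vendor_id"] == NO_DEVICE:
                if func == 0:
                    break        # no function 0: skip the whole slot
                continue
            found.append((bus, dev, func))
            # A function 0 without the multi-function flag ends the
            # scan of this slot when the flag is honored.
            if func == 0 and honor_mf and not fn["header_type"] & MULTI_FUNCTION:
                break
    return found


# Hypothetical device: function 1 has a valid config space, but
# function 0 does not advertise itself as multi-function.
config = {
    (0x03, 0x00, 0): {"vendor_id": 0x1234, "header_type": 0x00},
    (0x03, 0x00, 1): {"vendor_id": 0x1234, "header_type": 0x00},
}

print(scan_bus(config, 0x03, honor_mf=True))
print(scan_bus(config, 0x03, honor_mf=False))
```

In this model the second function is only found when the flag is ignored, which is the situation option a) would have to detect by some other means at boot time.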

>> b) we could insert the context entry in the fault handler, assuming
>>     the device is able to recover
> At least current VT-d doesn't have recovery fault supported, so each 
> triggered faults are fatal. 

Fatal in what sense? I would assume that if the driver retries the
operation, it would succeed if the first fault causes the context
entry to be inserted.
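
The assumption behind option b) can be sketched as follows. This is a toy model under the stated assumption that the driver retries after a failed operation; it says nothing about whether real hardware or drivers recover this way, and all names are hypothetical.

```python
class ToyIommu:
    """Hypothetical IOMMU model that can add entries from its fault handler."""

    def __init__(self, insert_on_fault):
        self.passthrough = set()     # requester IDs with a context entry
        self.insert_on_fault = insert_on_fault
        self.faults = []             # requester IDs that faulted

    def dma(self, rid):
        """Return True if the request is let through, False on a fault."""
        if rid in self.passthrough:
            return True
        self.faults.append(rid)
        if self.insert_on_fault:
            # Option b: the fault handler installs the missing entry.
            self.passthrough.add(rid)
        return False


def driver_io(iommu, rid, retries=1):
    """Return the attempt index that succeeded, or None if all failed."""
    for attempt in range(retries + 1):
        if iommu.dma(rid):
            return attempt
    return None


rid = 0x0301                          # bus 03, device 00, function 1
print(driver_io(ToyIommu(insert_on_fault=True), rid))
print(driver_io(ToyIommu(insert_on_fault=False), rid))
```

With insertion on fault, the first attempt is lost and the retry goes through; without it, every attempt faults. Whether the first lost request is tolerable depends on the device and its driver.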

>> c) we could provide a command line option to allow fake devices to
>>     be created
> 
> Agree, this maybe a feasible solution I can figure out, so far. 
> 
>> d) we could create context entries for all BDFs, whether or not a
>>     device exists there
> 
> As I said,  this maybe bring security issue. Even for the iommu-passthrough 
> option,  it is also not suggested to be used if security is considered. 

As said - it is clear that the basic thing here (using
"iommu=dom0-passthrough") is already weakening security. So
security isn't the concern in this discussion, that's left to whoever
is intending to use that option.

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-13 15:29               ` Jan Beulich
@ 2012-11-14  0:37                 ` Zhang, Xiantao
  2012-11-14 13:40                   ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-14  0:37 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, November 13, 2012 11:29 PM
> To: Zhang, Xiantao; Zhang, Yang Z
> Cc: wei.huang2@amd.com; weiwang.dd@gmail.com; xen-devel
> Subject: RE: [Xen-devel] iommu=dom0-passthrough behavior
> 
> >>> On 13.11.12 at 16:02, "Zhang, Xiantao" <xiantao.zhang@intel.com>
> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> > If software cannot see it, then how to use it? If there still have
> >> > an approach to detect it, then xen can do it too and setup the
> >> > context entry as passthrough.
> >>
> >> We see it the latest at the point the fault occurs. So there are
> >> multiple
> >> options:
> >> a) if the device is "real" as in having a valid config space despite
> >>     func 0 not advertising itself as multi-function, we have ways to
> >>     discover the device (at boot time)
> >
> > I think current Xen logic also covers multi-function devices case.
> > When Xen boots up, it will scan all functions of devices on each bus,
> > through reads their vendor ID.  If the vendor ID is valid, Xen deems
> > this BDF has a real device/function existed there
> 
> Not anymore - we're now honoring the multi-function flag in
> _scan_pci_devices() (as Linux always did in its bus scan logic); see c/s
> 22337:7afd8dd1d6cb (with a subsequent adjustment in c/s
> 25869:59b3663316db).
> 
> >> b) we could insert the context entry in the fault handler, assuming
> >>     the device is able to recover
> > At least current VT-d doesn't have recovery fault supported, so each
> > triggered faults are fatal.
> 
> Fatal in what sense? I would assume that if the driver retries the operation, it
> would succeed if the first fault causes the context entry to be inserted.

If the driver knows about this, it should work. But VT-d without fault recovery capability should do nothing for unknown translations, which may be faked by malicious devices.

> >> c) we could provide a command line option to allow fake devices to
> >>     be created
> >
> > Agree, this maybe a feasible solution I can figure out, so far.
> >
> >> d) we could create context entries for all BDFs, whether or not a
> >>     device exists there
> >
> > As I said,  this maybe bring security issue. Even for the
> > iommu-passthrough option,  it is also not suggested to be used if security is
> considered.
> 
> As said - it is clear that the basic thing here (using
> "iommu=dom0-passthrough") is already weakening security. So security isn't
> the concern in this discussion, that's left to whoever is intending to use that
> option.

Okay, I vote for your option c) if security is not a concern.
Xiantao


* Re: iommu=dom0-passthrough behavior
  2012-11-14  0:37                 ` Zhang, Xiantao
@ 2012-11-14 13:40                   ` Jan Beulich
  2012-11-15  8:23                     ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-14 13:40 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 14.11.12 at 01:37, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> >> c) we could provide a command line option to allow fake devices to
>> >>     be created
>> >
>> > Agree, this maybe a feasible solution I can figure out, so far.
>> >
>> >> d) we could create context entries for all BDFs, whether or not a
>> >>     device exists there
>> >
>> > As I said,  this maybe bring security issue. Even for the
>> > iommu-passthrough option,  it is also not suggested to be used if security 
> is
>> considered.
>> 
>> As said - it is clear that the basic thing here (using
>> "iommu=dom0-passthrough") is already weakening security. So security isn't
>> the concern in this discussion, that's left to whoever is intending to use 
> that
>> option.
> 
> Okay,  I vote your option C if don't care security. 

Which, if I'm not mistaken, could be implemented entirely
independent of "iommu=dom0-passthrough". I'll see if that
helps on the offending system.

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-14 13:40                   ` Jan Beulich
@ 2012-11-15  8:23                     ` Zhang, Xiantao
  2012-11-15  9:05                       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-15  8:23 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, November 14, 2012 9:40 PM
> To: Zhang, Xiantao; Zhang, Yang Z
> Cc: wei.huang2@amd.com; weiwang.dd@gmail.com; xen-devel
> Subject: RE: [Xen-devel] iommu=dom0-passthrough behavior
> 
> >>> On 14.11.12 at 01:37, "Zhang, Xiantao" <xiantao.zhang@intel.com>
> wrote:
> >> >> c) we could provide a command line option to allow fake devices to
> >> >>     be created
> >> >
> >> > Agree, this maybe a feasible solution I can figure out, so far.
> >> >
> >> >> d) we could create context entries for all BDFs, whether or not a
> >> >>     device exists there
> >> >
> >> > As I said,  this maybe bring security issue. Even for the
> >> > iommu-passthrough option,  it is also not suggested to be used if
> >> > security
> > is
> >> considered.
> >>
> >> As said - it is clear that the basic thing here (using
> >> "iommu=dom0-passthrough") is already weakening security. So security
> >> isn't the concern in this discussion, that's left to whoever is
> >> intending to use
> > that
> >> option.
> >
> > Okay,  I vote your option C if don't care security.
> 
> Which, if I'm not mistaken, could be implemented entirely independent of
> "iommu=dom0-passthrough". I'll see if that helps on the offending system.

I mean this one: 
>>c) we could provide a command line option to allow fake devices to be create

Yes, I don't think "iommu=dom0-passthrough" can meet your requirement.
We had better add a command line option that passes the relevant information to the hypervisor,
so that VT-d can create the pass-through context entry for the undetectable device.
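
As a rough illustration of what "passing the related information" could involve, here is a minimal sketch that parses a device address given as "bb:dd.f" into a 16-bit bus/device/function value. The function name parse_bdf and the "bb:dd.f" syntax are assumptions made for this example only; the thread does not define an option name or format, and this is not taken from any actual patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "bb:dd.f" (hex bus, hex device, function 0-7)
 * into a 16-bit value laid out as bus[15:8], device[7:3], function[2:0].
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int parse_bdf(const char *s, uint16_t *bdf)
{
    unsigned int bus, dev, fn;
    char trailing;

    /* Exactly three fields must convert; a fourth means trailing junk. */
    if (sscanf(s, "%x:%x.%u%c", &bus, &dev, &fn, &trailing) != 3)
        return -1;
    if (bus > 0xff || dev > 0x1f || fn > 7)
        return -1;

    *bdf = (uint16_t)((bus << 8) | (dev << 3) | fn);
    return 0;
}
```

With this sketch, an input such as "03:00.1" yields 0x0301, which is the kind of value the IOMMU code would then need a context entry for.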


* Re: iommu=dom0-passthrough behavior
  2012-11-15  8:23                     ` Zhang, Xiantao
@ 2012-11-15  9:05                       ` Jan Beulich
  2012-11-16  6:21                         ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-15  9:05 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 15.11.12 at 09:23, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, November 14, 2012 9:40 PM
>> To: Zhang, Xiantao; Zhang, Yang Z
>> Cc: wei.huang2@amd.com; weiwang.dd@gmail.com; xen-devel
>> Subject: RE: [Xen-devel] iommu=dom0-passthrough behavior
>> 
>> >>> On 14.11.12 at 01:37, "Zhang, Xiantao" <xiantao.zhang@intel.com>
>> wrote:
>> >> >> c) we could provide a command line option to allow fake devices to
>> >> >>     be create
>> >> >
>> >> > Agree, this maybe a feasible solution I can figure out, so far.
>> >> >
>> >> >> d) we could create context entries for all BDFs, whether or not a
>> >> >>     device exists there
>> >> >
>> >> > As I said,  this maybe bring security issue. Even for the
>> >> > iommu-passthrough option,  it is also not suggested to be used if
>> >> > security
>> > is
>> >> considered.
>> >>
>> >> As said - it is clear that the basic thing here (using
>> >> "iommu=dom0-passthrough") is already weakening security. So security
>> >> isn't the concern in this discussion, that's left to whoever is
>> >> intending to use
>> > that
>> >> option.
>> >
>> > Okay,  I vote your option C if don't care security.
>> 
>> Which, if I'm not mistaken, could be implemented entirely independent of
>> "iommu=dom0-passthrough". I'll see if that helps on the offending system.
> 
> I mean this one: 
>>>c) we could provide a command line option to allow fake devices to be create
> 
> Yes,  I don't think "iommu=dom0-passthrough" can meet your requirement.
>  We had better add a cmd line option to  pass the related information to 
> hypervisor and VT-d can create 
> the pass-through context entry  for the undetectable device.  

You misunderstood: What I was saying (and seeking confirmation of)
is that I don't think the new command line option would need to
have any connection to the existing, unsuitable one. In
particular, for it to take effect, "iommu=dom0-passthrough"
wouldn't need to be specified at all.

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-15  9:05                       ` Jan Beulich
@ 2012-11-16  6:21                         ` Zhang, Xiantao
  2012-11-16  8:22                           ` Jan Beulich
  2012-11-16  9:26                           ` Jan Beulich
  0 siblings, 2 replies; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-16  6:21 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel

> >> Which, if I'm not mistaken, could be implemented entirely independent
> >> of "iommu=dom0-passthrough". I'll see if that helps on the offending
> system.
> >
> > I mean this one:
> >>>c) we could provide a command line option to allow fake devices to be
> >>>create
> >
> > Yes,  I don't think "iommu=dom0-passthrough" can meet your requirement.
> >  We had better add a cmd line option to  pass the related information
> > to hypervisor and VT-d can create the pass-through context entry  for
> > the undetectable device.
> 
> You misunderstood: What I was saying (and seeking confirmation) is that I
> don't think the new command line option would need to have any
> connection to the existing, non-suitable one. In particular, for it to take effect,
> "iommu=dom0-passthrough"
> wouldn't need to be specified at all.

Okay. Back to your customer's issue: I don't think we have a clean solution if the device can't be detected by the hypervisor. We can only work around this issue through a new command line option.
BTW, if the device can't be detected, how does the OS load its driver?
Xiantao


* Re: iommu=dom0-passthrough behavior
  2012-11-16  6:21                         ` Zhang, Xiantao
@ 2012-11-16  8:22                           ` Jan Beulich
  2012-11-16  9:26                           ` Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2012-11-16  8:22 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 16.11.12 at 07:21, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> >> Which, if I'm not mistaken, could be implemented entirely independent
>> >> of "iommu=dom0-passthrough". I'll see if that helps on the offending
>> system.
>> >
>> > I mean this one:
>> >>>c) we could provide a command line option to allow fake devices to be
>> >>>create
>> >
>> > Yes,  I don't think "iommu=dom0-passthrough" can meet your requirement.
>> >  We had better add a cmd line option to  pass the related information
>> > to hypervisor and VT-d can create the pass-through context entry  for
>> > the undetectable device.
>> 
>> You misunderstood: What I was saying (and seeking confirmation) is that I
>> don't think the new command line option would need to have any
>> connection to the existing, non-suitable one. In particular, for it to take 
> effect,
>> "iommu=dom0-passthrough"
>> wouldn't need to be specified at all.
> Okay.  Back to your customer's issue, I don't think we have a clean solution 
> if the device can't be detected by hypervisor.   We only can figure out how 
> to workaround this issue through a new command line option.  
> BTW,   if the device can't be detected,  how to load its driver by OS ?  

Once again - the device at function 0 is being detected, but when the
driver loads we see at least one DMA operation with a source ID of
function 1 under the same bus and slot. There's no driver needed for
the phantom device at function 1.
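
To make the failure mode concrete, here is a toy model of the lookup. A PCI source ID packs bus, device and function into 16 bits, and VT-d indexes its root table by bus and each context table by device/function. The names make_sid, toy_context_table and toy_dma_allowed are made up for this sketch and are not Xen or VT-d identifiers; the "present" flag merely stands in for a real context entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* A source (requester) ID is bus[15:8], device[7:3], function[2:0]. */
static inline uint16_t make_sid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Toy stand-in for one per-bus context table: a flag per device/function. */
struct toy_context_table {
    bool present[256];
};

/*
 * Returns true if a DMA request tagged with `sid` finds a context entry.
 * `bus_tables` must point to 256 tables, one per bus number.
 */
static bool toy_dma_allowed(const struct toy_context_table *bus_tables,
                            uint16_t sid)
{
    return bus_tables[sid >> 8].present[sid & 0xff];
}
```

If only function 0 of a slot has an entry, a request carrying the source ID of function 1 of that slot misses in this model, which matches the faults described above.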

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-16  6:21                         ` Zhang, Xiantao
  2012-11-16  8:22                           ` Jan Beulich
@ 2012-11-16  9:26                           ` Jan Beulich
  2012-11-16  9:43                             ` Zhang, Xiantao
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-11-16  9:26 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 16.11.12 at 07:21, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
> Okay.  Back to your customer's issue, I don't think we have a clean solution 
> if the device can't be detected by hypervisor.   We only can figure out how 
> to workaround this issue through a new command line option.  

So after I implemented a draft patch for this yesterday, my
attention was directed to phantom functions, which the PCIe
spec permits (and of which I was entirely unaware so far). While
Linux to date doesn't enable them, I can't rule out that the BIOS
does (I have requested the respective data to find out). And even
if it doesn't, adding proper support for that functionality, and then
putting a quirk on top of that, would seem the much cleaner
solution (pretty likely me_wifi_quirk() could then also be made to
fit in there).

Thoughts?

Jan


* Re: iommu=dom0-passthrough behavior
  2012-11-16  9:26                           ` Jan Beulich
@ 2012-11-16  9:43                             ` Zhang, Xiantao
  2012-11-16  9:53                               ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiantao @ 2012-11-16  9:43 UTC (permalink / raw)
  To: Jan Beulich, Zhang, Yang Z
  Cc: wei.huang2, weiwang.dd, Zhang, Xiantao, xen-devel

> So after I implemented a draft patch for this yesterday, my attention now
> was directed to phantom functions permitted by the PCIe spec (of which I
> was entirely unaware so far). While Linux to date doesn't enable them, I can't
> exclude that the BIOS might be (requested respective data to find out). And
> even if it doesn't, adding proper support for that functionality, and then
> putting a quirk on top of that would seem the much cleaner solution (pretty
> likely me_wifi_quirk() could then also be made fit in there).
> 
> Thoughts?

Agreed. Xen doesn't support phantom functions with VT-d, so it is better to create a quirk to fix such issues.
Xiantao


* Re: iommu=dom0-passthrough behavior
  2012-11-16  9:43                             ` Zhang, Xiantao
@ 2012-11-16  9:53                               ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2012-11-16  9:53 UTC (permalink / raw)
  To: Xiantao Zhang, Yang Z Zhang; +Cc: wei.huang2, weiwang.dd, xen-devel

>>> On 16.11.12 at 10:43, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>>  So after I implemented a draft patch for this yesterday, my attention now
>> was directed to phantom functions permitted by the PCIe spec (of which I
>> was entirely unaware so far). While Linux to date doesn't enable them, I 
> can't
>> exclude that the BIOS might be (requested respective data to find out). And
>> even if it doesn't, adding proper support for that functionality, and then
>> putting a quirk on top of that would seem the much cleaner solution (pretty
>> likely me_wifi_quirk() could then also be made fit in there).
>> 
>> Thoughts?
> Agree, Xen doesn't have phantom functions supported with VT-d, so it is  
> better to create a quirk to fix such issues.  

Sorry, no, we should add phantom function support (and not
only to VT-d, but to the generic IOMMU code), and use that (if
necessary, i.e. if the device in question doesn't have phantom
functions turned on) to base a workaround on.
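
A minimal sketch of how generic code might enumerate the function numbers to cover, assuming the PCIe "Phantom Functions Supported" field (Device Capabilities bits 4:3) gives the number of most significant function-number bits that may be used as phantom bits. The names phantom_stride and phantom_devfns are hypothetical, and a stride of 0 meaning "no phantom functions" is a convention of this sketch only, not of any existing implementation.

```c
#include <stdint.h>

#define DEVCAP_PHANTOM_MASK  0x18u  /* Device Capabilities bits 4:3 */
#define DEVCAP_PHANTOM_SHIFT 3

/* Distance between function numbers sharing one device, 0 if none. */
static unsigned int phantom_stride(uint32_t devcap)
{
    unsigned int bits = (devcap & DEVCAP_PHANTOM_MASK) >> DEVCAP_PHANTOM_SHIFT;

    /* 1 bit -> stride 4, 2 bits -> stride 2, 3 bits -> stride 1. */
    return bits ? 8u >> bits : 0;
}

/*
 * Fill out[] with the real device/function value `devfn` followed by its
 * phantom functions in the same slot. Returns the number of entries.
 */
static unsigned int phantom_devfns(uint8_t devfn, unsigned int stride,
                                   uint8_t out[8])
{
    unsigned int n = 0;
    unsigned int slot = devfn & ~7u, fn = devfn & 7u;

    out[n++] = devfn;
    if (stride == 0)
        return n;
    for (fn += stride; fn < 8; fn += stride)
        out[n++] = (uint8_t)(slot | fn);
    return n;
}
```

A workaround for a device that doesn't advertise phantom functions could then amount to supplying a stride by hand; with a stride of 1 for function 0, the list would include function 1 of the same slot.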

Jan



Thread overview: 18+ messages
2012-11-05 14:30 iommu=dom0-passthrough behavior Jan Beulich
2012-11-13  0:11 ` Zhang, Yang Z
2012-11-13  8:07   ` Jan Beulich
2012-11-13  8:50     ` Zhang, Xiantao
2012-11-13  9:41       ` Jan Beulich
2012-11-13 11:13         ` Zhang, Yang Z
2012-11-13 11:24           ` Jan Beulich
2012-11-13 15:02             ` Zhang, Xiantao
2012-11-13 15:29               ` Jan Beulich
2012-11-14  0:37                 ` Zhang, Xiantao
2012-11-14 13:40                   ` Jan Beulich
2012-11-15  8:23                     ` Zhang, Xiantao
2012-11-15  9:05                       ` Jan Beulich
2012-11-16  6:21                         ` Zhang, Xiantao
2012-11-16  8:22                           ` Jan Beulich
2012-11-16  9:26                           ` Jan Beulich
2012-11-16  9:43                             ` Zhang, Xiantao
2012-11-16  9:53                               ` Jan Beulich
