All of lore.kernel.org
 help / color / mirror / Atom feed
* Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device
@ 2014-02-04 18:19 Martin Öhrling
  2014-02-04 19:55 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Öhrling @ 2014-02-04 18:19 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 961 bytes --]

I tried to use vga passthrough with an AMD card and ran into the issues
with dom0 crash on domU reboot. My intention is to debug the issue and I
have started to read up on the code for pci passthrough. The following
observations may not be an error but perhaps someone could shed some light
over the design intentions.

My first impression was that libxl__device_pci_reset() is a function level
reset function. It is called each time a single function of a device is
added or removed. A device reset would break other domU:s if other
functions of the device are passed through to these domU:s.

The strange thing is that there is only a single libxl__device_pci_reset()
call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting the
impression that the function is assumed to be a device reset function.

Is the attempt to reset a function, when all other functions belongs to
pciback, detected and replaced by a device reset?

Best Regards,
Martin

[-- Attachment #1.2: Type: text/html, Size: 1349 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device
  2014-02-04 18:19 Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device Martin Öhrling
@ 2014-02-04 19:55 ` Konrad Rzeszutek Wilk
  2014-02-04 20:05   ` Sander Eikelenboom
  2014-02-25 11:31   ` Martin Öhrling
  0 siblings, 2 replies; 4+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-04 19:55 UTC (permalink / raw)
  To: Martin Öhrling, linux; +Cc: xen-devel

On Tue, Feb 04, 2014 at 07:19:16PM +0100, Martin Öhrling wrote:
> I tried to use vga passthrough with an AMD card and ran into the issues
> with dom0 crash on domU reboot. My intention is to debug the issue and I
> have started to read up on the code for pci passthrough. The following
> observations may not be an error but perhaps someone could shed some light
> over the design intentions.
> 
> My first impression was that libxl__device_pci_reset() is a function level
> reset function. It is called each time a single function of a device is
> added or removed. A device reset would break other domU:s if other
> functions of the device are passed through to these domU:s.
> 
> The strange thing is that there is only a single libxl__device_pci_reset()
> call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting the
> impression that the function is assumed to be a device reset function.
> 
> Is the attempt to reset a function, when all other functions belongs to
> pciback, detected and replaced by a device reset?

Yes with these patches:
https://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=devel/xen-pciback.slot_and_bus.v0

But the last one seems to hang pciback when the device is unbound.

> 
> Best Regards,
> Martin

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device
  2014-02-04 19:55 ` Konrad Rzeszutek Wilk
@ 2014-02-04 20:05   ` Sander Eikelenboom
  2014-02-25 11:31   ` Martin Öhrling
  1 sibling, 0 replies; 4+ messages in thread
From: Sander Eikelenboom @ 2014-02-04 20:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Martin Öhrling, xen-devel


Tuesday, February 4, 2014, 8:55:51 PM, you wrote:

> On Tue, Feb 04, 2014 at 07:19:16PM +0100, Martin Öhrling wrote:
>> I tried to use vga passthrough with an AMD card and ran into the issues
>> with dom0 crash on domU reboot. My intention is to debug the issue and I
>> have started to read up on the code for pci passthrough. The following
>> observations may not be an error but perhaps someone could shed some light
>> over the design intentions.
>> 
>> My first impression was that libxl__device_pci_reset() is a function level
>> reset function. It is called each time a single function of a device is
>> added or removed. A device reset would break other domU:s if other
>> functions of the device are passed through to these domU:s.
>> 
>> The strange thing is that there is only a single libxl__device_pci_reset()
>> call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting the
>> impression that the function is assumed to be a device reset function.
>> 
>> Is the attempt to reset a function, when all other functions belongs to
>> pciback, detected and replaced by a device reset?

> Yes with these patches:
> https://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=devel/xen-pciback.slot_and_bus.v0

> But the last one seems to hang pciback when the device is unbound.

Another thing i'm seeing now sometimes is the "irq 16: nobody cared" after shutting down
a guest which has a pci device passed through, don't know for sure if that's only since
running with those 4 patches from that tree (could perhaps be due to that patch changing some order,
opening a very small race window there ..

But these patches do improve the other case in which it was crashing when using xl pci-assignable-remove.

>> 
>> Best Regards,
>> Martin

>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device
  2014-02-04 19:55 ` Konrad Rzeszutek Wilk
  2014-02-04 20:05   ` Sander Eikelenboom
@ 2014-02-25 11:31   ` Martin Öhrling
  1 sibling, 0 replies; 4+ messages in thread
From: Martin Öhrling @ 2014-02-25 11:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3444 bytes --]

Hi Konrad,

I finally got some time to dig into why I had issues with the last patch.
Added a lot of
printk:s to find out what happened. My conclusions, possibly incorrect, are
as follows:

Each device bound to the pciback module will be reset twice during boot.
The done-counter
in the completion struct will be set to 2 since no calls have been made to
wait_for_completion.

The counter in the completion struct will be decremented to 1 at the first
pcistub_get_pci_dev()
or pcistub_get_pci_dev_by_slot() call.

Something bad happens when pcistub_put_pci_dev() during shutdown of domU.
It appears to me
as if the function is expected to block until reset is completed. Code from
the patch file:
+ schedule_work(&found_psdev->reset_work); + + /* Wait .. wait */ +
wait_for_completion(&found_psdev->reset_done); +

The function will not wait since the done-counter is set to 1. The
pcistub_put_pci_dev()
will return
before the work item is excecuted. I do not belive that this was intended.
The completion structs done
member is decremented to zero but once again set to 1 when the reset work
item has been executed.

It is possible to boot the DomU a second time (done-counter in completion
struct goes down to zero).
The problem is that pcistub_put_pci_dev() will leave the done-counter at
zero when the domU is shut
down for the second time. Trying to boot the domU a third time will stall.

The get-functions expects to be able to decrement the completion counter
without initiation work that
increments the counter and the put-function initiates the same number of
reset works as the function
calls wait_for_completion(). It doesn't seem right. Could also explain why
unbound devices hangs. Their
completion struct is not likely to be initiated with a count of 2 after
startup.

Best Regards,
/Martin




2014-02-04 20:55 GMT+01:00 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:

> On Tue, Feb 04, 2014 at 07:19:16PM +0100, Martin Öhrling wrote:
> > I tried to use vga passthrough with an AMD card and ran into the issues
> > with dom0 crash on domU reboot. My intention is to debug the issue and I
> > have started to read up on the code for pci passthrough. The following
> > observations may not be an error but perhaps someone could shed some
> light
> > over the design intentions.
> >
> > My first impression was that libxl__device_pci_reset() is a function
> level
> > reset function. It is called each time a single function of a device is
> > added or removed. A device reset would break other domU:s if other
> > functions of the device are passed through to these domU:s.
> >
> > The strange thing is that there is only a single
> libxl__device_pci_reset()
> > call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting
> the
> > impression that the function is assumed to be a device reset function.
> >
> > Is the attempt to reset a function, when all other functions belongs to
> > pciback, detected and replaced by a device reset?
>
> Yes with these patches:
>
> https://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=devel/xen-pciback.slot_and_bus.v0
>
> But the last one seems to hang pciback when the device is unbound.
>
> >
> > Best Regards,
> > Martin
>
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
>
>

[-- Attachment #1.2: Type: text/html, Size: 6758 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-02-25 11:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-04 18:19 Inconsistent use of libxl__device_pci_reset() when adding/removing all functions of a device Martin Öhrling
2014-02-04 19:55 ` Konrad Rzeszutek Wilk
2014-02-04 20:05   ` Sander Eikelenboom
2014-02-25 11:31   ` Martin Öhrling

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.