Hi Konrad, I finally got some time to dig into why I had issues with the last patch. Added a lot of printk:s to find out what happened. My conclusions, possibly incorrect, are as follows: Each device bound to the pciback module will be reset twice during boot. The done-counter in the completion struct will be set to 2 since no calls have been made to wait_for_completion. The counter in the completion struct will be decremented to 1 at the first pcistub_get_pci_dev() or pcistub_get_pci_dev_by_slot() call. Something bad happens when pcistub_put_pci_dev() during shutdown of domU. It appears to me as if the function is expected to block until reset is completed. Code from the patch file: + schedule_work(&found_psdev->reset_work); + + /* Wait .. wait */ + wait_for_completion(&found_psdev->reset_done); + The function will not wait since the done-counter is set to 1. The pcistub_put_pci_dev() will return before the work item is excecuted. I do not belive that this was intended. The completion structs done member is decremented to zero but once again set to 1 when the reset work item has been executed. It is possible to boot the DomU a second time (done-counter in completion struct goes down to zero). The problem is that pcistub_put_pci_dev() will leave the done-counter at zero when the domU is shut down for the second time. Trying to boot the domU a third time will stall. The get-functions expects to be able to decrement the completion counter without initiation work that increments the counter and the put-function initiates the same number of reset works as the function calls wait_for_completion(). It doesn't seem right. Could also explain why unbound devices hangs. Their completion struct is not likely to be initiated with a count of 2 after startup. Best Regards, /Martin 2014-02-04 20:55 GMT+01:00 Konrad Rzeszutek Wilk : > On Tue, Feb 04, 2014 at 07:19:16PM +0100, Martin Öhrling wrote: > > I tried to use vga passthrough with an AMD card and ran into the issues > > with dom0 crash on domU reboot. My intention is to debug the issue and I > > have started to read up on the code for pci passthrough. The following > > observations may not be an error but perhaps someone could shed some > light > > over the design intentions. > > > > My first impression was that libxl__device_pci_reset() is a function > level > > reset function. It is called each time a single function of a device is > > added or removed. A device reset would break other domU:s if other > > functions of the device are passed through to these domU:s. > > > > The strange thing is that there is only a single > libxl__device_pci_reset() > > call when pcidev->vfunc_mask is set to LIBXL_PCI_FUNC_ALL. I'm getting > the > > impression that the function is assumed to be a device reset function. > > > > Is the attempt to reset a function, when all other functions belongs to > > pciback, detected and replaced by a device reset? > > Yes with these patches: > > https://git.kernel.org/cgit/linux/kernel/git/konrad/xen.git/log/?h=devel/xen-pciback.slot_and_bus.v0 > > But the last one seems to hang pciback when the device is unbound. > > > > > Best Regards, > > Martin > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel > >