linux-pci.vger.kernel.org archive mirror
* [PATCH] xen-pcifront: Handle missed Connected state
@ 2022-08-29 15:15 Jason Andryuk
  2022-09-01  2:35 ` Rich Persaud
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Jason Andryuk @ 2022-08-29 15:15 UTC
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Bjorn Helgaas
  Cc: Jason Andryuk, xen-devel, linux-pci, linux-kernel

An HVM guest with linux stubdom and 2 PCI devices failed to start as
libxl timed out waiting for the PCI devices to be added.  It happens
intermittently but with some regularity.  libxl wrote the two xenstore
entries for the devices, but then timed out waiting for backend state 4
(Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
to an HVM with stubdomain is PV passthrough to the stubdomain and then
HVM passthrough with the QEMU inside the stubdomain.)

The stubdom kernel never printed "pcifront pci-0: Installing PCI
frontend", so it seems to have missed state 4 which would have
called pcifront_try_connect -> pcifront_connect_and_init_dma

Have pcifront_detach_devices special-case state Initialised and call
pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
that sets the xenbus state which may throw off the backend.  After
connecting, skip the remainder of detach_devices since none have been
initialized yet.  When the backend switches to Reconfigured,
pcifront_attach_devices will pick them up again.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
 drivers/pci/xen-pcifront.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -1012,13 +1012,26 @@ static int pcifront_detach_devices(struc
 {
 	int err = 0;
 	int i, num_devs;
+	enum xenbus_state state;
 	unsigned int domain, bus, slot, func;
 	struct pci_dev *pci_dev;
 	char str[64];
 
-	if (xenbus_read_driver_state(pdev->xdev->nodename) !=
-	    XenbusStateConnected)
+	state = xenbus_read_driver_state(pdev->xdev->nodename);
+	if (state == XenbusStateInitialised) {
+		dev_dbg(&pdev->xdev->dev, "Handle skipped connect.\n");
+		/* We missed Connected and need to initialize. */
+		err = pcifront_connect_and_init_dma(pdev);
+		if (err && err != -EEXIST) {
+			xenbus_dev_fatal(pdev->xdev, err,
+					 "Error setting up PCI Frontend");
+			goto out;
+		}
+
+		goto out_switch_state;
+	} else if (state != XenbusStateConnected) {
 		goto out;
+	}
 
 	err = xenbus_scanf(XBT_NIL, pdev->xdev->otherend, "num_devs", "%d",
 			   &num_devs);
@@ -1079,6 +1092,7 @@ static int pcifront_detach_devices(struc
 			domain, bus, slot, func);
 	}
 
+ out_switch_state:
 	err = xenbus_switch_state(pdev->xdev, XenbusStateReconfiguring);
 
 out:
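
For readers following the thread, the state check this patch adds at the top of pcifront_detach_devices() can be sketched as a small standalone C function. This is only an illustrative sketch, not the driver code: the enum values mirror enum xenbus_state from the Xen public headers (4 is Connected and 7 is Reconfiguring, as in the commit message), while enum detach_action and detach_action_for() are hypothetical names introduced here to label the three branches.

```c
/* Values mirror enum xenbus_state from the Xen public headers. */
enum xenbus_state {
	XenbusStateUnknown       = 0,
	XenbusStateInitialising  = 1,
	XenbusStateInitWait      = 2,
	XenbusStateInitialised   = 3,
	XenbusStateConnected     = 4,
	XenbusStateClosing       = 5,
	XenbusStateClosed        = 6,
	XenbusStateReconfiguring = 7,
	XenbusStateReconfigured  = 8,
};

/* Hypothetical labels for the three branches; not part of the driver. */
enum detach_action {
	/* Any other state: "goto out", nothing is done. */
	DETACH_IGNORE,
	/*
	 * Initialised: Connected was missed, so the patch calls
	 * pcifront_connect_and_init_dma() and, on success, jumps to
	 * out_switch_state, skipping the per-device removal loop.
	 */
	DETACH_CONNECT_THEN_SWITCH,
	/*
	 * Connected: the pre-existing path, which removes the devices
	 * and then switches the frontend to Reconfiguring.
	 */
	DETACH_REMOVE_THEN_SWITCH,
};

/* Map the frontend's own xenbus state to the branch the patch takes. */
static enum detach_action detach_action_for(enum xenbus_state frontend_state)
{
	switch (frontend_state) {
	case XenbusStateInitialised:
		return DETACH_CONNECT_THEN_SWITCH;
	case XenbusStateConnected:
		return DETACH_REMOVE_THEN_SWITCH;
	default:
		return DETACH_IGNORE;
	}
}
```

Before the patch, the Initialised case fell into the default branch, so the frontend never switched to Reconfiguring in response. The sketch leaves out the error handling: in the patch, a failure from pcifront_connect_and_init_dma() other than -EEXIST is reported with xenbus_dev_fatal() and no state switch happens.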


* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-08-29 15:15 [PATCH] xen-pcifront: Handle missed Connected state Jason Andryuk
@ 2022-09-01  2:35 ` Rich Persaud
  2022-09-01 12:55   ` Jason Andryuk
  2022-09-02 16:59 ` Bjorn Helgaas
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Rich Persaud @ 2022-09-01  2:35 UTC
  To: Jason Andryuk
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Bjorn Helgaas, xen-devel, linux-pci, linux-kernel

On Aug 29, 2022, at 11:16 AM, Jason Andryuk <jandryuk@gmail.com> wrote:
> 
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added.  It happens
> intermittently but with some regularity.  libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
> 
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma

Is there a state machine doc/flowchart for LibXL and Xen PCI device passthrough to Linux? This would be a valuable addition to Xen's developer docs, even as a whiteboard photo in this thread.

Rich


* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-09-01  2:35 ` Rich Persaud
@ 2022-09-01 12:55   ` Jason Andryuk
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Andryuk @ 2022-09-01 12:55 UTC
  To: Rich Persaud
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Bjorn Helgaas, xen-devel, linux-pci, open list

On Wed, Aug 31, 2022 at 10:35 PM Rich Persaud <persaur@gmail.com> wrote:
>
> On Aug 29, 2022, at 11:16 AM, Jason Andryuk <jandryuk@gmail.com> wrote:
> >
> > An HVM guest with linux stubdom and 2 PCI devices failed to start as
> > libxl timed out waiting for the PCI devices to be added.  It happens
> > intermittently but with some regularity.  libxl wrote the two xenstore
> > entries for the devices, but then timed out waiting for backend state 4
> > (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> > to an HVM with stubdomain is PV passthrough to the stubdomain and then
> > HVM passthrough with the QEMU inside the stubdomain.)
> >
> > The stubdom kernel never printed "pcifront pci-0: Installing PCI
> > frontend", so it seems to have missed state 4 which would have
> > called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Is there a state machine doc/flowchart for LibXL and Xen PCI device passthrough to Linux? This would be a valuable addition to Xen's developer docs, even as a whiteboard photo in this thread.

I am not aware of one.

-Jason


* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-08-29 15:15 [PATCH] xen-pcifront: Handle missed Connected state Jason Andryuk
  2022-09-01  2:35 ` Rich Persaud
@ 2022-09-02 16:59 ` Bjorn Helgaas
  2022-09-06 15:18   ` Jason Andryuk
  2022-10-04 14:22 ` Juergen Gross
  2022-10-06  5:12 ` Juergen Gross
  3 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2022-09-02 16:59 UTC
  To: Jason Andryuk
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Bjorn Helgaas, xen-devel, linux-pci, linux-kernel

The conventional style for subject (from "git log --oneline") is:

  xen/pcifront: Handle ...

On Mon, Aug 29, 2022 at 11:15:36AM -0400, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as

"stubdom" might be handy shorthand in the Xen world, but I think
it would be nice to consistently spell out "stubdomain" since you use
both forms randomly in this commit log and newbies like me have to
wonder whether they're the same or different.

> libxl timed out waiting for the PCI devices to be added.  It happens
> intermittently but with some regularity.  libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
> 
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma

Add "()" after function names for clarity.

> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend.  After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet.  When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.

Bjorn


* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-09-02 16:59 ` Bjorn Helgaas
@ 2022-09-06 15:18   ` Jason Andryuk
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Andryuk @ 2022-09-06 15:18 UTC
  To: Bjorn Helgaas
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Bjorn Helgaas, xen-devel, linux-pci, open list

On Fri, Sep 2, 2022 at 12:59 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> The conventional style for subject (from "git log --oneline") is:
>
>   xen/pcifront: Handle ...
>
> On Mon, Aug 29, 2022 at 11:15:36AM -0400, Jason Andryuk wrote:
> > An HVM guest with linux stubdom and 2 PCI devices failed to start as
>
> "stubdom" might be handy shorthand in the Xen world, but I think
> it would be nice to consistently spell out "stubdomain" since you use
> both forms randomly in this commit log and newbies like me have to
> wonder whether they're the same or different.
>
> > libxl timed out waiting for the PCI devices to be added.  It happens
> > intermittently but with some regularity.  libxl wrote the two xenstore
> > entries for the devices, but then timed out waiting for backend state 4
> > (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> > to an HVM with stubdomain is PV passthrough to the stubdomain and then
> > HVM passthrough with the QEMU inside the stubdomain.)
> >
> > The stubdom kernel never printed "pcifront pci-0: Installing PCI
> > frontend", so it seems to have missed state 4 which would have
> > called pcifront_try_connect -> pcifront_connect_and_init_dma
>
> Add "()" after function names for clarity.
>
> > Have pcifront_detach_devices special-case state Initialised and call
> > pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
> > that sets the xenbus state which may throw off the backend.  After
> > connecting, skip the remainder of detach_devices since none have been
> > initialized yet.  When the backend switches to Reconfigured,
> > pcifront_attach_devices will pick them up again.

Thanks for taking a look, Bjorn.  That all sounds good.  I'll wait a
little longer to see if there is any more feedback before sending a
v2.

Regards,
Jason


* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-08-29 15:15 [PATCH] xen-pcifront: Handle missed Connected state Jason Andryuk
  2022-09-01  2:35 ` Rich Persaud
  2022-09-02 16:59 ` Bjorn Helgaas
@ 2022-10-04 14:22 ` Juergen Gross
  2022-10-06  5:12 ` Juergen Gross
  3 siblings, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2022-10-04 14:22 UTC
  To: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko, Bjorn Helgaas
  Cc: xen-devel, linux-pci, linux-kernel



On 29.08.22 17:15, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added.  It happens
> intermittently but with some regularity.  libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
> 
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma
> 
> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend.  After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet.  When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

Reviewed-by: Juergen Gross <jgross@suse.com>

The modifications of the commit message requested by Bjorn can be done
while committing.


Juergen




* Re: [PATCH] xen-pcifront: Handle missed Connected state
  2022-08-29 15:15 [PATCH] xen-pcifront: Handle missed Connected state Jason Andryuk
                   ` (2 preceding siblings ...)
  2022-10-04 14:22 ` Juergen Gross
@ 2022-10-06  5:12 ` Juergen Gross
  3 siblings, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2022-10-06  5:12 UTC
  To: Jason Andryuk, Stefano Stabellini, Oleksandr Tyshchenko, Bjorn Helgaas
  Cc: xen-devel, linux-pci, linux-kernel



On 29.08.22 17:15, Jason Andryuk wrote:
> An HVM guest with linux stubdom and 2 PCI devices failed to start as
> libxl timed out waiting for the PCI devices to be added.  It happens
> intermittently but with some regularity.  libxl wrote the two xenstore
> entries for the devices, but then timed out waiting for backend state 4
> (Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
> to an HVM with stubdomain is PV passthrough to the stubdomain and then
> HVM passthrough with the QEMU inside the stubdomain.)
> 
> The stubdom kernel never printed "pcifront pci-0: Installing PCI
> frontend", so it seems to have missed state 4 which would have
> called pcifront_try_connect -> pcifront_connect_and_init_dma
> 
> Have pcifront_detach_devices special-case state Initialised and call
> pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
> that sets the xenbus state which may throw off the backend.  After
> connecting, skip the remainder of detach_devices since none have been
> initialized yet.  When the backend switches to Reconfigured,
> pcifront_attach_devices will pick them up again.
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

Pushed to xen/tip.git for-linus-6.1


Juergen



