* [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
@ 2021-12-14  7:49 Jan Beulich
  2021-12-14 13:34 ` Jason Andryuk
  2022-01-04  7:53 ` Ping: " Jan Beulich
  0 siblings, 2 replies; 7+ messages in thread
From: Jan Beulich @ 2021-12-14  7:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Anthony Perard, Juergen Gross, Paul Durrant, Stefano Stabellini

Attempting to wait when the backend hasn't been created yet can't work:
the function will complain "Backend ... does not exist". Move the
waiting past the creation of the backend (and that of other related
nodes), hoping that there are no other dependencies that would now be
broken.

Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Just to make it explicit: I have no idea why the waiting is needed in
the first place. It's been there from the very introduction of PCI
passthrough support (commit b0a1af61678b). I therefore can't exclude
that an even better fix would be to simply omit the 2nd hunk here.

--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -157,11 +157,6 @@ static int libxl__device_pci_add_xenstor
     if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
 
-    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
-        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
-            return ERROR_FAIL;
-    }
-
     back = flexarray_make(gc, 16, 1);
 
     LOGD(DEBUG, domid, "Adding new pci device to xenstore");
@@ -213,6 +208,9 @@ static int libxl__device_pci_add_xenstor
         if (rc < 0) goto out;
     }
 
+    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV)
+        rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected));
+
 out:
     libxl__xs_transaction_abort(gc, &t);
     if (lock) libxl__unlock_file(lock);




* Re: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2021-12-14  7:49 [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest Jan Beulich
@ 2021-12-14 13:34 ` Jason Andryuk
  2021-12-14 13:52   ` Jan Beulich
  2022-01-04  7:53 ` Ping: " Jan Beulich
  1 sibling, 1 reply; 7+ messages in thread
From: Jason Andryuk @ 2021-12-14 13:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Wei Liu, Anthony Perard, Juergen Gross, Paul Durrant,
	Stefano Stabellini

On Tue, Dec 14, 2021 at 2:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Attempting to wait when the backend hasn't been created yet can't work:
> the function will complain "Backend ... does not exist". Move the
> waiting past the creation of the backend (and that of other related
> nodes), hoping that there are no other dependencies that would now be
> broken.
>
> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Just to make it explicit: I have no idea why the waiting is needed in
> the first place. It's been there from the very introduction of PCI
> passthrough support (commit b0a1af61678b). I therefore can't exclude
> that an even better fix would be to simply omit the 2nd hunk here.

The first time a device is attached, the backend does not exist, and
the wait is not needed.  However, when a second device is attached,
the backend does exist.  Since pciback goes through Reconfiguring and
Reconfigured, I believe the wait exists to let the frontend/backend
settle back to Connected before modifying the xenstore entries to add
the additional device.  I could be wrong, but that is my best answer
for why someone went to the trouble of adding a wait in the first
place.

Prior to 0fdb48ffe7a1, the backend was created before the wait:
     num_devs = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("%s/num_devs", be_path));
-    if (!num_devs)
-        return libxl__create_pci_backend(gc, domid, pci, 1);

     libxl_domain_type domtype = libxl__domain_type(gc, domid);
     if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;

     if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
             return ERROR_FAIL;
     }

Here and elsewhere, num_devs has been used to identify pre-existing
backends.  That's why I went with the following to address this:
-    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
-        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
+    /* wait is only needed if the backend already exists (num_devs != NULL) */
+    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
+        if (libxl__wait_for_backend(gc, be_path,
+                                    GCSPRINTF("%d", XenbusStateConnected)) < 0)
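
The condition in the hunk above can be read as a small predicate. The
following is only a standalone sketch of that logic, not libxl code: the
enum and the function name needs_backend_wait() are hypothetical
stand-ins, and num_devs is assumed to be NULL exactly when the
"num_devs" node (and hence the backend) does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl_domain_type; not the real libxl enum. */
enum domain_type { DOMAIN_TYPE_INVALID, DOMAIN_TYPE_HVM, DOMAIN_TYPE_PV };

/*
 * Returns true when the caller should wait for the backend to reach
 * Connected before writing the new device's nodes.
 *
 * num_devs: string read from "<be_path>/num_devs", or NULL when the node
 *           is absent, i.e. no backend has been created so far.
 * starting: true while the domain is being built (no hot-attach).
 */
static bool needs_backend_wait(const char *num_devs, bool starting,
                               enum domain_type domtype)
{
    return num_devs != NULL && !starting && domtype == DOMAIN_TYPE_PV;
}

/* Small self-check covering the four cases discussed in this thread. */
static void needs_backend_wait_selfcheck(void)
{
    assert(!needs_backend_wait(NULL, false, DOMAIN_TYPE_PV)); /* first device */
    assert(needs_backend_wait("1", false, DOMAIN_TYPE_PV));   /* second device */
    assert(!needs_backend_wait("1", true, DOMAIN_TYPE_PV));   /* domain creation */
    assert(!needs_backend_wait("1", false, DOMAIN_TYPE_HVM)); /* non-PV guest */
}
```

With this shape, the first hot-attach skips the wait (there is nothing
to wait on), while later attaches keep the original ordering of wait
first, then write.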

Regards,
Jason

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -157,11 +157,6 @@ static int libxl__device_pci_add_xenstor
>      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
>
> -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
> -            return ERROR_FAIL;
> -    }
> -
>      back = flexarray_make(gc, 16, 1);
>
>      LOGD(DEBUG, domid, "Adding new pci device to xenstore");
> @@ -213,6 +208,9 @@ static int libxl__device_pci_add_xenstor
>          if (rc < 0) goto out;
>      }
>
> +    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV)
> +        rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected));
> +
>  out:
>      libxl__xs_transaction_abort(gc, &t);
>      if (lock) libxl__unlock_file(lock);
>
>



* Re: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2021-12-14 13:34 ` Jason Andryuk
@ 2021-12-14 13:52   ` Jan Beulich
  2022-01-07 15:20     ` Anthony PERARD
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2021-12-14 13:52 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: xen-devel, Wei Liu, Anthony Perard, Juergen Gross, Paul Durrant,
	Stefano Stabellini

On 14.12.2021 14:34, Jason Andryuk wrote:
> On Tue, Dec 14, 2021 at 2:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> Attempting to wait when the backend hasn't been created yet can't work:
>> the function will complain "Backend ... does not exist". Move the
>> waiting past the creation of the backend (and that of other related
>> nodes), hoping that there are no other dependencies that would now be
>> broken.
>>
>> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Just to make it explicit: I have no idea why the waiting is needed in
>> the first place. It's been there from the very introduction of PCI
>> passthrough support (commit b0a1af61678b). I therefore can't exclude
>> that an even better fix would be to simply omit the 2nd hunk here.
> 
> The first time a device is attached, the backend does not exist, and
> the wait is not needed.  However, when a second device is attached,
> the backend does exist.  Since pciback goes through Reconfiguring and
> Reconfigured, I believe the wait exists to let the frontend/backend
> settle back to Connected before modifying the xenstore entries to add
> the additional device.  I could be wrong, but that is my best answer
> for why someone went to the trouble of adding a wait in the first
> place.

If things are as you describe them, then the change here is wrong: The
waiting gets moved from before the creation of the new device's nodes
to immediately after. Yet then I also can't see how else I should
address the issue at hand, so I'd have to defer to someone else; this
may involve undoing / redoing some of what the commit referenced by
the Fixes: tag did.

However, since all new nodes get added in a single transaction, I
can't see why waiting for the completion of a prior reconfigure would
be necessary: That'll either notice (and process) the new nodes, or
it won't. If it does, the next reconfigure would simply be a no-op.

Jan




* Ping: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2021-12-14  7:49 [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest Jan Beulich
  2021-12-14 13:34 ` Jason Andryuk
@ 2022-01-04  7:53 ` Jan Beulich
  2022-01-04  8:18   ` Durrant, Paul
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2022-01-04  7:53 UTC (permalink / raw)
  To: Wei Liu, Anthony Perard
  Cc: Juergen Gross, Paul Durrant, Stefano Stabellini, xen-devel

On 14.12.2021 08:49, Jan Beulich wrote:
> Attempting to wait when the backend hasn't been created yet can't work:
> the function will complain "Backend ... does not exist". Move the
> waiting past the creation of the backend (and that of other related
> nodes), hoping that there are no other dependencies that would now be
> broken.
> 
> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Just to make it explicit: I have no idea why the waiting is needed in
> the first place. It's been there from the very introduction of PCI
> passthrough support (commit b0a1af61678b). I therefore can't exclude
> that an even better fix would be to simply omit the 2nd hunk here.

Anyone, be it an ack or an alternative?

Jan

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -157,11 +157,6 @@ static int libxl__device_pci_add_xenstor
>      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
>  
> -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
> -            return ERROR_FAIL;
> -    }
> -
>      back = flexarray_make(gc, 16, 1);
>  
>      LOGD(DEBUG, domid, "Adding new pci device to xenstore");
> @@ -213,6 +208,9 @@ static int libxl__device_pci_add_xenstor
>          if (rc < 0) goto out;
>      }
>  
> +    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV)
> +        rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected));
> +
>  out:
>      libxl__xs_transaction_abort(gc, &t);
>      if (lock) libxl__unlock_file(lock);
> 
> 




* Re: Ping: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2022-01-04  7:53 ` Ping: " Jan Beulich
@ 2022-01-04  8:18   ` Durrant, Paul
  0 siblings, 0 replies; 7+ messages in thread
From: Durrant, Paul @ 2022-01-04  8:18 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu, Anthony Perard
  Cc: Juergen Gross, Paul Durrant, Stefano Stabellini, xen-devel

On 04/01/2022 07:53, Jan Beulich wrote:
> On 14.12.2021 08:49, Jan Beulich wrote:
>> Attempting to wait when the backend hasn't been created yet can't work:
>> the function will complain "Backend ... does not exist". Move the
>> waiting past the creation of the backend (and that of other related
>> nodes), hoping that there are no other dependencies that would now be
>> broken.
>>
>> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Just to make it explicit: I have no idea why the waiting is needed in
>> the first place. It's been there from the very introduction of PCI
>> passthrough support (commit b0a1af61678b). I therefore can't exclude
>> that an even better fix would be to simply omit the 2nd hunk here.
> 
> Anyone, be it an ack or an alternative?
> 

You can add my R-b FWIW, but of course I am not a maintainer.

   Paul

> Jan
> 
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -157,11 +157,6 @@ static int libxl__device_pci_add_xenstor
>>       if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
>>           return ERROR_FAIL;
>>   
>> -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
>> -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
>> -            return ERROR_FAIL;
>> -    }
>> -
>>       back = flexarray_make(gc, 16, 1);
>>   
>>       LOGD(DEBUG, domid, "Adding new pci device to xenstore");
>> @@ -213,6 +208,9 @@ static int libxl__device_pci_add_xenstor
>>           if (rc < 0) goto out;
>>       }
>>   
>> +    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV)
>> +        rc = libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected));
>> +
>>   out:
>>       libxl__xs_transaction_abort(gc, &t);
>>       if (lock) libxl__unlock_file(lock);
>>
>>
> 




* Re: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2021-12-14 13:52   ` Jan Beulich
@ 2022-01-07 15:20     ` Anthony PERARD
  2022-01-10  8:28       ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Anthony PERARD @ 2022-01-07 15:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jason Andryuk, xen-devel, Wei Liu, Juergen Gross, Paul Durrant,
	Stefano Stabellini

On Tue, Dec 14, 2021 at 02:52:43PM +0100, Jan Beulich wrote:
> On 14.12.2021 14:34, Jason Andryuk wrote:
> > On Tue, Dec 14, 2021 at 2:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> Attempting to wait when the backend hasn't been created yet can't work:
> >> the function will complain "Backend ... does not exist". Move the
> >> waiting past the creation of the backend (and that of other related
> >> nodes), hoping that there are no other dependencies that would now be
> >> broken.
> >>
> >> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> Just to make it explicit: I have no idea why the waiting is needed in
> >> the first place. It's been there from the very introduction of PCI
> >> passthrough support (commit b0a1af61678b). I therefore can't exclude
> >> that an even better fix would be to simply omit the 2nd hunk here.
> > 
> > The first time a device is attached, the backend does not exist, and
> > the wait is not needed.  However, when a second device is attached,
> > the backend does exist.  Since pciback goes through Reconfiguring and
> > Reconfigured, I believe the wait exists to let the frontend/backend
> > settle back to Connected before modifying the xenstore entries to add
> > the additional device.  I could be wrong, but that is my best answer
> > for why someone went to the trouble of adding a wait in the first
> > place.
> 
> If things are as you describe them, then the change here is wrong: The
> waiting gets moved from before the creation of the new device's nodes
> to immediately after. Yet then I also can't see how else I should
> address the issue at hand, so I'd have to defer to someone else; this
> may involve undoing / redoing some of what the commit referenced by
> the Fixes: tag did.
> 
> However, since all new nodes get added in a single transaction, I
> can't see why waiting for the completion of a prior reconfigure would
> be necessary: That'll either notice (and process) the new nodes, or
> it won't. If it does, the next reconfigure would simply be a no-op.

Well, the current code is checking that the backend is in a known state:
"Connected". Without this, the backend could be in any state like
"Closing" or other error, not just reconfiguring. We probably want to
keep checking that the backend can expect more devices.

Looking at the Linux PCI PV backend implementation, I think Linux reads
"num_devs", takes time to configure the new devs, then sets "state"
to "reconfigured". So if libxl sets "num_devs" and "state" while
Linux is still configuring new devs, Linux will never check "num_devs"
again and will ignore the newly added devices. So I guess it doesn't matter
whether we wait for "state"=="connected" before or after.
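
The suspected race can be illustrated with a toy model. This is a
sketch under the assumptions stated in the paragraph above (the backend
reads "num_devs" once per reconfigure and does not re-read it); it is
not derived from the actual pciback source. All struct, function, and
constant names are hypothetical; only the numeric XenbusState values
follow Xen's public xenbus.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values as in Xen's public io/xenbus.h (subset). */
enum { STATE_CONNECTED = 4, STATE_RECONFIGURING = 7, STATE_RECONFIGURED = 8 };

/* Toy model of the backend's view; not a real pciback structure. */
struct backend_model {
    int state;      /* value of the "state" node */
    int num_devs;   /* value of the "num_devs" node */
    int snapshot;   /* num_devs as read when reconfiguration began */
    int configured; /* devices the backend has actually set up */
};

/* Toolstack side: add one device and request a reconfigure. */
static void toolstack_add_device(struct backend_model *b)
{
    b->num_devs++;
    b->state = STATE_RECONFIGURING;
}

/* Backend side: read num_devs once, at the start of the reconfigure. */
static void backend_begin_reconfigure(struct backend_model *b)
{
    b->snapshot = b->num_devs;
}

/* Backend side: finish with whatever was seen at the start. */
static void backend_finish_reconfigure(struct backend_model *b)
{
    b->configured = b->snapshot;
    b->state = STATE_RECONFIGURED;
}

/* Returns how many devices the backend never noticed after two attaches. */
static int simulate_two_attaches(bool wait_for_connected)
{
    struct backend_model b = { STATE_CONNECTED, 1, 1, 1 };

    toolstack_add_device(&b);      /* first hot-attach: num_devs == 2 */
    backend_begin_reconfigure(&b); /* backend snapshots num_devs == 2 */

    if (wait_for_connected) {
        backend_finish_reconfigure(&b);
        b.state = STATE_CONNECTED; /* frontend/backend have settled */
        toolstack_add_device(&b);  /* second attach starts a new cycle */
        backend_begin_reconfigure(&b);
        backend_finish_reconfigure(&b);
    } else {
        toolstack_add_device(&b);  /* lands in the middle of the reconfigure */
        backend_finish_reconfigure(&b); /* only the snapshot is processed */
    }
    return b.num_devs - b.configured;
}
```

In this model, attaching without waiting leaves one device unnoticed,
whereas waiting for Connected between the two attaches leaves none.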

There is no real documentation on this PV PCI passthrough, so it is hard
to tell what libxl can do. The pci backend xenstore path isn't even in
"xenstore-paths.pandoc".

But overall, maybe Jason's proposed change would be better, that is, to
wait on the backend before adding a new device, but only when there's
already a device, which means the backend would exist. (It would be better
to me as it doesn't change when the waiting is done.)

Thanks,

-- 
Anthony PERARD



* Re: [PATCH] libxl/PCI: defer backend wait upon attaching to PV guest
  2022-01-07 15:20     ` Anthony PERARD
@ 2022-01-10  8:28       ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2022-01-10  8:28 UTC (permalink / raw)
  To: Anthony PERARD, Jason Andryuk
  Cc: xen-devel, Wei Liu, Juergen Gross, Paul Durrant, Stefano Stabellini

On 07.01.2022 16:20, Anthony PERARD wrote:
> On Tue, Dec 14, 2021 at 02:52:43PM +0100, Jan Beulich wrote:
>> On 14.12.2021 14:34, Jason Andryuk wrote:
>>> On Tue, Dec 14, 2021 at 2:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> Attempting to wait when the backend hasn't been created yet can't work:
>>>> the function will complain "Backend ... does not exist". Move the
>>>> waiting past the creation of the backend (and that of other related
>>>> nodes), hoping that there are no other dependencies that would now be
>>>> broken.
>>>>
>>>> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are reflected in the config")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> Just to make it explicit: I have no idea why the waiting is needed in
>>>> the first place. It's been there from the very introduction of PCI
>>>> passthrough support (commit b0a1af61678b). I therefore can't exclude
>>>> that an even better fix would be to simply omit the 2nd hunk here.
>>>
>>> The first time a device is attached, the backend does not exist, and
>>> the wait is not needed.  However, when a second device is attached,
>>> the backend does exist.  Since pciback goes through Reconfiguring and
>>> Reconfigured, I believe the wait exists to let the frontend/backend
>>> settle back to Connected before modifying the xenstore entries to add
>>> the additional device.  I could be wrong, but that is my best answer
>>> for why someone went to the trouble of adding a wait in the first
>>> place.
>>
>> If things are as you describe them, then the change here is wrong: The
>> waiting gets moved from before the creation of the new device's nodes
>> to immediately after. Yet then I also can't see how else I should
>> address the issue at hand, so I'd have to defer to someone else; this
>> may involve undoing / redoing some of what the commit referenced by
>> the Fixes: tag did.
>>
>> However, since all new nodes get added in a single transaction, I
>> can't see why waiting for the completion of a prior reconfigure would
>> be necessary: That'll either notice (and process) the new nodes, or
>> it won't. If it does, the next reconfigure would simply be a no-op.
> 
> Well, the current code is checking that the backend is in a known state:
> "Connected". Without this, the backend could be in any state like
> "Closing" or other error, not just reconfiguring. We probably want to
> keep checking that the backend can expect more devices.

Perhaps; I wonder though whether that's enough. The backend may also not
expect (or successfully deal with) new devices for other reasons. IOW
some kind of check of the success of the "addition" would seem to be
needed anyway.

> Looking at the Linux PCI PV backend implementation, I think Linux reads
> "num_devs", takes time to configure the new devs, then sets "state"
> to "reconfigured". So if libxl sets "num_devs" and "state" while
> Linux is still configuring new devs, Linux will never check "num_devs"
> again and will ignore the newly added devices. So I guess it doesn't matter
> whether we wait for "state"=="connected" before or after.
> 
> There is no real documentation on this PV PCI passthrough, so it is hard
> to tell what libxl can do. The pci backend xenstore path isn't even in
> "xenstore-paths.pandoc".
> 
> But overall, maybe Jason's proposed change would be better, that is, to
> wait on the backend before adding a new device, but only when there's
> already a device, which means the backend would exist. (It would be better
> to me as it doesn't change when the waiting is done.)

It's hard for me to tell without having seen Jason's full patch. I also
understand it has been submitted earlier than mine, so I wonder what its
status is.

Jan



