xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Paul Durrant <xadimgnik@gmail.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
	"Durrant, Paul" <pdurrant@amazon.co.uk>
Cc: Michael Brown <mbrown@fensystems.co.uk>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Ian Jackson <iwj@xenproject.org>, Wei Liu <wl@xen.org>,
	Anthony PERARD <anthony.perard@citrix.com>
Subject: Re: [PATCH] xen-netback: Check for hotplug-status existence before watching
Date: Tue, 18 May 2021 07:57:16 +0100	[thread overview]
Message-ID: <8b7a9cd5-3696-65c2-5656-a1c8eb174344@xen.org> (raw)
In-Reply-To: <YKLjoALdw4oKSZ04@mail-itl>

On 17/05/2021 22:43, Marek Marczykowski-Górecki wrote:
> On Tue, May 11, 2021 at 12:46:38PM +0000, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> Sent: 11 May 2021 11:45
>>> To: Durrant, Paul <pdurrant@amazon.co.uk>
>>> Cc: Michael Brown <mbrown@fensystems.co.uk>; paul@xen.org; xen-devel@lists.xenproject.org;
>>> netdev@vger.kernel.org; wei.liu@kernel.org
>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>
>>> On Tue, May 11, 2021 at 12:40:54PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Tue, May 11, 2021 at 07:06:55AM +0000, Durrant, Paul wrote:
>>>>>> -----Original Message-----
>>>>>> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>>> Sent: 10 May 2021 20:43
>>>>>> To: Michael Brown <mbrown@fensystems.co.uk>; paul@xen.org
>>>>>> Cc: paul@xen.org; xen-devel@lists.xenproject.org; netdev@vger.kernel.org; wei.liu@kernel.org;
>>> Durrant,
>>>>>> Paul <pdurrant@amazon.co.uk>
>>>>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>>>>
>>>>>> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
>>>>>>> If you have a suggested patch, I'm happy to test that it doesn't reintroduce
>>>>>>> the regression bug that was fixed by this commit.
>>>>>>
>>>>>> Actually, I've just tested with a simple reloading xen-netfront module. It
>>>>>> seems in this case, the hotplug script is not re-executed. In fact, I
>>>>>> think it should not be re-executed at all, since the vif interface
>>>>>> remains in place (it just gets NO-CARRIER flag).
>>>>>>
>>>>>> This brings a question, why removing hotplug-status in the first place?
>>>>>> The interface remains correctly configured by the hotplug script after
>>>>>> all. From the commit message:
>>>>>>
>>>>>>      xen-netback: remove 'hotplug-status' once it has served its purpose
>>>>>>
>>>>>>      Removing the 'hotplug-status' node in netback_remove() is wrong; the script
>>>>>>      may not have completed. Only remove the node once the watch has fired and
>>>>>>      has been unregistered.
>>>>>>
>>>>>> I think the intention was to remove 'hotplug-status' node _later_ in
>>>>>> case of quickly adding and removing the interface. Is that right, Paul?
>>>>>
>>>>> The removal was done to allow unbind/bind to function correctly. IIRC before the original patch
>>> doing a bind would stall forever waiting for the hotplug status to change, which would never happen.
>>>>
>>>> Hmm, in that case maybe don't remove it at all in the backend, and let
>>>> it be cleaned up by the toolstack, when it removes other backend-related
>>>> nodes?
>>>
>>> No, unbind/bind _does_ require hotplug script to be called again.
>>>
>>
>> Yes, sorry I was misremembering. My memory is hazy but there was definitely a problem at the time with leaving the node in place.
>>
>>> When exactly vif interface appears in the system (starts to be available
>>> for the hotplug script)? Maybe remove 'hotplug-status' just before that
>>> point?
>>>
>>
>> I really can't remember any detail. Perhaps try reverting both patches then and check that the unbind/rmmod/modprobe/bind sequence still works (and the backend actually makes it into connected state).
> 
> Ok, I've tried this. I've reverted both commits, then used your test
> script from the 9476654bd5e8ad42abe8ee9f9e90069ff8e60c17:
>      
>      This has been tested by running iperf as a server in the test VM and
>      then running a client against it in a continuous loop, whilst also
>      running:
>      
>      while true;
>        do echo vif-$DOMID-$VIF >unbind;
>        echo down;
>        rmmod xen-netback;
>        echo unloaded;
>        modprobe xen-netback;
>        cd $(pwd);
>        brctl addif xenbr0 vif$DOMID.$VIF;
>        ip link set vif$DOMID.$VIF up;
>        echo up;
>        sleep 5;
>        done
>      
>      in dom0 from /sys/bus/xen-backend/drivers/vif to continuously unbind,
>      unload, re-load, re-bind and re-plumb the backend.
>      
> In fact, the need to call `brctl` and `ip link` manually is exactly
> because the hotplug script isn't executed. When I execute it manually,
> the backend properly gets back to working. So, removing 'hotplug-status'
> was in the correct place (netback_remove). The missing part is the toolstack
> calling the hotplug script on xen-netback re-bind.
> 

Why is that missing? We're going behind the back of the toolstack to do 
the unbind and bind so why should we expect it to re-execute a hotplug 
script?

> In this case, I'm not sure what is the proper way. If I restart
> xendriverdomain service (I do run the backend in domU), it properly
> executes hotplug script and the backend interface gets properly
> configured. But it doesn't do it on its own. It seems to be related to
> device "state" in xenstore. The specific part of the libxl is
> backend_watch_callback():
> https://github.com/xen-project/xen/blob/master/tools/libs/light/libxl_device.c#L1664
> 
>      ddev = search_for_device(dguest, dev);
>      if (ddev == NULL && state == XenbusStateClosed) {
>          /*
>           * Spurious state change, device has already been disconnected
>           * or never attached.
>           */
>          goto skip;
>      } else if (ddev == NULL) {
>          rc = add_device(egc, nested_ao, dguest, dev);
>          if (rc > 0)
>              free_ao = true;
>      } else if (state == XenbusStateClosed && online == 0) {
>          rc = remove_device(egc, nested_ao, dguest, ddev);
>          if (rc > 0)
>              free_ao = true;
>          check_and_maybe_remove_guest(gc, ddomain, dguest);
>      }
> 
> In short: if device gets XenbusStateInitWait for the first time (ddev ==
> NULL case), it goes to add_device() which executes the hotplug script
> and stores the device.
> Then, if device goes to XenbusStateClosed + online==0 state, then it
> executes hotplug script again (with "offline" parameter) and forgets the
> device. If you unbind the driver, the device stays in
> XenbusStateConnected state (in xenstore), and after you bind it again,
> it goes to XenbusStateInitWait. It don't think it goes through
> XenbusStateClosed, and online stays at 1 too, so libxl doesn't execute
> the hotplug script again.

This is pretty key. The frontend should not notice an unbind/bind i.e. 
there should be no evidence of it happening by examining states in 
xenstore (from the guest side).

   Paul

> 
> Some solution could be to add an extra case at the end, like "ddev !=
> NULL && state == XenbusStateInitWait && hotplug-status != connected".
> And make sure xl devd won't call the same hotplug script multiple times
> for the same device _at the same time_ (I'm not sure about the async
> machinery here).
> 
> But even if xl devd (aka xendriverdomain service) gets "fixes" to
> execute hotplug script in that case, I don't think it would work in
> backend in dom0 case - there, I think nothing watches already configured
> vif interfaces (there is no xl devd daemon in dom0, and xl background
> process watches only domain death and cdrom eject events).
> 
> I'm adding toolstack maintainers, maybe they'll have some idea...
> 
> In any case, the issue is not calling the hotplug script, responsible
> for configuring newly created vif interface. Not kernel waiting for it.
> So, I think both commits should still be reverted.
> 



  parent reply	other threads:[~2021-05-18  6:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-10 18:25 xen-netback hotplug-status regression bug Michael Brown
2021-04-13  7:12 ` Paul Durrant
2021-04-13 10:48   ` Michael Brown
2021-04-13 10:55     ` Paul Durrant
2021-04-13 15:14       ` Michael Brown
2021-04-13 15:25       ` [PATCH] xen-netback: Check for hotplug-status existence before watching Michael Brown
2021-04-13 19:12         ` Paul Durrant
2021-04-13 22:30         ` patchwork-bot+netdevbpf
2021-05-10 18:32         ` Marek Marczykowski-Górecki
2021-05-10 18:47           ` Michael Brown
2021-05-10 18:53             ` Marek Marczykowski-Górecki
2021-05-10 19:06               ` Michael Brown
2021-05-10 19:42                 ` Marek Marczykowski-Górecki
2021-05-11  7:06                   ` Durrant, Paul
2021-05-11 10:40                     ` Marek Marczykowski-Górecki
2021-05-11 10:45                       ` Marek Marczykowski-Górecki
2021-05-11 12:46                         ` Durrant, Paul
2021-05-17 21:43                           ` Marek Marczykowski-Górecki
2021-05-17 21:51                             ` Michael Brown
2021-05-17 21:58                               ` Marek Marczykowski-Górecki
2021-05-18  6:57                             ` Paul Durrant [this message]
2021-05-18  9:18                               ` Marek Marczykowski-Górecki
     [not found]                                 ` <887f9533f5c54bfabfbff7231eb99b08@EX13D32EUC003.ant.amazon.com>
     [not found]                                   ` <YKOMpXwcnr9QiXy8@mail-itl>
     [not found]                                     ` <2c23e102b6254e42877eb1e8fe68a4f7@EX13D32EUC003.ant.amazon.com>
2021-05-18 10:42                                       ` Marek Marczykowski-Górecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b7a9cd5-3696-65c2-5656-a1c8eb174344@xen.org \
    --to=xadimgnik@gmail.com \
    --cc=anthony.perard@citrix.com \
    --cc=iwj@xenproject.org \
    --cc=marmarek@invisiblethingslab.com \
    --cc=mbrown@fensystems.co.uk \
    --cc=netdev@vger.kernel.org \
    --cc=paul@xen.org \
    --cc=pdurrant@amazon.co.uk \
    --cc=wei.liu@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).