From: Paul Durrant <xadimgnik@gmail.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
"Durrant, Paul" <pdurrant@amazon.co.uk>
Cc: Michael Brown <mbrown@fensystems.co.uk>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
Ian Jackson <iwj@xenproject.org>, Wei Liu <wl@xen.org>,
Anthony PERARD <anthony.perard@citrix.com>
Subject: Re: [PATCH] xen-netback: Check for hotplug-status existence before watching
Date: Tue, 18 May 2021 07:57:16 +0100
Message-ID: <8b7a9cd5-3696-65c2-5656-a1c8eb174344@xen.org>
In-Reply-To: <YKLjoALdw4oKSZ04@mail-itl>

On 17/05/2021 22:43, Marek Marczykowski-Górecki wrote:
> On Tue, May 11, 2021 at 12:46:38PM +0000, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> Sent: 11 May 2021 11:45
>>> To: Durrant, Paul <pdurrant@amazon.co.uk>
>>> Cc: Michael Brown <mbrown@fensystems.co.uk>; paul@xen.org; xen-devel@lists.xenproject.org;
>>> netdev@vger.kernel.org; wei.liu@kernel.org
>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>
>>> On Tue, May 11, 2021 at 12:40:54PM +0200, Marek Marczykowski-Górecki wrote:
>>>> On Tue, May 11, 2021 at 07:06:55AM +0000, Durrant, Paul wrote:
>>>>>> -----Original Message-----
>>>>>> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>>> Sent: 10 May 2021 20:43
>>>>>> To: Michael Brown <mbrown@fensystems.co.uk>; paul@xen.org
>>>>>> Cc: paul@xen.org; xen-devel@lists.xenproject.org; netdev@vger.kernel.org; wei.liu@kernel.org;
>>>>>> Durrant, Paul <pdurrant@amazon.co.uk>
>>>>>> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status existence before watching
>>>>>>
>>>>>> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
>>>>>>> If you have a suggested patch, I'm happy to test that it doesn't reintroduce
>>>>>>> the regression bug that was fixed by this commit.
>>>>>>
>>>>>> Actually, I've just tested by simply reloading the xen-netfront module. It
>>>>>> seems that in this case the hotplug script is not re-executed. In fact, I
>>>>>> think it should not be re-executed at all, since the vif interface
>>>>>> remains in place (it just gets the NO-CARRIER flag).
>>>>>>
>>>>>> This raises a question: why remove hotplug-status in the first place?
>>>>>> The interface remains correctly configured by the hotplug script after
>>>>>> all. From the commit message:
>>>>>>
>>>>>> xen-netback: remove 'hotplug-status' once it has served its purpose
>>>>>>
>>>>>> Removing the 'hotplug-status' node in netback_remove() is wrong; the script
>>>>>> may not have completed. Only remove the node once the watch has fired and
>>>>>> has been unregistered.
>>>>>>
>>>>>> I think the intention was to remove 'hotplug-status' node _later_ in
>>>>>> case of quickly adding and removing the interface. Is that right, Paul?
>>>>>
>>>>> The removal was done to allow unbind/bind to function correctly. IIRC before the original patch
>>>>> doing a bind would stall forever waiting for the hotplug status to change, which would never happen.
>>>>
>>>> Hmm, in that case maybe don't remove it at all in the backend, and let
>>>> it be cleaned up by the toolstack, when it removes other backend-related
>>>> nodes?
>>>
>>> No, unbind/bind _does_ require the hotplug script to be called again.
>>>
>>
>> Yes, sorry I was misremembering. My memory is hazy but there was definitely a problem at the time with leaving the node in place.
>>
>>> When exactly does the vif interface appear in the system (i.e. become
>>> available to the hotplug script)? Maybe remove 'hotplug-status' just
>>> before that point?
>>>
>>
>> I really can't remember any detail. Perhaps try reverting both patches then and check that the unbind/rmmod/modprobe/bind sequence still works (and the backend actually makes it into connected state).
>
> Ok, I've tried this. I've reverted both commits, then used your test
> script from commit 9476654bd5e8ad42abe8ee9f9e90069ff8e60c17:
>
> This has been tested by running iperf as a server in the test VM and
> then running a client against it in a continuous loop, whilst also
> running:
>
>     while true;
>       do echo vif-$DOMID-$VIF >unbind;
>       echo down;
>       rmmod xen-netback;
>       echo unloaded;
>       modprobe xen-netback;
>       cd $(pwd);
>       brctl addif xenbr0 vif$DOMID.$VIF;
>       ip link set vif$DOMID.$VIF up;
>       echo up;
>       sleep 5;
>       done
>
> in dom0 from /sys/bus/xen-backend/drivers/vif to continuously unbind,
> unload, re-load, re-bind and re-plumb the backend.
>
> In fact, the need to call `brctl` and `ip link` manually is exactly
> because the hotplug script isn't executed. When I execute it manually,
> the backend properly gets back to working. So, removing 'hotplug-status'
> was in the correct place (netback_remove). The missing part is the toolstack
> calling the hotplug script on xen-netback re-bind.
>

Why is that missing? We're going behind the back of the toolstack to do
the unbind and bind so why should we expect it to re-execute a hotplug
script?

> In this case, I'm not sure what is the proper way. If I restart
> xendriverdomain service (I do run the backend in domU), it properly
> executes hotplug script and the backend interface gets properly
> configured. But it doesn't do it on its own. It seems to be related to
> device "state" in xenstore. The specific part of the libxl is
> backend_watch_callback():
> https://github.com/xen-project/xen/blob/master/tools/libs/light/libxl_device.c#L1664
>
>     ddev = search_for_device(dguest, dev);
>     if (ddev == NULL && state == XenbusStateClosed) {
>         /*
>          * Spurious state change, device has already been disconnected
>          * or never attached.
>          */
>         goto skip;
>     } else if (ddev == NULL) {
>         rc = add_device(egc, nested_ao, dguest, dev);
>         if (rc > 0)
>             free_ao = true;
>     } else if (state == XenbusStateClosed && online == 0) {
>         rc = remove_device(egc, nested_ao, dguest, ddev);
>         if (rc > 0)
>             free_ao = true;
>         check_and_maybe_remove_guest(gc, ddomain, dguest);
>     }
>
> In short: if a device reaches XenbusStateInitWait for the first time (the
> ddev == NULL case), it goes to add_device(), which executes the hotplug
> script and stores the device.
> Then, if the device goes to the XenbusStateClosed + online==0 state, libxl
> executes the hotplug script again (with the "offline" parameter) and
> forgets the device. If you unbind the driver, the device stays in
> XenbusStateConnected state (in xenstore), and after you bind it again,
> it goes to XenbusStateInitWait. I don't think it goes through
> XenbusStateClosed, and online stays at 1 too, so libxl doesn't execute
> the hotplug script again.

This is pretty key. The frontend should not notice an unbind/bind, i.e.
there should be no evidence of it happening when examining states in
xenstore (from the guest side).

  Paul

>
> One solution could be to add an extra case at the end, like "ddev !=
> NULL && state == XenbusStateInitWait && hotplug-status != connected",
> and to make sure xl devd won't call the same hotplug script multiple times
> for the same device _at the same time_ (I'm not sure about the async
> machinery here).
>
> But even if xl devd (aka the xendriverdomain service) gets "fixed" to
> execute the hotplug script in that case, I don't think it would work in
> the backend-in-dom0 case: there, I think nothing watches already configured
> vif interfaces (there is no xl devd daemon in dom0, and the xl background
> process watches only domain death and cdrom eject events).
>
> I'm adding toolstack maintainers, maybe they'll have some idea...
>
> In any case, the issue is that the hotplug script, which is responsible
> for configuring the newly created vif interface, is not called; it is not
> the kernel waiting for it. So, I think both commits should still be reverted.
>
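
For reference, the libxl logic quoted above, together with the extra case
suggested for the re-bind path, can be modelled as a small pure function.
This is only a sketch under stated assumptions, not libxl code:
devd_decide(), enum devd_action and the hotplug_connected flag are
hypothetical names introduced for illustration, and the DEVD_REPLUG branch
is the proposed behaviour, which per the discussion above libxl does not
currently implement. The XenbusState values mirror Xen's public
io/xenbus.h header.

```c
#include <stdbool.h>

/* Xenbus states, numbered as in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical outcome labels; libxl has no such enum. */
enum devd_action {
    DEVD_SKIP,    /* spurious change: device unknown and already closed */
    DEVD_ADD,     /* first sighting: run hotplug script, remember device */
    DEVD_REMOVE,  /* run hotplug script with "offline", forget device */
    DEVD_REPLUG,  /* PROPOSED: re-run hotplug script after a re-bind */
    DEVD_NONE     /* nothing to do */
};

/*
 * known             - the device was found by the lookup (ddev != NULL)
 * state             - backend state read from xenstore
 * online            - value of the backend "online" node
 * hotplug_connected - assumption: true when hotplug-status == "connected"
 */
enum devd_action devd_decide(bool known, enum xenbus_state state,
                             int online, bool hotplug_connected)
{
    if (!known && state == XenbusStateClosed)
        return DEVD_SKIP;
    if (!known)
        return DEVD_ADD;
    if (state == XenbusStateClosed && online == 0)
        return DEVD_REMOVE;
    /* Proposed extra case: known device back in InitWait, script not run. */
    if (state == XenbusStateInitWait && !hotplug_connected)
        return DEVD_REPLUG;
    return DEVD_NONE;
}
```

With the proposed branch, a device that is already known and re-enters
XenbusStateInitWait without hotplug-status being "connected" yields
DEVD_REPLUG. Without that branch the same input falls through to DEVD_NONE,
which matches the behaviour described above for unbind/bind.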