* [BUG] hv_netvsc: Unbind exits before the VFs bound to it are unregistered
@ 2021-06-03 12:34 Leonid Bloch
  2021-06-03 18:04 ` Dexuan Cui
  0 siblings, 1 reply; 4+ messages in thread
From: Leonid Bloch @ 2021-06-03 12:34 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu, Dexuan Cui
  Cc: linux-hyperv, netdev

Hi,

When I try to unbind a network interface from hv_netvsc and bind it to
uio_hv_generic, I occasionally get the following kernel panic (please
note the first two lines: it seems that uio_hv_generic is registered
before the VF bound to hv_netvsc is unregistered):

[Jun 3 09:04] hv_vmbus: registering driver uio_hv_generic
[  +0.002215] hv_netvsc 5e089342-8a78-4b76-9729-25c81bd338fc eth2: VF unregistering: eth5
[  +1.088078] BUG: scheduling while atomic: swapper/8/0/0x00010003
[  +0.000001] BUG: scheduling while atomic: swapper/3/0/0x00010003
[  +0.000001] BUG: scheduling while atomic: swapper/6/0/0x00010003
[  +0.000000] BUG: scheduling while atomic: swapper/7/0/0x00010003
[  +0.000005] Modules linked in:
[  +0.000001] Modules linked in:
[  +0.000001]  uio_hv_generic
[  +0.000000] Modules linked in:
[  +0.000000] Modules linked in:
[  +0.000001]  uio_hv_generic uio
[  +0.000001]  uio
[  +0.000000]  uio_hv_generic
[  +0.000000]  uio_hv_generic
...

I run kernel 5.10.27, unmodified except for RT patch v36, on the Azure
Stack Edge platform, software version 2105 (2.2.1606.3320).

I perform the bind-unbind using the following script (please note the
inline comment):

# Hyper-V synthetic network device class GUID
net_uuid="f8615163-df3e-46c5-913f-f2d2f965ed0e"
# VMBus instance ID of the synthetic NIC behind eth1
dev_uuid="$(basename "$(readlink "/sys/class/net/eth1/device")")"
modprobe uio_hv_generic
# Let uio_hv_generic accept devices of the network class
echo "${net_uuid}" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/hv_netvsc/unbind
### If I insert 'sleep 1' here - all works correctly
printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/uio_hv_generic/bind
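The fixed 'sleep 1' could be replaced by a bounded poll. This is an untested sketch, not a fix for the underlying race: it assumes that hv_netvsc has finished with the VF once the VF interface no longer has a "master" link under /sys/class/net/<vf>/ (hv_netvsc links the VF beneath the synthetic interface as its master). wait_vf_released, SYSFS_NET and vf_if are hypothetical names. If the race involves state that sysfs does not expose, this wait will not help.

```bash
# Hypothetical helper: poll instead of a fixed 'sleep 1'.
# ASSUMPTION: the VF is released once $SYSFS_NET/<vf>/master is gone.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# wait_vf_released IFACE [TRIES]
# Sleeps 0.1 s between checks, at most TRIES times (default 50, ~5 s).
# Returns 0 once the "master" link is absent, 1 on timeout.
wait_vf_released() {
    local link="${SYSFS_NET}/$1/master" tries="${2:-50}"
    while [ -e "$link" ] || [ -L "$link" ]; do
        if [ "$tries" -le 0 ]; then
            return 1
        fi
        tries=$((tries - 1))
        sleep 0.1
    done
    return 0
}

# Usage between the unbind and bind steps of the script above
# (vf_if is a placeholder, e.g. "eth5" in the log):
#   printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/hv_netvsc/unbind
#   wait_vf_released "${vf_if}" || echo "VF still attached" >&2
#   printf "%s" "${dev_uuid}" > /sys/bus/vmbus/drivers/uio_hv_generic/bind
```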


Thanks,
Leonid.

* RE: [BUG] hv_netvsc: Unbind exits before the VFs bound to it are unregistered
From: Dexuan Cui @ 2021-06-03 18:04 UTC (permalink / raw)
  To: Leonid Bloch, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Long Li
  Cc: linux-hyperv, netdev

It would be great if you could test the mainline kernel, which I suspect also
has the bug.

It looks like netvsc_remove() -> netvsc_unregister_vf() does the unbinding work
synchronously, so I don't know why the bug happens.

Right now I don't have a DPDK setup to test this, but I think the bug can
be worked around by unbinding the PCI VF device from the pci-hyperv driver
before unbinding the netvsc device, and re-binding the VF device after binding
the netvsc device to uio_hv_generic.
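The suggested order can be sketched as a shell function. This is untested and rests on assumptions not stated in the thread: that pci-hyperv appears under /sys/bus/vmbus/drivers as hv_pci, that the caller already knows the VMBus instance ID of the PCI pass-through device carrying the VF (pci_uuid below is a placeholder), and that the modprobe and new_id steps of the original script have already run. VMBUS_DRIVERS and rebind_with_vf_detached are hypothetical names; VMBUS_DRIVERS only exists so the function can be exercised against a scratch directory.

```bash
# Hypothetical sketch of the suggested workaround order (untested).
# ASSUMPTIONS: pci-hyperv's VMBus driver directory is named hv_pci;
# modprobe uio_hv_generic and the new_id write have already been done.
VMBUS_DRIVERS="${VMBUS_DRIVERS:-/sys/bus/vmbus/drivers}"

# rebind_with_vf_detached NET_DEV_UUID PCI_DEV_UUID
rebind_with_vf_detached() {
    local dev="$1" pci="$2" d="${VMBUS_DRIVERS}"
    # 1. Detach the VF's PCI bus first, so hv_netvsc has no VF to release.
    printf "%s" "$pci" > "$d/hv_pci/unbind" || return 1
    # 2. Move the synthetic NIC from hv_netvsc to uio_hv_generic.
    printf "%s" "$dev" > "$d/hv_netvsc/unbind" || return 1
    printf "%s" "$dev" > "$d/uio_hv_generic/bind" || return 1
    # 3. Re-attach the PCI bus so the VF device is probed again.
    printf "%s" "$pci" > "$d/hv_pci/bind"
}

# Usage (pci_uuid is a placeholder):
#   rebind_with_vf_detached "${dev_uuid}" "${pci_uuid}"
```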

Thanks,
-- Dexuan

* Re: [BUG] hv_netvsc: Unbind exits before the VFs bound to it are unregistered
From: Leonid Bloch @ 2021-06-04  8:14 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Long Li
  Cc: linux-hyperv, netdev

Hi Dexuan,

Thanks for your reply. I can only check this myself next week, as I am
out of the office now, but do you think the cause might be the use of
cancel_delayed_work_sync() instead of cancel_delayed_work() in
netvsc_unregister_vf()?

And if the above is not correct, could you please advise on a way to
find the corresponding VF device from userspace, given the kernel
name of the parent device? I have not found it in sysfs so far.

Thanks,
Leonid.

* RE: [BUG] hv_netvsc: Unbind exits before the VFs bound to it are unregistered
From: Dexuan Cui @ 2021-06-04 17:25 UTC (permalink / raw)
  To: Leonid Bloch, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Long Li
  Cc: linux-hyperv, netdev

> Thanks for your reply. I can check for myself only next week, as I am
> out of office now, but do you think that the reason might be using
> cancel_delayed_work_sync(), instead of cancel_delayed_work() in
> netvsc_unregister_vf()?

I'm not sure. I don't understand how the error happens:
[  +1.088078] BUG: scheduling while atomic: swapper/8/0/0x00010003
 
> And if the above is not correct, can you please advise on a way of
> finding the corresponding VF device from userspace, given the kernel
> name of the parent device? I did not find it in sysfs so far.

The VF NIC's MAC address is the same as that of the matching netvsc
NIC, so we should be able to find the <netvsc NIC, VF NIC> pair by
checking /sys/class/net/*/address.
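A minimal shell sketch of that lookup, assuming only the standard /sys/class/net/<iface>/address attribute; find_vf_for and SYSFS_NET are hypothetical names (SYSFS_NET only lets the function be tried on a scratch directory). It prints every other interface that shares the synthetic NIC's MAC, so any other device that inherits the same address (a VLAN or bond, for example) would be listed too.

```bash
# Hypothetical helper: list interfaces sharing a netvsc NIC's MAC address.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# find_vf_for NETVSC_IFACE
# Prints the name of every other interface whose address matches.
find_vf_for() {
    local parent="$1" mac dir
    mac="$(cat "${SYSFS_NET}/${parent}/address")" || return 1
    for dir in "${SYSFS_NET}"/*; do
        [ "${dir##*/}" = "$parent" ] && continue
        [ -r "$dir/address" ] || continue
        if [ "$(cat "$dir/address")" = "$mac" ]; then
            printf '%s\n' "${dir##*/}"
        fi
    done
}

# Usage: find_vf_for eth1
```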

