All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
@ 2015-07-13 18:57 clsoto
  2015-07-13 21:05 ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: clsoto @ 2015-07-13 18:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, brking, j.vosburgh, gospo, Carol L Soto

From: Carol L Soto <clsoto@linux.vnet.ibm.com>

Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
trace at rmmod bonding.

[68830.202239] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[68830.202257] ------------[ cut here ]------------
[68830.202260] WARNING: at fs/proc/generic.c:562
[68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
[68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
[68830.202419] PACATMSCRATCH [8000000000009032]
[68830.202421] Call Trace:
[68830.202424] [c000000179277940] [c0000000002abf68] 
.remove_proc_entry+0x1f8/0x240 (unreliable)
[68830.202434] [c0000001792779f0] [d0000000053229a4] 
.bond_destroy_proc_dir+0x34/0x54 [bonding]
[68830.202440] [c000000179277a70] [d0000000053130e0] 
.bond_net_exit+0x90/0x120 [bonding]
[68830.202445] [c000000179277b10] [c00000000059944c] 
.ops_exit_list.isra.0+0x6c/0xd0
[68830.202450] [c000000179277ba0] [c000000000599774] 
.unregister_pernet_operations+0x94/0x100
[68830.202454] [c000000179277c40] [c000000000599814] 
.unregister_pernet_subsys+0x34/0x60
[68830.202460] [c000000179277cc0] [d000000005323758] 
.bonding_exit+0x48/0x2328 [bonding]
[68830.202466] [c000000179277d30] [c00000000010dcc4] 
.SyS_delete_module+0x1f4/0x340
[68830.202471] [c000000179277e30] [c000000000009e7c] 
syscall_exit+0x0/0x7c
[68830.202491] ---[ end trace 9bd1d810219c9875 ]---

Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ace105a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
+	bond_remove_proc_entry(bond);
+
 	dev_set_mtu(slave_dev, slave->original_mtu);
 
 	slave_dev->priv_flags &= ~IFF_BONDING;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
  2015-07-13 18:57 [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one clsoto
@ 2015-07-13 21:05 ` Nikolay Aleksandrov
  2015-07-13 21:10   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-13 21:05 UTC (permalink / raw)
  To: clsoto, davem; +Cc: netdev, brking, j.vosburgh, gospo

On 07/13/2015 08:57 PM, clsoto@linux.vnet.ibm.com wrote:
> From: Carol L Soto <clsoto@linux.vnet.ibm.com>
> 
> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
> trace at rmmod bonding.
> 
> [68830.202239] remove_proc_entry: removing non-empty directory
> 'net/bonding', leaking at least 'bond0'
> [68830.202257] ------------[ cut here ]------------
> [68830.202260] WARNING: at fs/proc/generic.c:562
> [68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
> [68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
> [68830.202419] PACATMSCRATCH [8000000000009032]
> [68830.202421] Call Trace:
> [68830.202424] [c000000179277940] [c0000000002abf68] 
> .remove_proc_entry+0x1f8/0x240 (unreliable)
> [68830.202434] [c0000001792779f0] [d0000000053229a4] 
> .bond_destroy_proc_dir+0x34/0x54 [bonding]
> [68830.202440] [c000000179277a70] [d0000000053130e0] 
> .bond_net_exit+0x90/0x120 [bonding]
> [68830.202445] [c000000179277b10] [c00000000059944c] 
> .ops_exit_list.isra.0+0x6c/0xd0
> [68830.202450] [c000000179277ba0] [c000000000599774] 
> .unregister_pernet_operations+0x94/0x100
> [68830.202454] [c000000179277c40] [c000000000599814] 
> .unregister_pernet_subsys+0x34/0x60
> [68830.202460] [c000000179277cc0] [d000000005323758] 
> .bonding_exit+0x48/0x2328 [bonding]
> [68830.202466] [c000000179277d30] [c00000000010dcc4] 
> .SyS_delete_module+0x1f4/0x340
> [68830.202471] [c000000179277e30] [c000000000009e7c] 
> syscall_exit+0x0/0x7c
> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
> 
> Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
> ---
>  drivers/net/bonding/bond_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 19eb990..ace105a 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>  		dev_set_mac_address(slave_dev, &addr);
>  	}
>  
> +	bond_remove_proc_entry(bond);
> +
>  	dev_set_mtu(slave_dev, slave->original_mtu);
>  
>  	slave_dev->priv_flags &= ~IFF_BONDING;
> 

This is incorrect, it tries to remove the bond entry on every slave release
so if we have a bonding device with >= 2 slaves and release one of them then
the whole bond device entry will be removed from /proc/net/bonding.
You can hit this case only if you had created a bonding device while doing the
rmmod bonding (it's an old race condition which was fixed long time ago, but
the procfs was apparently missed) and only after the notifier has been
unregistered but before the sysfs has been removed.

Since the bonding netdevice notifier is handling the procfs creation/destruction
we could try moving the unregister after the pernet destruction which should
help avoid such problems. Could you try the following patch:


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990d398c..d515ee38b77f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4682,12 +4682,10 @@ err_link:
 
 static void __exit bonding_exit(void)
 {
-	unregister_netdevice_notifier(&bond_netdev_notifier);
-
 	bond_destroy_debugfs();
-
 	bond_netlink_fini();
 	unregister_pernet_subsys(&bond_net_ops);
+	unregister_netdevice_notifier(&bond_netdev_notifier);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	/* Make sure we don't have an imbalance on our netpoll blocking */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
  2015-07-13 21:05 ` Nikolay Aleksandrov
@ 2015-07-13 21:10   ` Nikolay Aleksandrov
  2015-07-15 17:49     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-13 21:10 UTC (permalink / raw)
  To: clsoto, davem; +Cc: netdev, brking, j.vosburgh, gospo

On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote:
> On 07/13/2015 08:57 PM, clsoto@linux.vnet.ibm.com wrote:
>> From: Carol L Soto <clsoto@linux.vnet.ibm.com>
>>
>> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
>> trace at rmmod bonding.
>>
>> [68830.202239] remove_proc_entry: removing non-empty directory
>> 'net/bonding', leaking at least 'bond0'
>> [68830.202257] ------------[ cut here ]------------
>> [68830.202260] WARNING: at fs/proc/generic.c:562
>> [68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
>> [68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
>> [68830.202419] PACATMSCRATCH [8000000000009032]
>> [68830.202421] Call Trace:
>> [68830.202424] [c000000179277940] [c0000000002abf68] 
>> .remove_proc_entry+0x1f8/0x240 (unreliable)
>> [68830.202434] [c0000001792779f0] [d0000000053229a4] 
>> .bond_destroy_proc_dir+0x34/0x54 [bonding]
>> [68830.202440] [c000000179277a70] [d0000000053130e0] 
>> .bond_net_exit+0x90/0x120 [bonding]
>> [68830.202445] [c000000179277b10] [c00000000059944c] 
>> .ops_exit_list.isra.0+0x6c/0xd0
>> [68830.202450] [c000000179277ba0] [c000000000599774] 
>> .unregister_pernet_operations+0x94/0x100
>> [68830.202454] [c000000179277c40] [c000000000599814] 
>> .unregister_pernet_subsys+0x34/0x60
>> [68830.202460] [c000000179277cc0] [d000000005323758] 
>> .bonding_exit+0x48/0x2328 [bonding]
>> [68830.202466] [c000000179277d30] [c00000000010dcc4] 
>> .SyS_delete_module+0x1f4/0x340
>> [68830.202471] [c000000179277e30] [c000000000009e7c] 
>> syscall_exit+0x0/0x7c
>> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>>
>> Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
>> ---
>>  drivers/net/bonding/bond_main.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 19eb990..ace105a 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>>  		dev_set_mac_address(slave_dev, &addr);
>>  	}
>>  
>> +	bond_remove_proc_entry(bond);
>> +
>>  	dev_set_mtu(slave_dev, slave->original_mtu);
>>  
>>  	slave_dev->priv_flags &= ~IFF_BONDING;
>>
> 
> This is incorrect, it tries to remove the bond entry on every slave release
> so if we have a bonding device with >= 2 slaves and release one of them then
> the whole bond device entry will be removed from /proc/net/bonding.
<<<<>>>>
> You can hit this case only if you had created a bonding device while doing the
> rmmod bonding (it's an old race condition which was fixed long time ago, but
> the procfs was apparently missed) and only after the notifier has been
> unregistered but before the sysfs has been removed.
> 
Scratch this part, it should be triggered in a different way.
Could you provide a way to reproduce ?

> Since the bonding netdevice notifier is handling the procfs creation/destruction
> we could try moving the unregister after the pernet destruction which should
> help avoid such problems. Could you try the following patch:
> 
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 19eb990d398c..d515ee38b77f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4682,12 +4682,10 @@ err_link:
>  
>  static void __exit bonding_exit(void)
>  {
> -	unregister_netdevice_notifier(&bond_netdev_notifier);
> -
>  	bond_destroy_debugfs();
> -
>  	bond_netlink_fini();
>  	unregister_pernet_subsys(&bond_net_ops);
> +	unregister_netdevice_notifier(&bond_netdev_notifier);
>  
>  #ifdef CONFIG_NET_POLL_CONTROLLER
>  	/* Make sure we don't have an imbalance on our netpoll blocking */
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
  2015-07-13 21:10   ` Nikolay Aleksandrov
@ 2015-07-15 17:49     ` Nikolay Aleksandrov
  2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-15 17:49 UTC (permalink / raw)
  To: clsoto, davem; +Cc: netdev, brking, j.vosburgh, gospo

On 07/13/2015 11:10 PM, Nikolay Aleksandrov wrote:
> On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote:
>> On 07/13/2015 08:57 PM, clsoto@linux.vnet.ibm.com wrote:
>>> From: Carol L Soto <clsoto@linux.vnet.ibm.com>
>>>
>>> Add function bond_remove_proc_entry at __bond_release_one to avoid stack 
>>> trace at rmmod bonding.
>>>
>>> [68830.202239] remove_proc_entry: removing non-empty directory
>>> 'net/bonding', leaking at least 'bond0'
>>> [68830.202257] ------------[ cut here ]------------
>>> [68830.202260] WARNING: at fs/proc/generic.c:562
>>> [68830.202412] NIP [c0000000002abf6c] .remove_proc_entry+0x1fc/0x240
>>> [68830.202416] LR [c0000000002abf68] .remove_proc_entry+0x1f8/0x240
>>> [68830.202419] PACATMSCRATCH [8000000000009032]
>>> [68830.202421] Call Trace:
>>> [68830.202424] [c000000179277940] [c0000000002abf68] 
>>> .remove_proc_entry+0x1f8/0x240 (unreliable)
>>> [68830.202434] [c0000001792779f0] [d0000000053229a4] 
>>> .bond_destroy_proc_dir+0x34/0x54 [bonding]
>>> [68830.202440] [c000000179277a70] [d0000000053130e0] 
>>> .bond_net_exit+0x90/0x120 [bonding]
>>> [68830.202445] [c000000179277b10] [c00000000059944c] 
>>> .ops_exit_list.isra.0+0x6c/0xd0
>>> [68830.202450] [c000000179277ba0] [c000000000599774] 
>>> .unregister_pernet_operations+0x94/0x100
>>> [68830.202454] [c000000179277c40] [c000000000599814] 
>>> .unregister_pernet_subsys+0x34/0x60
>>> [68830.202460] [c000000179277cc0] [d000000005323758] 
>>> .bonding_exit+0x48/0x2328 [bonding]
>>> [68830.202466] [c000000179277d30] [c00000000010dcc4] 
>>> .SyS_delete_module+0x1f4/0x340
>>> [68830.202471] [c000000179277e30] [c000000000009e7c] 
>>> syscall_exit+0x0/0x7c
>>> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>>>
>>> Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 19eb990..ace105a 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>>>  		dev_set_mac_address(slave_dev, &addr);
>>>  	}
>>>  
>>> +	bond_remove_proc_entry(bond);
>>> +
>>>  	dev_set_mtu(slave_dev, slave->original_mtu);
>>>  
>>>  	slave_dev->priv_flags &= ~IFF_BONDING;
>>>
>>
>> This is incorrect, it tries to remove the bond entry on every slave release
>> so if we have a bonding device with >= 2 slaves and release one of them then
>> the whole bond device entry will be removed from /proc/net/bonding.
> <<<<>>>>
>> You can hit this case only if you had created a bonding device while doing the
>> rmmod bonding (it's an old race condition which was fixed long time ago, but
>> the procfs was apparently missed) and only after the notifier has been
>> unregistered but before the sysfs has been removed.
>>
> Scratch this part, it should be triggered in a different way.
> Could you provide a way to reproduce ?
> 
>> Since the bonding netdevice notifier is handling the procfs creation/destruction
>> we could try moving the unregister after the pernet destruction which should
>> help avoid such problems. Could you try the following patch:
>>
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 19eb990d398c..d515ee38b77f 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4682,12 +4682,10 @@ err_link:
>>  
>>  static void __exit bonding_exit(void)
>>  {
>> -	unregister_netdevice_notifier(&bond_netdev_notifier);
>> -
>>  	bond_destroy_debugfs();
>> -
>>  	bond_netlink_fini();
>>  	unregister_pernet_subsys(&bond_net_ops);
>> +	unregister_netdevice_notifier(&bond_netdev_notifier);
>>  
>>  #ifdef CONFIG_NET_POLL_CONTROLLER
>>  	/* Make sure we don't have an imbalance on our netpoll blocking */
>>

After we had a private discussion I was able to reproduce this issue with
tap devices (!= ARPHRD_ETHER and can be enslaved):

[14446.539000] bond0: Releasing active interface tun1
[14446.548333] bond0 (unregistering): Released all slaves
[14446.564200] ------------[ cut here ]------------
[14446.564208] WARNING: CPU: 0 PID: 6319 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160()
[14446.564211] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
[14446.564212] Modules linked in: tun bonding(-) eql(O) 9p stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev hid_generic usbhid hid snd_hda_codec_generic ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse pcspkr snd_hda_intel evdev snd_hda_codec snd_hwdep qxl snd_hda_core serio_raw snd_pcm snd_timer snd 9pnet_virtio 9pnet virtio_balloon soundcore i2c_piix4 drm_kms_helper ttm drm i2c_core virtio_console pvpanic parport_pc parport acpi_cpufreq processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000 ata_piix floppy ehci_pci uhci_hcd ehci_hcd libata scsi_mod usbcore usb_common virtio_pci virtio_ring virtio [l
 ast unloaded: bridge]

[14446.564295] CPU: 0 PID: 6319 Comm: rmmod Tainted: G           O    4.2.0-rc2+ #6
[14446.564296] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[14446.564299]  0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff880035bcbda8
[14446.564302]  ffffffff8106c521 ffff8800367c5f78 ffff8800367c5f40 ffff88003e3a4280
[14446.564304]  ffffffffa05a5040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0
[14446.564307] Call Trace:
[14446.564313]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[14446.564317]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[14446.564320]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[14446.564323]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[14446.564329]  [<ffffffffa059d0d6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding]
[14446.564332]  [<ffffffffa058d40e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[14446.564336]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[14446.564340]  [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0
[14446.564343]  [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30
[14446.564346]  [<ffffffffa059d259>] ? bonding_exit+0x23/0xdca [bonding]
[14446.564350]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[14446.564354]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[14446.564357]  [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75
[14446.564360] ---[ end trace a911dbcedf315986 ]---

The problem is in bond_release_and_destroy() and I'll post a proper fix
for -net in a few minutes after I run some tests.

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 17:49     ` Nikolay Aleksandrov
@ 2015-07-15 19:52       ` Nikolay Aleksandrov
  2015-07-15 21:01         ` Carol Soto
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-15 19:52 UTC (permalink / raw)
  To: netdev
  Cc: j.vosburgh, gospo, vfalico, clsoto, ebiederm, davem, Nikolay Aleksandrov

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] ------------[ cut here ]------------
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
ffff8800358dfda8
[  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
ffff88003e3a4280
[  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
ffffffff8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [<ffffffff8142f52d>] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [<ffffffff8142f58d>] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[  908.984244]  [<ffffffff8152b732>] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f9399814927ad ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.

Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
                      the netdev events")
---
 drivers/net/bonding/bond_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 317a49480475..ec1404ec4d2f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
 		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
 		netdev_info(bond_dev, "Destroying bond %s\n",
 			    bond_dev->name);
+		bond_remove_proc_entry(bond);
 		unregister_netdevice(bond_dev);
 	}
 	return ret;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
@ 2015-07-15 21:01         ` Carol Soto
  2015-07-15 22:39         ` Eric W. Biederman
  2015-07-20 19:56         ` David Miller
  2 siblings, 0 replies; 11+ messages in thread
From: Carol Soto @ 2015-07-15 21:01 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: j.vosburgh, gospo, vfalico, ebiederm, davem, Nikolay Aleksandrov



On 7/15/2015 2:52 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):
> [  908.963051] eql: event: 9
> [  908.963052] eql: IFF_SLAVE
> [  908.963054] eql: event: 2
> [  908.963056] eql: IFF_SLAVE
> [  908.963058] eql: event: 6
> [  908.963059] eql: IFF_SLAVE
> [  908.963110] bond0: Releasing active interface eql
> [  908.976168] bond0: Destroying bond bond0
> [  908.976266] bond0 (unregistering): Released all slaves
> [  908.984097] ------------[ cut here ]------------
> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
> remove_proc_entry+0x112/0x160()
> [  908.984110] remove_proc_entry: removing non-empty directory
> 'net/bonding', leaking at least 'bond0'
> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
> usb_common [last unloaded: bonding]
>
> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
> 4.2.0-rc2+ #8
> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
> ffff8800358dfda8
> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
> ffff88003e3a4280
> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
> ffffffff8172ebd0
> [  908.984181] Call Trace:
> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
> [bonding]
> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
> [  908.984225]  [<ffffffff8142f52d>] ?
> unregister_pernet_operations+0x8d/0xd0
> [  908.984228]  [<ffffffff8142f58d>] ?
> unregister_pernet_subsys+0x1d/0x30
> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
> [  908.984244]  [<ffffffff8152b732>] ?
> entry_SYSCALL_64_fastpath+0x16/0x75
> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
>
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                        the netdev events")
> ---
>   drivers/net/bonding/bond_main.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 317a49480475..ec1404ec4d2f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>   		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>   		netdev_info(bond_dev, "Destroying bond %s\n",
>   			    bond_dev->name);
> +		bond_remove_proc_entry(bond);
>   		unregister_netdevice(bond_dev);
>   	}
>   	return ret;
Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
  2015-07-15 21:01         ` Carol Soto
@ 2015-07-15 22:39         ` Eric W. Biederman
  2015-07-15 22:54           ` Nikolay Aleksandrov
  2015-07-20 19:56         ` David Miller
  2 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2015-07-15 22:39 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, j.vosburgh, gospo, vfalico, clsoto, davem, Nikolay Aleksandrov

Nikolay Aleksandrov <razor@blackwall.org> writes:

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):

I see the failure below.  I am perplexed at the description.  Does the
bonding driver actually make sense on a non-ethernet device?

Would it not be better to make more sense to limit bonding to ethernet
devices so we don't get weird behavior?  I imagine there might be other
problems with bonding non-ethernet devices that no one has spotted,
or cares about.

Eric


> [  908.963051] eql: event: 9
> [  908.963052] eql: IFF_SLAVE
> [  908.963054] eql: event: 2
> [  908.963056] eql: IFF_SLAVE
> [  908.963058] eql: event: 6
> [  908.963059] eql: IFF_SLAVE
> [  908.963110] bond0: Releasing active interface eql
> [  908.976168] bond0: Destroying bond bond0
> [  908.976266] bond0 (unregistering): Released all slaves
> [  908.984097] ------------[ cut here ]------------
> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
> remove_proc_entry+0x112/0x160()
> [  908.984110] remove_proc_entry: removing non-empty directory
> 'net/bonding', leaking at least 'bond0'
> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
> usb_common [last unloaded: bonding]
>
> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
> 4.2.0-rc2+ #8
> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
> ffff8800358dfda8
> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
> ffff88003e3a4280
> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
> ffffffff8172ebd0
> [  908.984181] Call Trace:
> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
> [bonding]
> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
> [  908.984225]  [<ffffffff8142f52d>] ?
> unregister_pernet_operations+0x8d/0xd0
> [  908.984228]  [<ffffffff8142f58d>] ?
> unregister_pernet_subsys+0x1d/0x30
> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
> [  908.984244]  [<ffffffff8152b732>] ?
> entry_SYSCALL_64_fastpath+0x16/0x75
> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
>
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                       the netdev events")
> ---
>  drivers/net/bonding/bond_main.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 317a49480475..ec1404ec4d2f 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>  		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>  		netdev_info(bond_dev, "Destroying bond %s\n",
>  			    bond_dev->name);
> +		bond_remove_proc_entry(bond);
>  		unregister_netdevice(bond_dev);
>  	}
>  	return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 22:39         ` Eric W. Biederman
@ 2015-07-15 22:54           ` Nikolay Aleksandrov
  2015-07-15 22:58             ` Carol Soto
  2015-07-16  6:14             ` Veaceslav Falico
  0 siblings, 2 replies; 11+ messages in thread
From: Nikolay Aleksandrov @ 2015-07-15 22:54 UTC (permalink / raw)
  To: Eric W. Biederman, Nikolay Aleksandrov
  Cc: netdev, j.vosburgh, gospo, vfalico, clsoto, davem

On 07/16/2015 12:39 AM, Eric W. Biederman wrote:
> Nikolay Aleksandrov <razor@blackwall.org> writes:
> 
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>> When the bonding is being unloaded and the netdevice notifier is
>> unregistered it executes NETDEV_UNREGISTER for each device which should
>> remove the bond's proc entry but if the device enslaved is not of
>> ARPHRD_ETHER type and is in front of the bonding, it may execute
>> bond_release_and_destroy() first which would release the last slave and
>> destroy the bond device leaving the proc entry and thus we will get the
>> following error (with dynamic debug on for bond_netdev_event to see the
>> events order):
> 
> I see the failure below.  I am perplexed at the description.  Does the
> bonding driver actually make sense on a non-ethernet device?
> 
Sometimes it does, I've seen it used with infiniband devices a lot, also
there were cases of some ppp users.

> Would it not be better to make more sense to limit bonding to ethernet
> devices so we don't get weird behavior?  I imagine there might be other
> problems with bonding non-ethernet devices that no one has spotted,
> or cares about.
> 
> Eric
> 
> 
My personal opinion would be to disable non-ethernet devices, but support was
already added and has been there for a long time so we have to fix this for
the older releases, I don't mind removing non-ethernet device support for net-next
but I'm guessing there're people still using that like the case that started
this thread.

Cheers,
 Nik

>> [  908.963051] eql: event: 9
>> [  908.963052] eql: IFF_SLAVE
>> [  908.963054] eql: event: 2
>> [  908.963056] eql: IFF_SLAVE
>> [  908.963058] eql: event: 6
>> [  908.963059] eql: IFF_SLAVE
>> [  908.963110] bond0: Releasing active interface eql
>> [  908.976168] bond0: Destroying bond bond0
>> [  908.976266] bond0 (unregistering): Released all slaves
>> [  908.984097] ------------[ cut here ]------------
>> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
>> remove_proc_entry+0x112/0x160()
>> [  908.984110] remove_proc_entry: removing non-empty directory
>> 'net/bonding', leaking at least 'bond0'
>> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
>> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
>> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
>> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
>> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
>> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
>> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
>> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
>> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
>> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
>> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
>> usb_common [last unloaded: bonding]
>>
>> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
>> 4.2.0-rc2+ #8
>> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
>> ffff8800358dfda8
>> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
>> ffff88003e3a4280
>> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
>> ffffffff8172ebd0
>> [  908.984181] Call Trace:
>> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
>> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
>> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
>> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
>> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
>> [bonding]
>> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
>> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
>> [  908.984225]  [<ffffffff8142f52d>] ?
>> unregister_pernet_operations+0x8d/0xd0
>> [  908.984228]  [<ffffffff8142f58d>] ?
>> unregister_pernet_subsys+0x1d/0x30
>> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
>> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
>> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
>> [  908.984244]  [<ffffffff8152b732>] ?
>> entry_SYSCALL_64_fastpath+0x16/0x75
>> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>>
>> Thus remove the proc entry manually if bond_release_and_destroy() is
>> used. Because of the checks in bond_remove_proc_entry() it's not a
>> problem for a bond device to change namespaces (the bug fixed by the
>> Fixes commit) but since commit
>> f9399814927ad ("bonding: Don't allow bond devices to change network
>> namespaces.") that can't happen anyway.
>>
>> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>>                       the netdev events")
>> ---
>>  drivers/net/bonding/bond_main.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 317a49480475..ec1404ec4d2f 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>>  		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>>  		netdev_info(bond_dev, "Destroying bond %s\n",
>>  			    bond_dev->name);
>> +		bond_remove_proc_entry(bond);
>>  		unregister_netdevice(bond_dev);
>>  	}
>>  	return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 22:54           ` Nikolay Aleksandrov
@ 2015-07-15 22:58             ` Carol Soto
  2015-07-16  6:14             ` Veaceslav Falico
  1 sibling, 0 replies; 11+ messages in thread
From: Carol Soto @ 2015-07-15 22:58 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Eric W. Biederman, Nikolay Aleksandrov
  Cc: netdev, j.vosburgh, gospo, vfalico, davem



On 7/15/2015 5:54 PM, Nikolay Aleksandrov wrote:
> On 07/16/2015 12:39 AM, Eric W. Biederman wrote:
>> Nikolay Aleksandrov <razor@blackwall.org> writes:
>>
>>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>
>>> When the bonding is being unloaded and the netdevice notifier is
>>> unregistered it executes NETDEV_UNREGISTER for each device which should
>>> remove the bond's proc entry but if the device enslaved is not of
>>> ARPHRD_ETHER type and is in front of the bonding, it may execute
>>> bond_release_and_destroy() first which would release the last slave and
>>> destroy the bond device leaving the proc entry and thus we will get the
>>> following error (with dynamic debug on for bond_netdev_event to see the
>>> events order):
>> I see the failure below.  I am perplexed at the description.  Does the
>> bonding driver actually make sense on a non-ethernet device?
>>
> Sometimes it does, I've seen it used with infiniband devices a lot, also
> there were cases of some ppp users.
>
>> Would it not be better to make more sense to limit bonding to ethernet
>> devices so we don't get weird behavior?  I imagine there might be other
>> problems with bonding non-ethernet devices that no one has spotted,
>> or cares about.
>>
>> Eric
>>
>>
> My personal opinion would be to disable non-ethernet devices, but support was
> already added and has been there for a long time so we have to fix this for
> the older releases, I don't mind removing non-ethernet device support for net-next
> but I'm guessing there're people still using that like the case that started
> this thread.
>
> Cheers,
>   Nik
Yes, there are Infiniband users that uses bonding.

Carol
>>> [  908.963051] eql: event: 9
>>> [  908.963052] eql: IFF_SLAVE
>>> [  908.963054] eql: event: 2
>>> [  908.963056] eql: IFF_SLAVE
>>> [  908.963058] eql: event: 6
>>> [  908.963059] eql: IFF_SLAVE
>>> [  908.963110] bond0: Releasing active interface eql
>>> [  908.976168] bond0: Destroying bond bond0
>>> [  908.976266] bond0 (unregistering): Released all slaves
>>> [  908.984097] ------------[ cut here ]------------
>>> [  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
>>> remove_proc_entry+0x112/0x160()
>>> [  908.984110] remove_proc_entry: removing non-empty directory
>>> 'net/bonding', leaking at least 'bond0'
>>> [  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
>>> oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
>>> crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
>>> snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
>>> gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
>>> psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
>>> drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
>>> pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
>>> autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
>>> ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
>>> virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
>>> usb_common [last unloaded: bonding]
>>>
>>> [  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
>>> 4.2.0-rc2+ #8
>>> [  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> [  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
>>> ffff8800358dfda8
>>> [  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
>>> ffff88003e3a4280
>>> [  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
>>> ffffffff8172ebd0
>>> [  908.984181] Call Trace:
>>> [  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
>>> [  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
>>> [  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
>>> [  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
>>> [  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
>>> [bonding]
>>> [  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
>>> [  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
>>> [  908.984225]  [<ffffffff8142f52d>] ?
>>> unregister_pernet_operations+0x8d/0xd0
>>> [  908.984228]  [<ffffffff8142f58d>] ?
>>> unregister_pernet_subsys+0x1d/0x30
>>> [  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
>>> [  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
>>> [  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
>>> [  908.984244]  [<ffffffff8152b732>] ?
>>> entry_SYSCALL_64_fastpath+0x16/0x75
>>> [  908.984247] ---[ end trace 7c006ed4abbef24b ]---
>>>
>>> Thus remove the proc entry manually if bond_release_and_destroy() is
>>> used. Because of the checks in bond_remove_proc_entry() it's not a
>>> problem for a bond device to change namespaces (the bug fixed by the
>>> Fixes commit) but since commit
>>> f9399814927ad ("bonding: Don't allow bond devices to change network
>>> namespaces.") that can't happen anyway.
>>>
>>> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>>>                        the netdev events")
>>> ---
>>>   drivers/net/bonding/bond_main.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 317a49480475..ec1404ec4d2f 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1916,6 +1916,7 @@ static int  bond_release_and_destroy(struct net_device *bond_dev,
>>>   		bond_dev->priv_flags |= IFF_DISABLE_NETPOLL;
>>>   		netdev_info(bond_dev, "Destroying bond %s\n",
>>>   			    bond_dev->name);
>>> +		bond_remove_proc_entry(bond);
>>>   		unregister_netdevice(bond_dev);
>>>   	}
>>>   	return ret;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 22:54           ` Nikolay Aleksandrov
  2015-07-15 22:58             ` Carol Soto
@ 2015-07-16  6:14             ` Veaceslav Falico
  1 sibling, 0 replies; 11+ messages in thread
From: Veaceslav Falico @ 2015-07-16  6:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Eric W. Biederman, Nikolay Aleksandrov, netdev, j.vosburgh,
	gospo, clsoto, davem

On Thu, Jul 16, 2015 at 12:54:06AM +0200, Nikolay Aleksandrov wrote:
...snip...
>My personal opinion would be to disable non-ethernet devices, but support was
>already added and has been there for a long time so we have to fix this for
>the older releases, I don't mind removing non-ethernet device support for net-next
>but I'm guessing there're people still using that like the case that started
>this thread.

Yep, last time non-ethernet devices broke there were quite a few reports
for ppp users, and infiniband is indeed used a lot too. No options here, we
must support it...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether
  2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
  2015-07-15 21:01         ` Carol Soto
  2015-07-15 22:39         ` Eric W. Biederman
@ 2015-07-20 19:56         ` David Miller
  2 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2015-07-20 19:56 UTC (permalink / raw)
  To: razor; +Cc: netdev, j.vosburgh, gospo, vfalico, clsoto, ebiederm, nikolay

From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Wed, 15 Jul 2015 21:52:51 +0200

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> When the bonding is being unloaded and the netdevice notifier is
> unregistered it executes NETDEV_UNREGISTER for each device which should
> remove the bond's proc entry but if the device enslaved is not of
> ARPHRD_ETHER type and is in front of the bonding, it may execute
> bond_release_and_destroy() first which would release the last slave and
> destroy the bond device leaving the proc entry and thus we will get the
> following error (with dynamic debug on for bond_netdev_event to see the
> events order):
 ...
> Thus remove the proc entry manually if bond_release_and_destroy() is
> used. Because of the checks in bond_remove_proc_entry() it's not a
> problem for a bond device to change namespaces (the bug fixed by the
> Fixes commit) but since commit
> f9399814927ad ("bonding: Don't allow bond devices to change network
> namespaces.") that can't happen anyway.
> 
> Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
>                       the netdev events")

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-07-20 19:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-13 18:57 [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one clsoto
2015-07-13 21:05 ` Nikolay Aleksandrov
2015-07-13 21:10   ` Nikolay Aleksandrov
2015-07-15 17:49     ` Nikolay Aleksandrov
2015-07-15 19:52       ` [PATCH net] bonding: fix destruction of bond with devices different from arphrd_ether Nikolay Aleksandrov
2015-07-15 21:01         ` Carol Soto
2015-07-15 22:39         ` Eric W. Biederman
2015-07-15 22:54           ` Nikolay Aleksandrov
2015-07-15 22:58             ` Carol Soto
2015-07-16  6:14             ` Veaceslav Falico
2015-07-20 19:56         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.