* bonding: link state question
From: Jonathan Toppins <jtoppins@redhat.com>
Date: 2021-08-07 21:26 UTC
To: netdev
Cc: Veaceslav Falico, Jay Vosburgh, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

Is there any reason why bonding should have an operstate of up when none
of its slaves are in an up state? In this particular scenario it seems
like the bonding device should at least assert NO-CARRIER. Thoughts?

$ ip -o -d link show | grep "bond5"
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc fq_codel master bond5 state DOWN mode DEFAULT group default qlen 1000\
    link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9000 \
    bond_slave state ACTIVE mii_status UP link_failure_count 0 perm_hwaddr 8c:8c:aa:f8:62:16 queue_id 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
41: bond5: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\
    link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 \
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

$ cat /sys/class/net/enp0s31f6/operstate
down

$ cat /sys/class/net/bond5/operstate
up

This is an older kernel (4.18.0-305.7.1.el8_4.x86_64), but I do not see
any changes upstream that would indicate a change in this behavior.

Thanks,
-Jon
* Re: bonding: link state question
From: Jay Vosburgh
Date: 2021-08-07 22:42 UTC
To: Jonathan Toppins
Cc: netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

Jonathan Toppins <jtoppins@redhat.com> wrote:
>Is there any reason why bonding should have an operstate of up when none
>of its slaves are in an up state? In this particular scenario it seems
>like the bonding device should at least assert NO-CARRIER, thoughts?
>
>$ ip -o -d link show | grep "bond5"
>2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
>fq_codel master bond5 state DOWN mode DEFAULT group default qlen 1000\
>link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68
>maxmtu 9000 \ bond_slave state ACTIVE mii_status UP link_failure_count
>0 perm_hwaddr 8c:8c:aa:f8:62:16 queue_id 0 numtxqueues 1 numrxqueues 1
>gso_max_size 65536 gso_max_segs 65535
>41: bond5: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>state UP mode DEFAULT group default qlen 1000\ link/ether
>8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu
>65535 \ bond mode balance-xor miimon 0 updelay 0 downdelay 0
>peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none

	I'm going to speculate that your problem is that miimon and
arp_interval are both 0, and the bond then doesn't have any active
mechanism to monitor the link state of its interfaces. There might be a
warning in dmesg to this effect.

	Do you see what you'd consider to be correct behavior if miimon
is set to 100?

	-J

>arp_all_targets any primary_reselect always fail_over_mac none
>xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0
>min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
>stable tlb_dynamic_lb 1 numtxqueues 16 numrxqueues 16 gso_max_size 65536
>gso_max_segs 65535
>
>$ cat /sys/class/net/enp0s31f6/operstate
>down
>
>$ cat /sys/class/net/bond5/operstate
>up
>
>This is an older kernel (4.18.0-305.7.1.el8_4.x86_64) but I do not see any
>changes upstream that would indicate a change in this operation.
>
>Thanks,
>-Jon

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
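For anyone following along, Jay's suggestion can be tried from userspace. The commands below are a sketch using the standard iproute2 and bonding sysfs interfaces (bond name taken from the thread; whether miimon may be changed while the bond is up can vary by kernel and configuration):

```shell
# Enable MII link monitoring with a 100 ms poll interval on the bond.
ip link set dev bond5 type bond miimon 100

# Equivalent via the bonding sysfs knob:
echo 100 > /sys/class/net/bond5/bonding/miimon

# Re-check: with a link monitor running, the bond's operstate should
# now track its slaves' carrier state.
cat /sys/class/net/bond5/bonding/miimon
cat /sys/class/net/bond5/operstate
```

These commands require root and an existing bond device, so they are shown as a fragment rather than a runnable script.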
* Re: bonding: link state question
From: Jonathan Toppins
Date: 2021-08-08 0:09 UTC
To: Jay Vosburgh
Cc: netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

On 8/7/21 6:42 PM, Jay Vosburgh wrote:
> Jonathan Toppins <jtoppins@redhat.com> wrote:
>
>> Is there any reason why bonding should have an operstate of up when none
>> of its slaves are in an up state? In this particular scenario it seems
>> like the bonding device should at least assert NO-CARRIER, thoughts?
>>
>> $ ip -o -d link show | grep "bond5"
>> 2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
>> fq_codel master bond5 state DOWN mode DEFAULT group default qlen 1000\
>> link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68
>> maxmtu 9000 \ bond_slave state ACTIVE mii_status UP link_failure_count
>> 0 perm_hwaddr 8c:8c:aa:f8:62:16 queue_id 0 numtxqueues 1 numrxqueues 1
>> gso_max_size 65536 gso_max_segs 65535
>> 41: bond5: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> state UP mode DEFAULT group default qlen 1000\ link/ether
>> 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu
>> 65535 \ bond mode balance-xor miimon 0 updelay 0 downdelay 0
>> peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none
>
> I'm going to speculate that your problem is that miimon and
> arp_interval are both 0, and the bond then doesn't have any active
> mechanism to monitor the link state of its interfaces. There might be a
> warning in dmesg to this effect.
>
> Do you see what you'd consider to be correct behavior if miimon
> is set to 100?

Setting miimon = 100 does appear to fix it.

It is interesting that there is no link monitor on by default. For
example, when I enslave enp0s31f6 to a new bond with miimon == 0,
enp0s31f6 starts admin down and never de-asserts NO-CARRIER, yet the bond
always ends up with an operstate of up. It seems like miimon = 100 should
be the default, since some modes cannot use arpmon.

Thank you for the discussion; see below for the steps taken.

$ sudo ip link set dev enp0s31f6 nomaster
$ sudo ip link add dev bond6 type bond mode balance-xor
$ sudo ip -o -d link set dev bond6 up
$ ip -o -d link show dev bond6
62: bond6: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000\
    link/ether 3e:12:01:8a:ed:b1 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 \
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

$ ip -o -d link show dev enp0s31f6
2: enp0s31f6: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000\
    link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 9000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

$ sudo ip -o -d link set dev enp0s31f6 master bond6
$ ip -o -d link show dev bond6
62: bond6: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\
    link/ether 8c:8c:aa:f8:62:16 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 \
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

$ sudo ip link set dev enp0s31f6 nomaster
$ ip -o -d link show dev bond6
62: bond6: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000\
    link/ether ae:b8:6e:b3:ca:3f brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 \
    bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535
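While running steps like the above, it helps to watch the master and slave operstates side by side. The helper below is a minimal sketch; the sysfs layout is the standard one, and the SYSFS override exists only so the function can be exercised without real interfaces:

```shell
# Read a device's operstate from sysfs.
# SYSFS defaults to the real tree but can be pointed elsewhere for testing.
get_operstate() {
    cat "${SYSFS:-/sys/class/net}/$1/operstate"
}

# On the system in the thread, this prints "up" for bond6 even while its
# only slave reports "down". The existence check lets the loop run
# harmlessly on machines without these interfaces.
for dev in bond6 enp0s31f6; do
    if [ -e "${SYSFS:-/sys/class/net}/$dev" ]; then
        echo "$dev: $(get_operstate "$dev")"
    fi
done
```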
* Re: bonding: link state question
From: Willy Tarreau
Date: 2021-08-08 4:49 UTC
To: Jonathan Toppins
Cc: Jay Vosburgh, netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

On Sat, Aug 07, 2021 at 08:09:31PM -0400, Jonathan Toppins wrote:
> setting miimon = 100 does appear to fix it.
>
> It is interesting that there is no link monitor on by default. For example
> when I enslave enp0s31f6 to a new bond with miimon == 0, enp0s31f6 starts
> admin down and will never de-assert NO-CARRIER the bond always results in an
> operstate of up. It seems like miimon = 100 should be the default since some
> modes cannot use arpmon.

Historically, when miimon was implemented, not all NICs or drivers had
support for link state checking at all! In addition, there are certain
deployments that involve many devices, for example a bond device on top
of a VLAN or similar device, where monitoring could cost a lot of
resources and you'd prefer to rely on external monitoring to set all of
them up or down at once.

I do think, however, that there remains a case with a missing state
transition in the driver: on my laptop I have a bond interface attached
to eth0, and I noticed that if I suspend the laptop with the link up and
wake it up with no interface connected, the bond will not turn down,
regardless of miimon. I have not looked closer yet, but I suspect that
we're relying too much on a state change between previous and current,
and that one historically impossible transition does not exist there
and/or used to work because it was handled as part of another change.
I'll eventually have a look.

Willy
* Re: bonding: link state question
From: Jonathan Toppins
Date: 2021-08-09 1:31 UTC
To: Willy Tarreau
Cc: Jay Vosburgh, netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

On 8/8/21 12:49 AM, Willy Tarreau wrote:
> On Sat, Aug 07, 2021 at 08:09:31PM -0400, Jonathan Toppins wrote:
>> setting miimon = 100 does appear to fix it.
>>
>> It is interesting that there is no link monitor on by default. For example
>> when I enslave enp0s31f6 to a new bond with miimon == 0, enp0s31f6 starts
>> admin down and will never de-assert NO-CARRIER the bond always results in an
>> operstate of up. It seems like miimon = 100 should be the default since some
>> modes cannot use arpmon.
>
> Historically when miimon was implemented, not all NICs nor drivers had
> support for link state checking at all! In addition, there are certain
> deployments where you could rely on many devices by having a bond device
> on top of a vlan or similar device, and where monitoring could cost a
> lot of resources and you'd prefer to rely on external monitoring to set
> all of them up or down at once.
>
> I do think however that there remains a case with a missing state
> transition in the driver: on my laptop I have a bond interface attached
> to eth0, and I noticed that if I suspend the laptop with the link up,
> when I wake it up with no interface connected, the bond will not turn
> down, regardless of miimon. I have not looked closer yet, but I
> suspect that we're relying too much on a state change between previous
> and current and that one historically impossible transition does not
> exist there and/or used to work because it was handled as part of
> another change. I'll eventually have a look.
>
> Willy

I am likely very wrong, but the lack of a recalculation of the bond
carrier state after a lower device notifies of an up/down event seemed
incorrect. Maybe a place to start?

diff --git i/drivers/net/bonding/bond_main.c w/drivers/net/bonding/bond_main.c
index 9018fcc59f78..2b2c4b937142 100644
--- i/drivers/net/bonding/bond_main.c
+++ w/drivers/net/bonding/bond_main.c
@@ -3308,6 +3308,7 @@ static int bond_slave_netdev_event(unsigned long event,
 		 */
 		if (bond_mode_can_use_xmit_hash(bond))
 			bond_update_slave_arr(bond, NULL);
+		bond_set_carrier(bond);
 		break;
 	case NETDEV_CHANGEMTU:
 		/* TODO: Should slaves be allowed to
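For readers outside the driver: the invariant the one-line patch tries to re-apply on slave link events is roughly "the bond has carrier iff at least one slave has link" (the real bond_set_carrier() also honors min_links and per-mode details). A standalone model of that rule, with illustrative names that are not the kernel's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the carrier rule re-applied by the patch above.
 * The real driver walks its slave list; here a plain array stands in. */
struct slave_model {
	bool link_up;
};

bool bond_carrier_should_be_on(const struct slave_model *slaves, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (slaves[i].link_up)
			return true;	/* one working slave is enough */
	return false;			/* no slave up: assert NO-CARRIER */
}
```

With miimon == 0 no slave link event ever fires, so a recalculation hooked to those events never runs and the bond's carrier can stay up even though every slave is down, which matches the behavior reported at the top of the thread.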
* Re: bonding: link state question
From: Willy Tarreau
Date: 2021-08-09 10:10 UTC
To: Jonathan Toppins
Cc: Jay Vosburgh, netdev, Veaceslav Falico, Andy Gospodarek, David S. Miller, Jakub Kicinski, LKML

Hi Jonathan,

On Sun, Aug 08, 2021 at 09:31:39PM -0400, Jonathan Toppins wrote:
> I am likely very wrong but the lack of a recalculation of the bond carrier
> state after a lower notifies of an up/down event seemed incorrect. Maybe a
> place to start?

Thanks for the test, it could have been a good candidate but it does not
work :-) That's what I have after the following sequence:

  - link is up
  - suspend-to-ram
  - unplug the cable
  - resume

$ ip -br li
eth0         DOWN  e8:6a:64:5d:19:ed <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
eth0.2@eth0  UP    e8:6a:64:5d:19:ed <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
bond0        UP    e8:6a:64:5d:19:ed <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>

My bond interface uses eth0 and eth0.2 in an active-backup scenario,
allowing me to instantly switch between tagged/untagged networks
depending on the port I'm connecting to.

I just figured out the problem. It's not the bonding driver which is
causing this issue; the issue is with the VLAN interface, which
incorrectly shows up while it ought not to, as can be seen above, and
the bond naturally selected it:

Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0.2
MII Status: up
MII Polling Interval (ms): 200
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

So the bond driver works well; I'll have to dig into the 802.1q code
and/or see how the no-carrier state is propagated upstream. So you were
not very wrong at all and put me on the right track :-)

Cheers,
Willy