netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] bonding: couple of bug fixes
@ 2013-06-11 22:07 nikolay
  2013-06-11 22:07 ` [PATCH v2 1/2] bonding: reset master mac on first enslave failure nikolay
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: nikolay @ 2013-06-11 22:07 UTC (permalink / raw)
  To: netdev; +Cc: andy, davem, fubar

From: Nikolay Aleksandrov <nikolay@redhat.com>

Hello,
Patch 01 resets the master's mac if the first enslave fails and the slave's
mac was set to the master's prior.
Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend
param and was silently not working if the value was >127) so it can go up to 255
(as per documentation). It also fixes two tricky race conditions which were
hidden because of the previous bug.

V2: Updated patch 02 commit message as per Jay's comment

Best regards,
 Nikolay Aleksandrov

Nikolay Aleksandrov (2):
  bonding: reset master mac on first enslave failure
  bonding: fix igmp_retrans type and two related races

 drivers/net/bonding/bond_main.c | 19 +++++++++++++++----
 drivers/net/bonding/bonding.h   |  2 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] bonding: reset master mac on first enslave failure
  2013-06-11 22:07 [PATCH v2 0/2] bonding: couple of bug fixes nikolay
@ 2013-06-11 22:07 ` nikolay
  2013-06-11 22:07 ` [PATCH v2 2/2] bonding: fix igmp_retrans type and two related races nikolay
  2013-06-11 22:14 ` [PATCH v2 0/2] bonding: couple of bug fixes Jay Vosburgh
  2 siblings, 0 replies; 5+ messages in thread
From: nikolay @ 2013-06-11 22:07 UTC (permalink / raw)
  To: netdev; +Cc: andy, davem, fubar

From: Nikolay Aleksandrov <nikolay@redhat.com>

If the bond device is supposed to get the first slave's MAC address and
the first enslavement fails then we need to reset the master's MAC
otherwise it will stay the same as the failed slave device. We do it
after err_undo_flags since that is the first place where the MAC can be
changed and we check if it should've been the first slave and if the
bond's MAC was set to it because that err place is used by multiple
locations prior to changing the master's MAC address.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 29b846c..473633a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1957,6 +1957,10 @@ err_free:
 
 err_undo_flags:
 	bond_compute_features(bond);
+	/* Enslave of first slave has failed and we need to fix master's mac */
+	if (bond->slave_cnt == 0 &&
+	    ether_addr_equal(bond_dev->dev_addr, slave_dev->dev_addr))
+		eth_hw_addr_random(bond_dev);
 
 	return res;
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] bonding: fix igmp_retrans type and two related races
  2013-06-11 22:07 [PATCH v2 0/2] bonding: couple of bug fixes nikolay
  2013-06-11 22:07 ` [PATCH v2 1/2] bonding: reset master mac on first enslave failure nikolay
@ 2013-06-11 22:07 ` nikolay
  2013-06-11 22:14 ` [PATCH v2 0/2] bonding: couple of bug fixes Jay Vosburgh
  2 siblings, 0 replies; 5+ messages in thread
From: nikolay @ 2013-06-11 22:07 UTC (permalink / raw)
  To: netdev; +Cc: andy, davem, fubar

From: Nikolay Aleksandrov <nikolay@redhat.com>

First the type of igmp_retrans (which is the actual counter of
igmp_resend parameter) is changed to u8 to be able to store values up
to 255 (as per documentation). There are two races that were hidden
there and which are easy to trigger after the previous fix, the first is
between bond_resend_igmp_join_requests and bond_change_active_slave
where igmp_retrans is set and can be altered by the periodic. The second
race condition is between multiple running instances of the periodic
(upon execution it can be scheduled again for immediate execution which
can cause the counter to go < 0 which in the unsigned case leads to
unnecessary igmp retransmissions).
Since in bond_change_active_slave bond->lock is held for reading and
curr_slave_lock for writing, we use curr_slave_lock for mutual
exclusion. We can't drop them as there're cases where RTNL is not held
when bond_change_active_slave is called. RCU is unlocked in
bond_resend_igmp_join_requests before getting curr_slave_lock since we
don't need it there and it's pointless to delay.
The decrement is moved inside the "if" block because if we decrement
unconditionally there's still a possibility for a race condition although
it is much more difficult to hit (many changes have to happen in
a very short period in order to trigger) which in the case of 3 parallel
running instances of this function and igmp_retrans == 1
(with check bond->igmp_retrans-- > 1) is:
f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
f3 does the unnecessary retransmissions.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: updated commit message

 drivers/net/bonding/bond_main.c | 15 +++++++++++----
 drivers/net/bonding/bonding.h   |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 473633a..02d9ae7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 	struct net_device *bond_dev, *vlan_dev, *upper_dev;
 	struct vlan_entry *vlan;
 
-	rcu_read_lock();
 	read_lock(&bond->lock);
+	rcu_read_lock();
 
 	bond_dev = bond->dev;
 
@@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 		if (vlan_dev)
 			__bond_resend_igmp_join_requests(vlan_dev);
 	}
+	rcu_read_unlock();
 
-	if (--bond->igmp_retrans > 0)
+	/* We use curr_slave_lock to protect against concurrent access to
+	 * igmp_retrans from multiple running instances of this function and
+	 * bond_change_active_slave
+	 */
+	write_lock_bh(&bond->curr_slave_lock);
+	if (bond->igmp_retrans > 1) {
+		bond->igmp_retrans--;
 		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
-
+	}
+	write_unlock_bh(&bond->curr_slave_lock);
 	read_unlock(&bond->lock);
-	rcu_read_unlock();
 }
 
 static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 2baec24..f989e15 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -225,7 +225,7 @@ struct bonding {
 	rwlock_t curr_slave_lock;
 	u8	 send_peer_notif;
 	s8	 setup_by_slave;
-	s8       igmp_retrans;
+	u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
 	char     proc_file_name[IFNAMSIZ];
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 0/2] bonding: couple of bug fixes
  2013-06-11 22:07 [PATCH v2 0/2] bonding: couple of bug fixes nikolay
  2013-06-11 22:07 ` [PATCH v2 1/2] bonding: reset master mac on first enslave failure nikolay
  2013-06-11 22:07 ` [PATCH v2 2/2] bonding: fix igmp_retrans type and two related races nikolay
@ 2013-06-11 22:14 ` Jay Vosburgh
  2013-06-13  9:33   ` David Miller
  2 siblings, 1 reply; 5+ messages in thread
From: Jay Vosburgh @ 2013-06-11 22:14 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, andy, davem

nikolay@redhat.com wrote:

>From: Nikolay Aleksandrov <nikolay@redhat.com>
>
>Hello,
>Patch 01 resets the master's mac if the first enslave fails and the slave's
>mac was set to the master's prior.
>Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend
>param and was silently not working if the value was >127) so it can go up to 255
>(as per documentation). It also fixes two tricky race conditions which were
>hidden because of the previous bug.
>
>V2: Updated patch 02 commit message as per Jay's comment

	For both patches:

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 0/2] bonding: couple of bug fixes
  2013-06-11 22:14 ` [PATCH v2 0/2] bonding: couple of bug fixes Jay Vosburgh
@ 2013-06-13  9:33   ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2013-06-13  9:33 UTC (permalink / raw)
  To: fubar; +Cc: nikolay, netdev, andy

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 11 Jun 2013 15:14:04 -0700

> nikolay@redhat.com wrote:
> 
>>From: Nikolay Aleksandrov <nikolay@redhat.com>
>>
>>Hello,
>>Patch 01 resets the master's mac if the first enslave fails and the slave's
>>mac was set to the master's prior.
>>Patch 02 fixes a type bug of igmp_retrans (which is the counter of igmp_resend
>>param and was silently not working if the value was >127) so it can go up to 255
>>(as per documentation). It also fixes two tricky race conditions which were
>>hidden because of the previous bug.
>>
>>V2: Updated patch 02 commit message as per Jay's comment
> 
> 	For both patches:
> 
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

Series applied, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-06-13  9:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-11 22:07 [PATCH v2 0/2] bonding: couple of bug fixes nikolay
2013-06-11 22:07 ` [PATCH v2 1/2] bonding: reset master mac on first enslave failure nikolay
2013-06-11 22:07 ` [PATCH v2 2/2] bonding: fix igmp_retrans type and two related races nikolay
2013-06-11 22:14 ` [PATCH v2 0/2] bonding: couple of bug fixes Jay Vosburgh
2013-06-13  9:33   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).