netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] bonding: fix arp monitoring with vlan slaves
@ 2013-08-02 16:41 Nikolay Aleksandrov
  2013-08-02 22:32 ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Nikolay Aleksandrov @ 2013-08-02 16:41 UTC (permalink / raw)
  To: netdev; +Cc: andy, davem, fubar

From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>

When arp monitoring is enabled the bonding relies on slaves'
dev_trans_start() value to check if the slave link is up or not, but for
8021q devices that value is either stale or 0, and can't be used. So use
the 8021q's underlying device value.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_main.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 07f257d4..5f2bad3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -665,6 +665,17 @@ static int bond_check_dev_link(struct bonding *bond,
 	return reporting ? -1 : BMSR_LSTATUS;
 }
 
+/* For 8021q devices dev_trans_start() returns 0 or a stale value, so use the
+ * 8021q's underlying device value
+ */
+static unsigned long bond_dev_trans_start(struct net_device *dev)
+{
+	while (dev->priv_flags & IFF_802_1Q_VLAN)
+		dev = vlan_dev_real_dev(dev);
+
+	return dev_trans_start(dev);
+}
+
 /*----------------------------- Multicast list ------------------------------*/
 
 /*
@@ -2750,7 +2761,7 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 	 *       so it can wait
 	 */
 	bond_for_each_slave(bond, slave, i) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long trans_start = bond_dev_trans_start(slave->dev);
 
 		if (slave->link != BOND_LINK_UP) {
 			if (time_in_range(jiffies,
@@ -2912,7 +2923,7 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
 		 * - (more than 2*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
+		trans_start = bond_dev_trans_start(slave->dev);
 		if (bond_is_active_slave(slave) &&
 		    (!time_in_range(jiffies,
 			trans_start - delta_in_ticks,
@@ -2947,7 +2958,7 @@ static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			trans_start = bond_dev_trans_start(slave->dev);
 			if ((!bond->curr_active_slave &&
 			     time_in_range(jiffies,
 					   trans_start - delta_in_ticks,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 16:41 [PATCH net] bonding: fix arp monitoring with vlan slaves Nikolay Aleksandrov
@ 2013-08-02 22:32 ` David Miller
  2013-08-02 22:40   ` Nikolay Aleksandrov
  2013-08-02 23:17   ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: David Miller @ 2013-08-02 22:32 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, andy, fubar

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Fri,  2 Aug 2013 18:41:05 +0200

> From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>
> 
> When arp monitoring is enabled the bonding relies on slaves'
> dev_trans_start() value to check if the slave link is up or not, but for
> 8021q devices that value is either stale or 0, and can't be used. So use
> the 8021q's underlying device value.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Handling this specially in bonding isn't really ideal.

Please either hide this detail in dev_trans_start(), or (preferrably)
have vlan_dev_hard_start_xmit() set the trans_start timestamp
properly thus making this just work for everything.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 22:32 ` David Miller
@ 2013-08-02 22:40   ` Nikolay Aleksandrov
  2013-08-02 23:17   ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2013-08-02 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, andy, fubar

On 08/03/2013 12:32 AM, David Miller wrote:
> From: Nikolay Aleksandrov <nikolay@redhat.com>
> Date: Fri,  2 Aug 2013 18:41:05 +0200
> 
>> From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>
>>
>> When arp monitoring is enabled the bonding relies on slaves'
>> dev_trans_start() value to check if the slave link is up or not, but for
>> 8021q devices that value is either stale or 0, and can't be used. So use
>> the 8021q's underlying device value.
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> 
> Handling this specially in bonding isn't really ideal.
> 
Indeed, it's not really a bonding problem.

> Please either hide this detail in dev_trans_start(), or (preferrably)
> have vlan_dev_hard_start_xmit() set the trans_start timestamp
> properly thus making this just work for everything.
> 
> Thanks.
> 
Yep, I prefer option 2 as well. I will submit a patch tomorrow, thank you
for the suggestions.

Nik

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 22:32 ` David Miller
  2013-08-02 22:40   ` Nikolay Aleksandrov
@ 2013-08-02 23:17   ` Eric Dumazet
  2013-08-02 23:29     ` Nikolay Aleksandrov
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-08-02 23:17 UTC (permalink / raw)
  To: David Miller; +Cc: nikolay, netdev, andy, fubar

On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:

> Handling this specially in bonding isn't really ideal.
> 
> Please either hide this detail in dev_trans_start(), or (preferrably)
> have vlan_dev_hard_start_xmit() set the trans_start timestamp
> properly thus making this just work for everything.

vlan is LLTX, so setting the timestamp would incur false sharing on
multiqueue.

But it's true we need a helper, because many callers do

if (dev->priv_flags & IFF_802_1Q_VLAN)
       dev = vlan_dev_real_dev(dev);

or the slighly better

if (is_vlan_dev(dev))
   dev = vlan_dev_real_dev(dev);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 23:17   ` Eric Dumazet
@ 2013-08-02 23:29     ` Nikolay Aleksandrov
  2013-08-02 23:59       ` Jay Vosburgh
  2013-08-03  0:03       ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2013-08-02 23:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, andy, fubar

On 08/03/2013 01:17 AM, Eric Dumazet wrote:
> On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:
> 
>> Handling this specially in bonding isn't really ideal.
>>
>> Please either hide this detail in dev_trans_start(), or (preferrably)
>> have vlan_dev_hard_start_xmit() set the trans_start timestamp
>> properly thus making this just work for everything.
> 
> vlan is LLTX, so setting the timestamp would incur false sharing on
> multiqueue.
> 
Yeah, this statement actually explains a lot for me, thanks :-)
I knew it was because of the LLTX, but I was wondering about the possible
reasons for the xmit_lock_owner check.
So basically the arp monitoring (or any dev_trans_start code) won't work
with LLTX devices because they don't get their trans_start updated (not the
txq trans_start nor the dev->trans_start), is this correct ?
But what if the txqs get bound to a particular CPU, then the txq
trans_start is okay to be updated I suppose.

Nik
> But it's true we need a helper, because many callers do
> 
> if (dev->priv_flags & IFF_802_1Q_VLAN)
>        dev = vlan_dev_real_dev(dev);
> 
> or the slighly better
> 
> if (is_vlan_dev(dev))
>    dev = vlan_dev_real_dev(dev);
> 
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 23:29     ` Nikolay Aleksandrov
@ 2013-08-02 23:59       ` Jay Vosburgh
  2013-08-03  0:03       ` Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Jay Vosburgh @ 2013-08-02 23:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Eric Dumazet, David Miller, netdev, andy

Nikolay Aleksandrov <nikolay@redhat.com> wrote:

>On 08/03/2013 01:17 AM, Eric Dumazet wrote:
>> On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:
>> 
>>> Handling this specially in bonding isn't really ideal.
>>>
>>> Please either hide this detail in dev_trans_start(), or (preferrably)
>>> have vlan_dev_hard_start_xmit() set the trans_start timestamp
>>> properly thus making this just work for everything.
>> 
>> vlan is LLTX, so setting the timestamp would incur false sharing on
>> multiqueue.
>> 
>Yeah, this statement actually explains a lot for me, thanks :-)
>I knew it was because of the LLTX, but I was wondering about the possible
>reasons for the xmit_lock_owner check.
>So basically the arp monitoring (or any dev_trans_start code) won't work
>with LLTX devices because they don't get their trans_start updated (not the
>txq trans_start nor the dev->trans_start), is this correct ?

	I think what Eric means is that it'll be a performance problem,
because the cache line with the trans_start field will be thrashed
between CPUs as updates compete with each other or inspection from
dev_trans_start.  I suspect it will work from a functional point of view
(unless I'm missing something).

	As a practical matter, it looks like bonding is at present the
only caller of dev_trans_start that passes either (a) a VLAN device, or
(b) a device other than itself (most drivers seem to use it for transmit
timeout detection on their own device).

	Having software devices set trans_start seems unnecessary
(bonding doesn't set it, either), since they'll generally have a
hardware device underneath with a real trans_start value.  It might be
hard to hunt down for tun, though.

>But what if the txqs get bound to a particular CPU, then the txq
>trans_start is okay to be updated I suppose.
>
>Nik
>> But it's true we need a helper, because many callers do
>> 
>> if (dev->priv_flags & IFF_802_1Q_VLAN)
>>        dev = vlan_dev_real_dev(dev);
>> 
>> or the slighly better
>> 
>> if (is_vlan_dev(dev))
>>    dev = vlan_dev_real_dev(dev);

	For just the bonding dev_trans_start() for a VLAN case, one of
the above in dev_trans_start() ought to be sufficient.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-02 23:29     ` Nikolay Aleksandrov
  2013-08-02 23:59       ` Jay Vosburgh
@ 2013-08-03  0:03       ` Eric Dumazet
  2013-08-03  0:21         ` Nikolay Aleksandrov
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-08-03  0:03 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: David Miller, netdev, andy, fubar

On Sat, 2013-08-03 at 01:29 +0200, Nikolay Aleksandrov wrote:

> I knew it was because of the LLTX, but I was wondering about the possible
> reasons for the xmit_lock_owner check.
> So basically the arp monitoring (or any dev_trans_start code) won't work
> with LLTX devices because they don't get their trans_start updated (not the
> txq trans_start nor the dev->trans_start), is this correct ?

Nope : LLTX devices are supposed to update their own dev->trans_start
I added for these case a special comment as in :

drivers/net/ethernet/atheros/atl1e/atl1e_main.c:1883:   netdev->trans_start = jiffies; /* NETIF_F_LLTX driver :( */

trans_start is a txq property, and vlan devices have a single txq.

Updating the vlandev->trans_start or the vlantxq->trans_start would be a
performance killer on MQ ethernet.

And frankly, as this trans_start is really seldom queried, it makes no
sense to set it on fast path on the vlan device, if its properly done on
the real device anyway.

> But what if the txqs get bound to a particular CPU, then the txq
> trans_start is okay to be updated I suppose.

Not sure what you are saying. vlan xmit can be called from any cpus.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] bonding: fix arp monitoring with vlan slaves
  2013-08-03  0:03       ` Eric Dumazet
@ 2013-08-03  0:21         ` Nikolay Aleksandrov
  0 siblings, 0 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2013-08-03  0:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, andy, fubar

On 08/03/2013 02:03 AM, Eric Dumazet wrote:
> On Sat, 2013-08-03 at 01:29 +0200, Nikolay Aleksandrov wrote:
> 
>> I knew it was because of the LLTX, but I was wondering about the possible
>> reasons for the xmit_lock_owner check.
>> So basically the arp monitoring (or any dev_trans_start code) won't work
>> with LLTX devices because they don't get their trans_start updated (not the
>> txq trans_start nor the dev->trans_start), is this correct ?
> 
> Nope : LLTX devices are supposed to update their own dev->trans_start
> I added for these case a special comment as in :
> 
> drivers/net/ethernet/atheros/atl1e/atl1e_main.c:1883:   netdev->trans_start = jiffies; /* NETIF_F_LLTX driver :( */
> 
Ah, didn't know about that, so there could be an LLTX device with proper
trans_start if it updates it itself. Fair enough.

> trans_start is a txq property, and vlan devices have a single txq.
> 
> Updating the vlandev->trans_start or the vlantxq->trans_start would be a
> performance killer on MQ ethernet.
> 
Yeah, that part I got, that's why I said it explains a lot for me earlier :-)

> And frankly, as this trans_start is really seldom queried, it makes no
> sense to set it on fast path on the vlan device, if its properly done on
> the real device anyway.
> 
>> But what if the txqs get bound to a particular CPU, then the txq
>> trans_start is okay to be updated I suppose.
> 
> Not sure what you are saying. vlan xmit can be called from any cpus.
> 
Never mind, I didn't notice that the 8021q dev has a single txq as you said.

Thanks for taking the time to explain all this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-08-03  0:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02 16:41 [PATCH net] bonding: fix arp monitoring with vlan slaves Nikolay Aleksandrov
2013-08-02 22:32 ` David Miller
2013-08-02 22:40   ` Nikolay Aleksandrov
2013-08-02 23:17   ` Eric Dumazet
2013-08-02 23:29     ` Nikolay Aleksandrov
2013-08-02 23:59       ` Jay Vosburgh
2013-08-03  0:03       ` Eric Dumazet
2013-08-03  0:21         ` Nikolay Aleksandrov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).