* bug in bonding driver
@ 2010-03-24 18:23 Chris Friesen
From: Chris Friesen @ 2010-03-24 18:23 UTC
  To: netdev, fubar, bonding-devel


Hi,

One of our guys pointed out what appears to be a bug in
bond_ab_arp_inspect().  There's a chunk of code that looks like this:

	/*
	 * Give slaves 2*delta after being enslaved or made
	 * active.  This avoids bouncing, as the last receive
	 * times need a full ARP monitor cycle to be updated.
	 */
	if (!time_after_eq(jiffies, slave->jiffies +
			   2 * delta_in_ticks))
		continue;

The catch here is that slave->jiffies may not ever get updated after
being set initially, and on long-running systems jiffies will overflow.
That could cause this check to be true for a substantial amount of time
rather than for just a short period.
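A minimal userspace sketch of the failure mode, assuming a 32-bit jiffies counter (as on a 32-bit kernel). The helpers time_after_eq32() and in_grace_period() are hypothetical stand-ins: the first mimics the kernel's time_after_eq(a, b), which boils down to ((long)((a) - (b)) >= 0), and the second mirrors the guard quoted above. It relies on GCC/Clang converting an out-of-range uint32_t to int32_t modulo 2^32, as the kernel itself does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for time_after_eq(a, b) on a 32-bit counter: true when a is
 * at or after b, provided they are less than 2^31 ticks apart. */
static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Mirrors the check in bond_ab_arp_inspect(): true means the slave is
 * skipped ("continue") because it still looks like it is inside its
 * 2*delta grace period.  slave_jiffies is stamped once, never refreshed.
 */
static bool in_grace_period(uint32_t now, uint32_t slave_jiffies,
			    uint32_t delta_in_ticks)
{
	/* The grace period itself must be far shorter than half the range. */
	assert(delta_in_ticks <= (uint32_t)INT32_MAX / 2u);
	return !time_after_eq32(now, slave_jiffies + 2u * delta_in_ticks);
}
```

With the stamp at 0xFFFFFF00 and delta_in_ticks = 100, the slave is skipped at stamp + 50 and checked normally at stamp + 300 (after the counter has wrapped), but it is skipped again once now reaches stamp + 200 + 2^31, and it stays skipped for the following 2^31 ticks.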

One way to fix it would be a boolean which tracks whether or not we've
gone past the time, and if we have then we don't bother actually
checking the time anymore.
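A sketch of the boolean approach, under the same 32-bit assumption. struct slave_model, its grace_expired field, and the helper names are hypothetical illustrations, not members of the real struct slave. The latch only helps if the monitor looks at the slave at least once within 2^31 ticks of the stamp, which should hold as long as the ARP monitor keeps running every delta_in_ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down slave state. */
struct slave_model {
	uint32_t jiffies;	/* stamp taken when enslaved or made active */
	bool grace_expired;	/* latched once 2*delta has been seen to pass */
};

static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Called whenever the slave is enslaved or made active. */
static void slave_mark_active(struct slave_model *s, uint32_t now)
{
	s->jiffies = now;
	s->grace_expired = false;
}

/*
 * True while the slave should still be skipped.  Once the grace period
 * has been observed to expire, the stale stamp is never compared again,
 * so a later wrap cannot put the slave back into its grace period.
 */
static bool slave_in_grace(struct slave_model *s, uint32_t now,
			   uint32_t delta_in_ticks)
{
	assert(delta_in_ticks <= (uint32_t)INT32_MAX / 2u);
	if (s->grace_expired)
		return false;
	if (time_after_eq32(now, s->jiffies + 2u * delta_in_ticks)) {
		s->grace_expired = true;
		return false;
	}
	return true;
}
```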

Chris


* Re: bug in bonding driver
@ 2010-03-24 19:13 Jay Vosburgh
From: Jay Vosburgh @ 2010-03-24 19:13 UTC
  To: Chris Friesen; +Cc: netdev, bonding-devel

Chris Friesen <cfriesen@nortel.com> wrote:
>One of our guys pointed out what appears to be a bug in
>bond_ab_arp_inspect().  There's a chunk of code that looks like this:
>
>	/*
>	 * Give slaves 2*delta after being enslaved or made
>	 * active.  This avoids bouncing, as the last receive
>	 * times need a full ARP monitor cycle to be updated.
>	 */
>	if (!time_after_eq(jiffies, slave->jiffies +
>			   2 * delta_in_ticks))
>		continue;
>
>The catch here is that slave->jiffies may not ever get updated after
>being set initially, and on long-running systems jiffies will overflow.
> That could cause this check to be true for a substantial amount of time
>rather than for just a short period.

	The definition for time_after in include/linux/jiffies.h claims
to handle timer wrapping, but even so, there presumably has to be a
cutoff at which "after" becomes "before" again.  

	Some quick fooling around suggests that if, for example,
slave->jiffies is near the top of the range (ULONG_MAX - a few hundred),
when jiffies gets up to around ULONG_MAX / 2 time_after_eq will flip
from "after" to "before."
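The estimate can be made exact with a small model. On an N-bit counter, time_after_eq(jiffies, stamp) stays true for 2^(N-1) ticks after the stamp and then reports "before". The sketch below assumes a 32-bit counter and uses a hypothetical helper, flip_point32(), to compute that value; for a stamp of ULONG_MAX - 300 it lands at ULONG_MAX / 2 - 300. At HZ=1000, 2^31 ticks is about 24.86 days.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* First counter value at which time_after_eq32(now, stamp) turns false
 * after having been true: exactly 2^31 ticks after stamp. */
static uint32_t flip_point32(uint32_t stamp)
{
	uint32_t flip = stamp + UINT32_C(0x80000000);	/* wraps mod 2^32 */

	/* One tick earlier is still "after or equal"; at flip it is not. */
	assert(time_after_eq32(flip - 1u, stamp));
	assert(!time_after_eq32(flip, stamp));
	return flip;
}
```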

	I don't think this is a particularly farfetched example, since
jiffies is intentionally started near the top of the range, so
slave->jiffies is likely to be high in the range after bonding is
configured at boot.
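For reference, include/linux/jiffies.h defines INITIAL_JIFFIES as ((unsigned long)(unsigned int) (-300*HZ)), so on a 32-bit kernel jiffies starts 300 seconds before the wrap, while on a 64-bit kernel the same cast starts it just below 2^32, far from the 64-bit wrap. The sketch below models only the 32-bit case; initial_jiffies32(), stamp_at_uptime32() and flip_uptime_s() are hypothetical helpers, and HZ is passed in as a parameter rather than taken from a kernel config.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit model of INITIAL_JIFFIES, ((unsigned long)(unsigned int)(-300*HZ)):
 * the counter starts 300 seconds' worth of ticks before it wraps. */
static uint32_t initial_jiffies32(uint32_t hz)
{
	return UINT32_C(0) - 300u * hz;
}

/* Counter value when something is stamped uptime_s seconds after boot
 * (the uint32_t arithmetic wraps exactly like the counter does). */
static uint32_t stamp_at_uptime32(uint32_t hz, uint32_t uptime_s)
{
	return initial_jiffies32(hz) + uptime_s * hz;
}

/* Whole seconds of uptime (rounded down) at which time_after_eq(jiffies,
 * stamp) first reports "before" again for a stamp that is never
 * refreshed: the flip comes 2^31 ticks after the stamp. */
static uint64_t flip_uptime_s(uint32_t hz, uint32_t stamp_uptime_s)
{
	assert(hz > 0);
	return (uint64_t)stamp_uptime_s + (UINT64_C(1) << 31) / hz;
}
```

At HZ=1000, a slave stamped one minute after boot sits high in the range, and its stamp flips back to "before" roughly 24.86 days later.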

>One way to fix it would be a boolean which tracks whether or not we've
>gone past the time, and if we have then we don't bother actually
>checking the time anymore.

	It might be clearer to make the slave->jiffies some kind of
countdown instead, perhaps reusing the slave->delay used for
updelay/downdelay and eliminating slave->jiffies entirely.
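A sketch of the countdown idea, assuming the monitor visits each slave once per ARP interval (delta_in_ticks). struct slave_model and both helpers are hypothetical; in the driver the counter would be the existing slave->delay, and the starting value of 2 is an assumption chosen to approximate the 2*delta grace period. Because only a pass count is compared, jiffies wraparound cannot affect it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down slave: only the countdown matters here. */
struct slave_model {
	int delay;	/* remaining ARP monitor passes to skip */
};

/* Enslaved or made active: skip roughly the next 2*delta worth of passes. */
static void slave_start_grace(struct slave_model *s)
{
	s->delay = 2;
}

/* Called once per ARP monitor pass; true if this pass should skip the
 * slave.  No timestamps are involved, so there is nothing to wrap. */
static bool slave_grace_tick(struct slave_model *s)
{
	assert(s->delay >= 0);
	if (s->delay > 0) {
		s->delay--;
		return true;
	}
	return false;
}
```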

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: bug in bonding driver
@ 2010-03-24 19:47 Chris Friesen
From: Chris Friesen @ 2010-03-24 19:47 UTC
  To: Jay Vosburgh; +Cc: netdev, bonding-devel

On 03/24/2010 01:13 PM, Jay Vosburgh wrote:
> Chris Friesen <cfriesen@nortel.com> wrote:

>> The catch here is that slave->jiffies may not ever get updated after
>> being set initially, and on long-running systems jiffies will overflow.
>> That could cause this check to be true for a substantial amount of time
>> rather than for just a short period.

> 	Some quick fooling around suggests that if, for example,
> slave->jiffies is near the top of the range (ULONG_MAX - a few hundred),
> when jiffies gets up to around ULONG_MAX / 2 time_after_eq will flip
> from "after" to "before."
> 
> 	I don't think this is a particularly farfetched example, since
> jiffies is intentionally started near the top of the range, so
> slave->jiffies is likely to be high in the range after bonding is
> configured at boot.

Agreed.  If I understand it right, the result of time_after_eq is only
valid if the unsigned difference between the two values is no greater
than LONG_MAX.
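The condition can be checked directly with a 32-bit model, where INT32_MAX plays the role of LONG_MAX. The hypothetical helper time_after_eq32_agrees() asks whether the comparison gives the right answer for a now that really lies elapsed ticks after stamp: it does up to INT32_MAX ticks, fails from INT32_MAX + 1, and only lines up again by accident once the counter has gone all the way around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * now truly lies `elapsed` ticks after stamp (elapsed may exceed the
 * counter range), so the correct answer is always "after or equal".
 * Returns whether the 32-bit comparison agrees with that.
 */
static bool time_after_eq32_agrees(uint32_t stamp, uint64_t elapsed)
{
	uint32_t now = stamp + (uint32_t)elapsed;	/* counter wraps mod 2^32 */

	return time_after_eq32(now, stamp);
}
```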

>> One way to fix it would be a boolean which tracks whether or not we've
>> gone past the time, and if we have then we don't bother actually
>> checking the time anymore.
> 
> 	It might be clearer to make the slave->jiffies some kind of
> countdown instead, perhaps reusing the slave->delay used for
> updelay/downdelay and eliminating slave->jiffies entirely.

Yes, that seems reasonable as well.

Chris
