* bug in bonding driver
From: Chris Friesen @ 2010-03-24 18:23 UTC
To: netdev, fubar, bonding-devel
Hi,
One of our guys pointed out what appears to be a bug in
bond_ab_arp_inspect(). There's a chunk of code that looks like this:
	/*
	 * Give slaves 2*delta after being enslaved or made
	 * active. This avoids bouncing, as the last receive
	 * times need a full ARP monitor cycle to be updated.
	 */
	if (!time_after_eq(jiffies, slave->jiffies +
			   2 * delta_in_ticks))
		continue;
The catch here is that slave->jiffies may not ever get updated after
being set initially, and on long-running systems jiffies will overflow.
That could cause this check to be true for a substantial amount of time
rather than for just a short period.
One way to fix it would be a boolean which tracks whether or not we've
gone past the time, and if we have then we don't bother actually
checking the time anymore.
Chris
* Re: bug in bonding driver
From: Jay Vosburgh @ 2010-03-24 19:13 UTC
To: Chris Friesen; +Cc: netdev, bonding-devel
Chris Friesen <cfriesen@nortel.com> wrote:
>One of our guys pointed out what appears to be a bug in
>bond_ab_arp_inspect(). There's a chunk of code that looks like this:
>
>	/*
>	 * Give slaves 2*delta after being enslaved or made
>	 * active. This avoids bouncing, as the last receive
>	 * times need a full ARP monitor cycle to be updated.
>	 */
>	if (!time_after_eq(jiffies, slave->jiffies +
>			   2 * delta_in_ticks))
>		continue;
>
>The catch here is that slave->jiffies may not ever get updated after
>being set initially, and on long-running systems jiffies will overflow.
>That could cause this check to be true for a substantial amount of time
>rather than for just a short period.
The definition for time_after in include/linux/jiffies.h claims
to handle timer wrapping, but even so, there presumably has to be a
cutoff at which "after" becomes "before" again.
Some quick fooling around suggests that if, for example,
slave->jiffies is near the top of the range (ULONG_MAX - a few hundred),
when jiffies gets up to around ULONG_MAX / 2 time_after_eq will flip
from "after" to "before."
I don't think this is a particularly farfetched example, since
jiffies is intentionally started near the top of the range, so
slave->jiffies is likely to be high in the range after bonding is
configured at boot.
>One way to fix it would be a boolean which tracks whether or not we've
>gone past the time, and if we have then we don't bother actually
>checking the time anymore.
It might be clearer to make the slave->jiffies some kind of
countdown instead, perhaps reusing the slave->delay used for
updelay/downdelay and eliminating slave->jiffies entirely.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: bug in bonding driver
From: Chris Friesen @ 2010-03-24 19:47 UTC
To: Jay Vosburgh; +Cc: netdev, bonding-devel
On 03/24/2010 01:13 PM, Jay Vosburgh wrote:
> Chris Friesen <cfriesen@nortel.com> wrote:
>> The catch here is that slave->jiffies may not ever get updated after
>> being set initially, and on long-running systems jiffies will overflow.
>> That could cause this check to be true for a substantial amount of time
>> rather than for just a short period.
> Some quick fooling around suggests that if, for example,
> slave->jiffies is near the top of the range (ULONG_MAX - a few hundred),
> when jiffies gets up to around ULONG_MAX / 2 time_after_eq will flip
> from "after" to "before."
>
> I don't think this is a particularly farfetched example, since
> jiffies is intentionally started near the top of the range, so
> slave->jiffies is likely to be high in the range after bonding is
> configured at boot.
Agreed. If I understand it right, the result of time_after_eq is only
valid if the unsigned result of the subtraction is at most LONG_MAX.
>> One way to fix it would be a boolean which tracks whether or not we've
>> gone past the time, and if we have then we don't bother actually
>> checking the time anymore.
>
> It might be clearer to make the slave->jiffies some kind of
> countdown instead, perhaps reusing the slave->delay used for
> updelay/downdelay and eliminating slave->jiffies entirely.
Yes, that seems reasonable as well.
Chris