* Re: 2.4.28(+?): Strange ARP problem
2005-01-13 14:50 2.4.28(+?): Strange ARP problem Scott Doty
@ 2005-01-13 12:09 ` Marcelo Tosatti
2005-01-13 16:18 ` Scott Doty
2005-01-13 21:01 ` Herbert Xu
0 siblings, 2 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2005-01-13 12:09 UTC
To: Scott Doty; +Cc: linux-kernel, davem, Herbert Xu
On Thu, Jan 13, 2005 at 06:50:29AM -0800, Scott Doty wrote:
> Hi,
>
> We use Linux extensively here at Sonic.net. Our web servers have two
> NIC's -- a NIC with a public IP address, and a NIC on our SAN (with NetApps).
>
> When we tried to upgrade to 2.4.28, we encountered a problem with NetApp
> reachability, which turns out to have been a problem with ARP: we
> were seeing two ARP entries for the NetApp IP's. One would be correct, and
> one would be "incomplete".
>
> Occasionally, a system would glom onto the incomplete entry, and NFS
> connectivity would tank. This doesn't happen with 2.4.27.
>
> We'd like to upgrade to 2.4.29-rc2, but we have much trepidation about doing
> so. I certainly don't want to treat the list as "our own personal help
> desk" (as warned about in the FAQ), but was hoping someone could shed some
> light on the problem. I think either myself or one of our guys can write a
> patch to fix it, if someone would point us in the right direction.
>
> Thank you,
Scott,
I have no idea what might be causing such a regression. I see a few
ARP-related changelog entries in v2.4.28-rc2:
o [IPV4]: Set ARP hw type correctly for BOOTP over FDDI
o [IPV4]: Permit the official ARP hw type in SIOCSARP for FDDI
Maybe you can try earlier v2.4.28's (-rc1 for one) to check where
the problem starts to happen?
David, Herbert, any ideas?
* 2.4.28(+?): Strange ARP problem
@ 2005-01-13 14:50 Scott Doty
2005-01-13 12:09 ` Marcelo Tosatti
0 siblings, 1 reply; 6+ messages in thread
From: Scott Doty @ 2005-01-13 14:50 UTC
To: linux-kernel
Hi,
We use Linux extensively here at Sonic.net. Our web servers have two
NIC's -- a NIC with a public IP address, and a NIC on our SAN (with NetApps).
When we tried to upgrade to 2.4.28, we encountered a problem with NetApp
reachability, which turns out to have been a problem with ARP: we
were seeing two ARP entries for the NetApp IP's. One would be correct, and
one would be "incomplete".
Occasionally, a system would glom onto the incomplete entry, and NFS
connectivity would tank. This doesn't happen with 2.4.27.
We'd like to upgrade to 2.4.29-rc2, but we have much trepidation about doing
so. I certainly don't want to treat the list as "our own personal help
desk" (as warned about in the FAQ), but was hoping someone could shed some
light on the problem. I think either myself or one of our guys can write a
patch to fix it, if someone would point us in the right direction.
Thank you,
-Scott
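The symptom Scott describes (two entries for one IP, one of them "incomplete") can be checked from user space. What follows is only a sketch, not something posted in the thread: it assumes the usual /proc/net/arp column layout (IP address, HW type, Flags, HW address, Mask, Device) and uses the ATF_COM flag value from <net/if_arp.h>. The function name arp_line_is_complete is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ATF_COM 0x02 /* "completed entry" flag, as in <net/if_arp.h> */

/*
 * Parse one data line in /proc/net/arp layout, e.g.
 *   "10.0.0.5  0x1  0x2  00:a0:98:00:00:01  *  eth1"
 * Copies the IP column into ip (at least 64 bytes).
 * Returns 1 if the entry is complete, 0 if it is incomplete,
 * and -1 if the line does not parse (the header line, for example).
 */
static int arp_line_is_complete(const char *line, char *ip)
{
    unsigned int hw_type, flags;

    assert(line != NULL && ip != NULL);

    /* %x also accepts the "0x" prefix the kernel prints. */
    if (sscanf(line, "%63s %x %x", ip, &hw_type, &flags) != 3)
        return -1;

    return (flags & ATF_COM) ? 1 : 0;
}
```

Feeding each line of /proc/net/arp through this function and counting how often the same IP comes back with both 1 and 0 would flag the duplicate-entry case. The addresses in the comment are placeholders, not values from Sonic.net's network.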
* Re: 2.4.28(+?): Strange ARP problem
2005-01-13 12:09 ` Marcelo Tosatti
@ 2005-01-13 16:18 ` Scott Doty
2005-01-13 21:01 ` Herbert Xu
1 sibling, 0 replies; 6+ messages in thread
From: Scott Doty @ 2005-01-13 16:18 UTC
To: Marcelo Tosatti
On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
> Maybe you can try earlier v2.4.28's (-rc1 for one) to check where
> the problem starts to happen?
Good plan, we'll try different kernel versions to pin down exactly where the
problem starts.
-Scott
* Re: 2.4.28(+?): Strange ARP problem
2005-01-13 12:09 ` Marcelo Tosatti
2005-01-13 16:18 ` Scott Doty
@ 2005-01-13 21:01 ` Herbert Xu
2005-01-13 22:01 ` Scott Doty
1 sibling, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2005-01-13 21:01 UTC
To: Marcelo Tosatti; +Cc: Scott Doty, linux-kernel, davem
On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
>
> Maybe you can try earlier v2.4.28's (-rc1 for one) to check where
> the problem starts to happen?
The symptom sounds like the bug in the 2.4 backport of the neighbour
hash updates. In neigh_create, hash_val needs to be computed inside
the lock (and after the growing), not outside.
Someone even posted a patch for it. I'll dig it up tonight if it
doesn't show up by then.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
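Herbert's diagnosis can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel code: struct toy_table, toy_hash (an identity hash) and the two bucket_* helpers are hypothetical names, and only the ordering of "compute bucket index" versus "grow the table" mirrors what he describes for neigh_create. If the index is masked with the old hash_mask and the table then doubles, the index can point at a different bucket than the one a later lookup would search, which is one plausible way to end up with a second entry for the same key.

```c
#include <assert.h>

/* Toy stand-in for the neighbour table; only the mask and count matter. */
struct toy_table {
    unsigned int hash_mask; /* bucket count - 1; bucket count is a power of two */
    unsigned int entries;   /* number of entries currently stored */
};

/* Hypothetical identity hash, chosen so the result is easy to follow. */
static unsigned int toy_hash(unsigned int key)
{
    return key;
}

/* Double the bucket count once there are more entries than buckets. */
static void toy_grow_if_needed(struct toy_table *tbl)
{
    assert(((tbl->hash_mask + 1) & tbl->hash_mask) == 0);
    if (tbl->entries > tbl->hash_mask + 1)
        tbl->hash_mask = ((tbl->hash_mask + 1) << 1) - 1;
}

/* Buggy order: the index is masked with the pre-grow hash_mask. */
static unsigned int bucket_before_grow(struct toy_table *tbl, unsigned int key)
{
    unsigned int hash_val = toy_hash(key) & tbl->hash_mask;

    toy_grow_if_needed(tbl);
    return hash_val; /* may be stale if the table just grew */
}

/* Fixed order: grow first, then mask with the current hash_mask. */
static unsigned int bucket_after_grow(struct toy_table *tbl, unsigned int key)
{
    toy_grow_if_needed(tbl);
    return toy_hash(key) & tbl->hash_mask;
}
```

With a two-bucket table holding three entries, key 2 maps to bucket 0 in the buggy order but to bucket 2 once the table has grown to four buckets, so the two orderings disagree about where the entry belongs.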
* Re: 2.4.28(+?): Strange ARP problem
2005-01-13 21:01 ` Herbert Xu
@ 2005-01-13 22:01 ` Scott Doty
2005-01-13 22:15 ` Herbert Xu
0 siblings, 1 reply; 6+ messages in thread
From: Scott Doty @ 2005-01-13 22:01 UTC
To: Herbert Xu; +Cc: Marcelo Tosatti, linux-kernel, davem
[-- Attachment #1: Type: text/plain, Size: 600 bytes --]
On Fri, Jan 14, 2005 at 08:01:42AM +1100, Herbert Xu wrote:
> On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
> >
> > Maybe you can try earlier v2.4.28's (-rc1 for one) to check where
> > the problem starts to happen?
>
> The symptom sounds like the bug in the 2.4 backport of the neighbour
> hash updates. In neigh_create, hash_val needs to be computed inside
> the lock (and after the growing), not outside.
>
> Someone even posted a patch for it. I'll dig it up tonight if it
> doesn't show up by then.
I just built a patch from your description -- it's attached.
-Scott
[-- Attachment #2: patch-arp --]
[-- Type: text/plain, Size: 606 bytes --]
--- linux-2.4.29-rc2/net/core/neighbour.c 2005/01/13 21:55:06 1.1
+++ linux-2.4.29-rc2/net/core/neighbour.c 2005/01/13 21:56:32
@@ -427,11 +427,12 @@
 	n->confirmed = jiffies - (n->parms->base_reachable_time<<1);
-	hash_val = tbl->hash(pkey, dev) & tbl->hash_mask;
-
 	write_lock_bh(&tbl->lock);
 	if (atomic_read(&tbl->entries) > (tbl->hash_mask + 1))
 		neigh_hash_grow(tbl, (tbl->hash_mask + 1) << 1);
+
+	hash_val = tbl->hash(pkey, dev) & tbl->hash_mask;
+
 	for (n1 = tbl->hash_buckets[hash_val]; n1; n1 = n1->next) {
 		if (dev == n1->dev &&
 		    memcmp(n1->primary_key, pkey, key_len) == 0) {
* Re: 2.4.28(+?): Strange ARP problem
2005-01-13 22:01 ` Scott Doty
@ 2005-01-13 22:15 ` Herbert Xu
0 siblings, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2005-01-13 22:15 UTC
To: Scott Doty; +Cc: Marcelo Tosatti, linux-kernel, davem
On Thu, Jan 13, 2005 at 02:01:00PM -0800, Scott Doty wrote:
>
> I just built a patch from your description -- it's attached.
Yep, that looks good. Please let us know if that fixes your problems.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Thread overview: 6+ messages
2005-01-13 14:50 2.4.28(+?): Strange ARP problem Scott Doty
2005-01-13 12:09 ` Marcelo Tosatti
2005-01-13 16:18 ` Scott Doty
2005-01-13 21:01 ` Herbert Xu
2005-01-13 22:01 ` Scott Doty
2005-01-13 22:15 ` Herbert Xu