linux-kernel.vger.kernel.org archive mirror
* Re: 2.4.28(+?): Strange ARP problem
  2005-01-13 14:50 2.4.28(+?): Strange ARP problem Scott Doty
@ 2005-01-13 12:09 ` Marcelo Tosatti
  2005-01-13 16:18   ` Scott Doty
  2005-01-13 21:01   ` Herbert Xu
  0 siblings, 2 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2005-01-13 12:09 UTC (permalink / raw)
  To: Scott Doty; +Cc: linux-kernel, davem, Herbert Xu

On Thu, Jan 13, 2005 at 06:50:29AM -0800, Scott Doty wrote:
> Hi,
> 
> We use Linux extensively here at Sonic.net.  Our web servers have two
> NICs -- a NIC with a public IP address, and a NIC on our SAN (with NetApps).
> 
> When we tried to upgrade to 2.4.28, we encountered a problem with NetApp
> reachability, which turns out to have been a problem with ARP:  we
> were seeing two ARP entries for the NetApp IPs.  One would be correct, and
> one would be "incomplete".
> 
> Occasionally, a system would glom onto the incomplete entry, and NFS
> connectivity would tank.  This doesn't happen with 2.4.27.
> 
> We'd like to upgrade to 2.4.29-rc2, but we have much trepidation about doing
> so.  I certainly don't want to treat the list as "our own personal help
> desk" (as warned about in the FAQ), but was hoping someone could shed some
> light on the problem.  I think either I or one of our guys can write a
> patch to fix it, if someone would point us in the right direction.
> 
> Thank you,

Scott, 

I have no idea what might be causing such a regression.  I see a few
ARP-related changelog entries in v2.4.28-rc2:

  o [IPV4]: Set ARP hw type correctly for BOOTP over FDDI
  o [IPV4]: Permit the official ARP hw type in SIOCSARP for FDDI

Maybe you can try earlier v2.4.28's (-rc1 for one) to check where 
the problem starts to happen?

David, Herbert, any ideas?



* 2.4.28(+?): Strange ARP problem
@ 2005-01-13 14:50 Scott Doty
  2005-01-13 12:09 ` Marcelo Tosatti
  0 siblings, 1 reply; 6+ messages in thread
From: Scott Doty @ 2005-01-13 14:50 UTC (permalink / raw)
  To: linux-kernel

Hi,

We use Linux extensively here at Sonic.net.  Our web servers have two
NICs -- a NIC with a public IP address, and a NIC on our SAN (with NetApps).

When we tried to upgrade to 2.4.28, we encountered a problem with NetApp
reachability, which turns out to have been a problem with ARP:  we
were seeing two ARP entries for the NetApp IPs.  One would be correct, and
one would be "incomplete".

Occasionally, a system would glom onto the incomplete entry, and NFS
connectivity would tank.  This doesn't happen with 2.4.27.

We'd like to upgrade to 2.4.29-rc2, but we have much trepidation about doing
so.  I certainly don't want to treat the list as "our own personal help
desk" (as warned about in the FAQ), but was hoping someone could shed some
light on the problem.  I think either I or one of our guys can write a
patch to fix it, if someone would point us in the right direction.

Thank you,

 -Scott

* Re: 2.4.28(+?): Strange ARP problem
  2005-01-13 12:09 ` Marcelo Tosatti
@ 2005-01-13 16:18   ` Scott Doty
  2005-01-13 21:01   ` Herbert Xu
  1 sibling, 0 replies; 6+ messages in thread
From: Scott Doty @ 2005-01-13 16:18 UTC (permalink / raw)
  To: Marcelo Tosatti

On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
> Maybe you can try earlier v2.4.28's (-rc1 for one) to check where 
> the problem starts to happen?

Good plan, we'll try different kernel versions to pin down exactly where the
problem starts.

 -Scott

* Re: 2.4.28(+?): Strange ARP problem
  2005-01-13 12:09 ` Marcelo Tosatti
  2005-01-13 16:18   ` Scott Doty
@ 2005-01-13 21:01   ` Herbert Xu
  2005-01-13 22:01     ` Scott Doty
  1 sibling, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2005-01-13 21:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Scott Doty, linux-kernel, davem

On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
> 
> Maybe you can try earlier v2.4.28's (-rc1 for one) to check where 
> the problem starts to happen?

The symptom sounds like the bug in the 2.4 backport of the neighbour
hash updates.  In neigh_create, hash_val needs to be computed inside
the lock (and after the growing), not outside.

Someone even posted a patch for it.  I'll dig it up tonight if it
doesn't show up by then.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: 2.4.28(+?): Strange ARP problem
  2005-01-13 21:01   ` Herbert Xu
@ 2005-01-13 22:01     ` Scott Doty
  2005-01-13 22:15       ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Scott Doty @ 2005-01-13 22:01 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Marcelo Tosatti, linux-kernel, davem

On Fri, Jan 14, 2005 at 08:01:42AM +1100, Herbert Xu wrote:
> On Thu, Jan 13, 2005 at 10:09:00AM -0200, Marcelo Tosatti wrote:
> > 
> > Maybe you can try earlier v2.4.28's (-rc1 for one) to check where 
> > the problem starts to happen?
> 
> The symptom sounds like the bug in the 2.4 backport of the neighbour
> hash updates.  In neigh_create, hash_val needs to be computed inside
> the lock (and after the growing), not outside.
> 
> Someone even posted a patch for it.  I'll dig it up tonight if it
> doesn't show up by then.

I just built a patch from your description -- it's attached.

 -Scott

[-- Attachment #2: patch-arp --]
[-- Type: text/plain, Size: 606 bytes --]

--- linux-2.4.29-rc2/net/core/neighbour.c	2005/01/13 21:55:06	1.1
+++ linux-2.4.29-rc2/net/core/neighbour.c	2005/01/13 21:56:32
@@ -427,11 +427,12 @@
 
 	n->confirmed = jiffies - (n->parms->base_reachable_time<<1);
 
-	hash_val = tbl->hash(pkey, dev) & tbl->hash_mask;
-
 	write_lock_bh(&tbl->lock);
 	if (atomic_read(&tbl->entries) > (tbl->hash_mask + 1))
 		neigh_hash_grow(tbl, (tbl->hash_mask + 1) << 1);
+
+	hash_val = tbl->hash(pkey, dev) & tbl->hash_mask;
+
 	for (n1 = tbl->hash_buckets[hash_val]; n1; n1 = n1->next) {
 		if (dev == n1->dev &&
 		    memcmp(n1->primary_key, pkey, key_len) == 0) {

* Re: 2.4.28(+?): Strange ARP problem
  2005-01-13 22:01     ` Scott Doty
@ 2005-01-13 22:15       ` Herbert Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2005-01-13 22:15 UTC (permalink / raw)
  To: Scott Doty; +Cc: Marcelo Tosatti, linux-kernel, davem

On Thu, Jan 13, 2005 at 02:01:00PM -0800, Scott Doty wrote:
> 
> I just built a patch from your description -- it's attached.

Yep, that looks good.  Please let us know if that fixes your problems.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thread overview: 6+ messages
2005-01-13 14:50 2.4.28(+?): Strange ARP problem Scott Doty
2005-01-13 12:09 ` Marcelo Tosatti
2005-01-13 16:18   ` Scott Doty
2005-01-13 21:01   ` Herbert Xu
2005-01-13 22:01     ` Scott Doty
2005-01-13 22:15       ` Herbert Xu
