linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] 2.4.22-pre9-bk : bonding bug fixes
@ 2003-07-30 14:06 Willy Tarreau
  2003-07-30 23:49 ` David S. Miller
  2003-07-31 18:50 ` Jeff Garzik
  0 siblings, 2 replies; 4+ messages in thread
From: Willy Tarreau @ 2003-07-30 14:06 UTC (permalink / raw)
  To: davem, jgarzik, marcelo; +Cc: netdev, bonding-devel, linux-kernel


Hi Marcelo, David, Jeff...

there are still a few bugs in the current bonding driver. I've reported them
several times now, but perhaps not at the right places...

So :
  - the first patch fixes a typo in the MODULE_PARM_DESC

  - the second one adds a comment and a warning around some code I don't
    understand, but which cannot be executed. It's within function
    bond_xmit_activebackup, and only executes if bond->mode != ACTIVEBACKUP....

  - now the last one fixes a kernel panic due to a cheap hack which was introduced
    to determine the source IP address to use with ARP checks. It takes the first
    address of the first slave, and puts a lock on it. If there's no address, its
    ip_ptr is NULL, and the kernel panics while trying to get the lock. You can
    reproduce it easily this way :

    # modprobe eth0
    # modprobe bonding mode=active-backup miimon=1000
    # ip link set bond0 up
    # ifenslave bond0 eth0
    => kernel panic !

Now here are the patches. I really hope to get them into 2.4.22, since I'm a
bit fed up with my server panicing each time I try a vanilla new kernel which
I forget to patch...

Cheers,
Willy


======== first one ==========

--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c	Wed Jul 30 09:49:48 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c	Wed Jul 30 15:09:15 2003
@@ -524,7 +524,7 @@
 MODULE_PARM(miimon, "i");
 MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
 MODULE_PARM(use_carrier, "i");
-MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 09 for off, 1 for on (default)");
+MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default)");
 MODULE_PARM(mode, "s");
 MODULE_PARM_DESC(mode, "Mode of operation : 0 for round robin, 1 for active-backup, 2 for xor");
 MODULE_PARM(arp_interval, "i");



======== second one ==========

--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c	Wed Jul 30 15:12:11 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c	Wed Jul 30 15:31:01 2003
@@ -3281,6 +3281,19 @@
 		memcpy(&my_ip, the_ip, 4);
 	}
 
+	/* w.tarreau - 2003/07/30
+	 * I don't understand the logic here :
+	 * - this code should be run only if we're NOT in active-backup mode, which
+	 *   is the only mode for which this function will be called.
+	 * - the comment says the code tries to avoid sending broadcasts for ARP
+	 *   requests when the destination is known. This is obviously wrong since
+	 *   it will prevent you from changing the dead equipment you were checking
+	 *   without reloading the bonding driver ! High availability and low
+	 *   network usage never mix well ...
+	 */
+#warning "This code may need a fix !"
+#ifdef HOW_CAN_THIS_BE_CALLED
+
 	/* if we are sending arp packets and don't know 
 	 * the target hw address, save it so we don't need 
 	 * to use a broadcast address.
@@ -3302,6 +3315,7 @@
 				memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
 		}
 	}
+#endif
 
 	read_lock(&bond->lock);
 

========= third one ==========


--- linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c	Wed Jul 30 15:09:15 2003
+++ linux-2.4.22-pre9-bk-bond/drivers/net/bonding/bond_main.c	Wed Jul 30 15:12:11 2003
@@ -1594,11 +1594,14 @@
 #endif
 			bond_set_slave_inactive_flags(new_slave);
 		}
-		read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
-		ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
-		ifa = *ifap;
-		my_ip = ifa->ifa_address;
-		read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+		if (((struct in_device *)slave_dev->ip_ptr) != NULL) {
+			read_lock_irqsave(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+			ifap= &(((struct in_device *)slave_dev->ip_ptr)->ifa_list);
+			ifa = *ifap;
+			if (ifa != NULL)
+				my_ip = ifa->ifa_address;
+			read_unlock_irqrestore(&(((struct in_device *)slave_dev->ip_ptr)->lock), rflags);
+		}
 
 		/* if there is a primary slave, remember it */
 		if (primary != NULL) {



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes
  2003-07-30 14:06 [PATCH] 2.4.22-pre9-bk : bonding bug fixes Willy Tarreau
@ 2003-07-30 23:49 ` David S. Miller
  2003-07-31  0:22   ` Jay Vosburgh
  2003-07-31 18:50 ` Jeff Garzik
  1 sibling, 1 reply; 4+ messages in thread
From: David S. Miller @ 2003-07-30 23:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: jgarzik, marcelo, netdev, bonding-devel, linux-kernel

On Wed, 30 Jul 2003 16:06:58 +0200
Willy Tarreau <willy@w.ods.org> wrote:

> there are still a few bugs in the current bonding driver. I've reported them
> several times now, but perhaps not at the right places...

So now we have these few bug fixes, and the backport of the
2.6.x version of the bonding code, both submitted on the same
day in fact :-)

Jeff I'd recommend we put Willy's fixes in if you think they're
OK, then we can think about the 2.6.x backport work for 2.4.23-preX

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes
  2003-07-30 23:49 ` David S. Miller
@ 2003-07-31  0:22   ` Jay Vosburgh
  0 siblings, 0 replies; 4+ messages in thread
From: Jay Vosburgh @ 2003-07-31  0:22 UTC (permalink / raw)
  To: David S. Miller
  Cc: Willy Tarreau, jgarzik, marcelo, netdev, bonding-devel, linux-kernel

>On Wed, 30 Jul 2003 16:06:58 +0200
>Willy Tarreau <willy@w.ods.org> wrote:
>
>> there are still a few bugs in the current bonding driver. I've reported them
>> several times now, but perhaps not at the right places...
>
>So now we have these few bug fixes, and the backport of the
>2.6.x version of the bonding code, both submitted on the same
>day in fact :-)
>
>Jeff I'd recommend we put Willy's fixes in if you think they're
>OK, then we can think about the 2.6.x backport work for 2.4.23-preX

	I've been looking at Willy's fixes, and the typo (first patch)
and locking fix (third patch) both look good to me.  The second patch
(the dead code warning) points out a real problem, in that the code in
question really has no function, but the patch probably doesn't go far
enough for a final solution (the variable that code would set,
arp_target_hw_addr, is referenced in other places, but ends up always
being NULL because the dead code is the only place it was ever set).

	A more proper solution would be to simply delete the dead code
and the arp_target_hw_addr variable, and replace the variable
references with NULL.  This means that all of the ARP probes sent will
be sent out as broadcasts, which is what's already happening, this
just makes the code clearer.  Patch follows (which replaces Willy's
second patch).

	Does this sound reasonable to everybody?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


--- linux-2.4.22-pre9-bk-wt/drivers/net/bonding/bond_main.c	2003-07-30 17:06:50.000000000 -0700
+++ linux-2.4.22-pre9-bk/drivers/net/bonding/bond_main.c	2003-07-30 17:08:53.000000000 -0700
@@ -463,7 +463,6 @@
 static unsigned long arp_target[MAX_ARP_IP_TARGETS] = { 0, } ;
 static int arp_ip_count = 0;
 static u32 my_ip = 0;
-char *arp_target_hw_addr = NULL;
 
 static char *primary= NULL;
 
@@ -596,8 +595,7 @@
 
 	for (i = 0; (i<MAX_ARP_IP_TARGETS) && arp_target[i]; i++) { 
 		arp_send(ARPOP_REQUEST, ETH_P_ARP, arp_target[i], slave->dev, 
-			 my_ip, arp_target_hw_addr, slave->dev->dev_addr,
-			 arp_target_hw_addr); 
+			 my_ip, NULL, slave->dev->dev_addr, NULL); 
 	} 
 }
  
@@ -1031,10 +1029,6 @@
 	}
 	if (arp_interval> 0) {  /* arp interval, in milliseconds. */
 		del_timer(&bond->arp_timer);
-                if (arp_target_hw_addr != NULL) {
-			kfree(arp_target_hw_addr); 
-			arp_target_hw_addr = NULL;
-		}
 	}
 
 	if (bond_mode == BOND_MODE_8023AD) {
@@ -3281,28 +3275,6 @@
 		memcpy(&my_ip, the_ip, 4);
 	}
 
-	/* if we are sending arp packets and don't know 
-	 * the target hw address, save it so we don't need 
-	 * to use a broadcast address.
-	 * don't do this if in active backup mode because the slaves must 
-	 * receive packets to stay up, and the only ones they receive are 
-	 * broadcasts. 
-	 */
-	if ( (bond_mode != BOND_MODE_ACTIVEBACKUP) && 
-             (arp_ip_count == 1) &&
-	     (arp_interval > 0) && (arp_target_hw_addr == NULL) &&
-	     (skb->protocol == __constant_htons(ETH_P_IP) ) ) {
-		struct ethhdr *eth_hdr = 
-			(struct ethhdr *) (((char *)skb->data));
-		struct iphdr *ip_hdr = (struct iphdr *)(eth_hdr + 1);
-
-		if (arp_target[0] == ip_hdr->daddr) {
-			arp_target_hw_addr = kmalloc(ETH_ALEN, GFP_KERNEL);
-			if (arp_target_hw_addr != NULL)
-				memcpy(arp_target_hw_addr, eth_hdr->h_dest, ETH_ALEN);
-		}
-	}
-
 	read_lock(&bond->lock);
 
 	read_lock(&bond->ptrlock);

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] 2.4.22-pre9-bk : bonding bug fixes
  2003-07-30 14:06 [PATCH] 2.4.22-pre9-bk : bonding bug fixes Willy Tarreau
  2003-07-30 23:49 ` David S. Miller
@ 2003-07-31 18:50 ` Jeff Garzik
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Garzik @ 2003-07-31 18:50 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: davem, marcelo, netdev, bonding-devel, linux-kernel

Applied patches 1 and 3, and will forward to Marcelo today.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-07-31 18:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-30 14:06 [PATCH] 2.4.22-pre9-bk : bonding bug fixes Willy Tarreau
2003-07-30 23:49 ` David S. Miller
2003-07-31  0:22   ` Jay Vosburgh
2003-07-31 18:50 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).