linux-kernel.vger.kernel.org archive mirror
* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
       [not found] <OFDDD008B9.906B9BD6-ON88256E14.0038B2E4@us.ibm.com>
@ 2004-01-07 19:24 ` Martin Knoblauch
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-07 19:24 UTC (permalink / raw)
  To: linux-net; +Cc: linux-kernel, David Stevens

Hi David,

 last message on this issue for today. By inserting printk's I could
show that the packets from the "external" Ganglia/gmond clients never
show up in the "udp_rcv" routine when running the 2.4.22 kernel. So it
seems something goes amiss when receiving certain UDP packets.

 Unfortunately this exhausts my clues about the dataflow of ipv4. My next
question would be: who calls "udp_rcv"? Or is such a call dispatched
through the inet_protos table?
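
For readers following the thread, the kind of dispatch Martin is asking about can be sketched in user space as below. This is only an illustration under stated assumptions: in the 2.4 sources the IP receive path is understood to look up the handler in inet_protos by hashing the protocol number, but every name here (proto_handler, proto_register, proto_deliver, demo_udp_rcv) is hypothetical and not kernel code.

```c
#include <stddef.h>

#define MAX_PROTOS 32	/* power of two, so "& (MAX_PROTOS - 1)" acts as the hash */

/* Hypothetical stand-in for the kernel's per-protocol entry. */
struct proto_handler {
	int (*handler)(const void *pkt, size_t len);
	unsigned char protocol;		/* IP protocol number, e.g. 17 for UDP */
	struct proto_handler *next;	/* chain for protocols sharing a slot */
};

static struct proto_handler *protos[MAX_PROTOS];

/* Add a handler at the head of its hash slot. */
void proto_register(struct proto_handler *p)
{
	unsigned int hash = p->protocol & (MAX_PROTOS - 1);

	p->next = protos[hash];
	protos[hash] = p;
}

/* Hand the packet to the handler registered for `protocol`.
 * Returns the handler's result, or -1 if nobody is registered. */
int proto_deliver(unsigned char protocol, const void *pkt, size_t len)
{
	struct proto_handler *p;

	for (p = protos[protocol & (MAX_PROTOS - 1)]; p; p = p->next)
		if (p->protocol == protocol)
			return p->handler(pkt, len);
	return -1;
}

/* Hypothetical receive routine playing the role of udp_rcv. */
int demo_udp_rcv(const void *pkt, size_t len)
{
	(void)pkt;
	return (int)len;
}

struct proto_handler demo_udp = { demo_udp_rcv, 17, NULL };
```

After proto_register(&demo_udp), a call such as proto_deliver(17, buf, len) ends up in demo_udp_rcv. If a packet never reaches the handler in a scheme like this, it was dropped before the table lookup, which matches the suspicion that the loss happens below UDP.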

Cheers
Martin

--- David Stevens <dlstevens@us.ibm.com> wrote:
> 
> 
> 
> 
> BTW, one other thing to try with this new information-- if it works with
> 224.0.0.1 (the all-hosts group), then I'd assume it is not a problem with
> the bind() or an issue with multicast delivery.
> 
> If that's true, it may be a problem with the driver multicast address
> filter, then I'd expect it to not receive anything on the receiver side
> when using "tcpdump -i ethX" but it should work if you put the interface
> in promiscuous mode with "tcpdump -p -i ethX" (ie, add the "-p" option to
> tcpdump on the receiver and see if it works then). Can you try that?
> 
> Also, if you have any other network cards you could plug in to test with
> that are not tg3's, it'd be useful to test with those.
> 
>             +-DLS
> 
> 
> Martin Knoblauch <knobi@knobisoft.de>@vger.kernel.org on 01/07/2004
> 02:10:28 AM
> 
> Please respond to knobi@knobisoft.de
> 
> Sent by:    linux-net-owner@vger.kernel.org
> 
> 
> To:    linux-net@vger.kernel.org
> cc:    linux-kernel@vger.kernel.org, David
> Stevens/Beaverton/IBM@IBMUS
> Subject:
> 
> 
> 
> >
> > To rule out further causes, I rebuilt the 2.4.21 version of tg3.o
> > (V1.5) for the 2.4.22 kernel (tg3-V1.6). Unfortunately the problem did
> > not go away, which points in the direction of the pretty large
> > igmp/multicast changes introduced with 2.4.22. Debugging tg3 would have
> > been easier ...
> 
>  Next tidbit: the problem comes when using a multicast group different
> from 224.0.0.1. If I change the group used by Ganglia from 239.2.11.71
> to 224.0.0.1 all of a sudden the multicasts come in.
> 
>  Hope this helps the specialists to track it down.
> 
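
The driver multicast address filter David mentions works on Ethernet group addresses rather than IP addresses. As a hedged illustration, the standard IPv4-to-Ethernet mapping (01:00:5e followed by the low 23 bits of the group) can be written as below; the function name mcast_ip_to_mac is made up for this sketch and is not the kernel's own helper.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to the Ethernet group
 * address used on the wire: 01:00:5e followed by the low 23 bits. */
void mcast_ip_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this octet is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

With this mapping 239.2.11.71 becomes 01:00:5e:02:0b:47 and 224.0.0.1 becomes 01:00:5e:00:00:01, so the two groups need two distinct entries in the NIC filter.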


=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
       [not found] <OFF9934921.DF9DB0E3-ON88256E15.00658D89@us.ibm.com>
@ 2004-01-09  8:37 ` Martin Knoblauch
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-09  8:37 UTC (permalink / raw)
  To: David Stevens; +Cc: linux-kernel, linux-net

> 
> Martin,
>       If forcing it to use IGMPv2 makes it work when you aren't even using
> a multicast router, it sounds like you may have a switch that is snooping
> IGMP packets and doesn't understand IGMPv3. If so, that isn't a bug in
> linux; that would be your switch software needing an upgrade.

David,

 thanks for your suggestion. I will try to find out what kind of switch
the customer is using in their (pretty much a black hole for anybody)
network infrastructure.

 I am still confused why the switch would matter. The setup is 21
identical nodes, except that 20 are running 2.4.20 and one is running
2.4.22. All on the same switch. All on the same "line card" as far as I
know. The 20 2.4.20 nodes talk happily to each other over the
239.2.11.71 multicast. Only the one node running 2.4.22 does not "see"
the packets from the other 20 nodes (they see the packets from the node
running 2.4.22).

 So you are telling me that the switch somehow decides not to
send/forward 239.2.11.71 packets to the node running 2.4.22? As I said,
I have to check that out with the customer's IT.

 The other thing is: why does it work with the 224.0.0.1 group?
Because the switch would always pass those packets?
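
One plausible answer, offered as an assumption rather than a diagnosis of this particular switch: IGMP-snooping switches commonly flood the link-local block 224.0.0.0/24 to every port without looking at membership reports, while groups outside that block (such as 239.2.11.71) are forwarded only to ports where the switch has understood a report. The distinction can be sketched as follows; is_link_local_mcast is a hypothetical helper name.

```c
#include <stdint.h>

/* Return 1 if `addr` (host byte order) lies in 224.0.0.0/24, the
 * link-local multicast block that snooping switches typically flood,
 * and 0 otherwise. */
int is_link_local_mcast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}
```

Under this assumption 224.0.0.1 (0xe0000001) falls inside the flooded block and 239.2.11.71 (0xef020b47) does not, which would fit the behaviour described in the thread.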

>       I have no idea what you mean by "timer bug"; IGMP_V2_SEEN() only
> applies when you have a v2 multicast router that is doing IGMPv2
> queries.

 Just a wild guess probably. From looking at the IGMP_Vx_SEEN macros it
seemed to me yesterday that at least for the V1 case the actual jiffies
comparison had been reversed when going from 2.4.21 to 2.4.22+. But
maybe it was already too late... OK, a closer look at the source shows
that it was already too late. Nothing changed :-)
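
For reference, the kind of comparison being discussed can be sketched in user space as below. It assumes the macros test "a v2 query was heard, and its timeout has not yet expired" using a wraparound-safe tick comparison; the struct and function names are hypothetical, and only the mr_v2_seen field name is taken from the patch in this thread.

```c
/* True if tick `a` lies before tick `b`, tolerating counter wraparound
 * (same idea as the kernel's time_before(), reimplemented here). */
int tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the per-interface state. */
struct demo_in_dev {
	unsigned long mr_v2_seen;	/* 0 = no v2 query heard, else expiry tick */
};

/* 1 while a v2 querier is considered present, 0 otherwise. */
int v2_querier_seen(const struct demo_in_dev *d, unsigned long now)
{
	return d->mr_v2_seen != 0 && tick_before(now, d->mr_v2_seen);
}
```

If no IGMPv2 query is ever heard, mr_v2_seen stays 0 and the check is false, so under this assumption the host would take the IGMPv3 path. That is consistent with David's remark that the macro only matters when a v2 router is sending queries.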

> You told me you were on a single network, in which case the version of
> IGMP or whether you send any IGMP reports at all is irrelevant, except
> for the special case of "smart" switches that are trying to use IGMP
> packets for port forwarding.

 OK. Maybe the switch is really too smart...

> IGMP reports do not in any way affect multicast group membership on the
> host.

 Correct. The group membership seems OK.

>       In any case, I have tested IGMPv2 compatibility mode and, of course,
> I found nothing broken-- certainly it WOULD violate the IGMPv3 RFC to
> always do IGMPv2, so I'm not sure what you're suggesting.
> 
>                         +-DLS
> 

 Not really suggesting anything here. I am just trying to understand an
apparently complex issue. From the outside it just looks like something
in 2.4.22+ breaks an application that has worked for years with Linux.
Now, maybe/likely you are right and the upgrade just shows some
interesting property of our customer's setup :-)

 Or maybe I am suggesting a compile/runtime option to disable IGMP_V3.
Just a thought. I know, massaging good code to make other "broken"
stuff work is wrong in principle. But sometimes it is the only way to
go. Maybe I can come up with a patch.

 In any case thanks for your attention/suggestions/help. If anything
comes out of this, I will have learned something new.

Cheers
Martin
PS: Something I never mentioned. You CC linux-kernel and linux-net, but
I never actually saw your posts on the list .... Just in my mailbox.


=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
       [not found] <OFFB53E1E3.C0B1F8BC-ON88256E14.003560C9@us.ibm.com>
  2004-01-07 14:08 ` Martin Knoblauch
@ 2004-01-08 18:03 ` Martin Knoblauch
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-08 18:03 UTC (permalink / raw)
  To: David Stevens; +Cc: linux-kernel, linux-net

[-- Attachment #1: Type: text/plain, Size: 951 bytes --]

--- David Stevens <dlstevens@us.ibm.com> wrote:
> 
> 
> 
> 
>       There were some unwanted side-effects in multicast delivery because
> of the source filtering but I'm pretty sure those fixes are in the 2.4
> line.
> 
David,

 maybe they are not :-)

 After some more playing with printk-s and a bit of gross hacking I
think I am on to something.

 Please look at the appended patch on top of 2.4.22. It adds some
printk-s and also forces some of the V2 paths to trigger by adding "1 ||"
to some statements.

 With this all of a sudden the external packets to the 239.2.11.71
group of Ganglia come in again.

 Now, what does it mean?

a) IGMP_V2_SEEN does not work as expected ?
b) something with the timer codes is fishy ?
c) whatever ...

 Again, hope this helps to shed light on the problem.

Martin

=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[-- Attachment #2: igmp.diff --]
[-- Type: application/octet-stream, Size: 3679 bytes --]

--- ../../../linux-2.4.22-3-msc/net/ipv4/igmp.c	Mon Aug 25 13:44:44 2003
+++ ./igmp.c	Thu Jan  8 18:39:48 2004
@@ -477,6 +477,7 @@
 	struct sk_buff *skb = 0;
 	int type;
 
+	printk(KERN_DEBUG "igmpv3_send_report\n");
 	if (!pmc) {
 		read_lock(&in_dev->lock);
 		for (pmc=in_dev->mc_list; pmc; pmc=pmc->next) {
@@ -609,6 +610,7 @@
 	u32	group = pmc ? pmc->multiaddr : 0;
 	u32	dst;
 
+	printk(KERN_DEBUG "igmp_send_report: %0x %d\n",group, type);
 	if (type == IGMPV3_HOST_MEMBERSHIP_REPORT)
 		return igmpv3_send_report(in_dev, pmc);
 	else if (type == IGMP_HOST_LEAVE_MESSAGE)
@@ -708,9 +710,10 @@
 	im->reporter = 1;
 	spin_unlock(&im->lock);
 
+	printk(KERN_DEBUG "igmp_timer_expire\n");
 	if (IGMP_V1_SEEN(in_dev))
 		igmp_send_report(in_dev, im, IGMP_HOST_MEMBERSHIP_REPORT);
-	else if (IGMP_V2_SEEN(in_dev))
+	else if (1 || IGMP_V2_SEEN(in_dev))
 		igmp_send_report(in_dev, im, IGMPV2_HOST_MEMBERSHIP_REPORT);
 	else
 		igmp_send_report(in_dev, im, IGMPV3_HOST_MEMBERSHIP_REPORT);
@@ -774,6 +777,7 @@
 				IGMP_V1_Router_Present_Timeout;
 			group = 0;
 		} else {
+			printk(KERN_DEBUG "igmp_heard_query for V2\n");
 			/* v2 router present */
 			max_delay = ih->code*(HZ/IGMP_TIMER_SCALE);
 			in_dev->mr_v2_seen = jiffies +
@@ -840,11 +844,13 @@
 	struct in_device *in_dev = in_dev_get(skb->dev);
 	int len = skb->len;
 
+	printk(KERN_DEBUG "igmp_rcv: entered\n");
 	if (in_dev==NULL) {
 		kfree_skb(skb);
 		return 0;
 	}
 
+	printk(KERN_DEBUG "igmp_rcv: 1\n");
 	if (skb_is_nonlinear(skb)) {
 		if (skb_linearize(skb, GFP_ATOMIC) != 0) {
 			kfree_skb(skb);
@@ -853,12 +859,14 @@
 		ih = skb->h.igmph;
 	}
 
+	printk(KERN_DEBUG "igmp_rcv: 2\n");
 	if (len < sizeof(struct igmphdr) || ip_compute_csum((void *)ih, len)) {
 		in_dev_put(in_dev);
 		kfree_skb(skb);
 		return 0;
 	}
 
+	printk(KERN_DEBUG "igmp_rcv: 3\n");
 	switch (ih->type) {
 	case IGMP_HOST_MEMBERSHIP_QUERY:
 		igmp_heard_query(in_dev, ih, len);
@@ -909,6 +917,7 @@
 	   if (dev->mc_list && dev->flags&IFF_MULTICAST) { do it; }
 	   --ANK
 	   */
+	printk(KERN_DEBUG "ip_mc_filter_add: %s %x\n",dev->name,addr);
 	if (arp_mc_map(addr, buf, dev, 0) == 0)
 		dev_mc_add(dev,buf,dev->addr_len,0);
 }
@@ -1052,7 +1061,7 @@
 	if (in_dev->dev->flags & IFF_UP) {
 		if (IGMP_V1_SEEN(in_dev))
 			goto done;
-		if (IGMP_V2_SEEN(in_dev)) {
+		if (1 || IGMP_V2_SEEN(in_dev)) {
 			if (reporter)
 				igmp_send_report(in_dev, im, IGMP_HOST_LEAVE_MESSAGE);
 			goto done;
@@ -1071,16 +1080,19 @@
 {
 	struct in_device *in_dev = im->interface;
 
+	printk(KERN_DEBUG "igmp_group_added: 1\n");
 	if (im->loaded == 0) {
 		im->loaded = 1;
 		ip_mc_filter_add(in_dev, im->multiaddr);
 	}
 
 #ifdef CONFIG_IP_MULTICAST
+	printk(KERN_DEBUG "igmp_group_added: 2\n");
 	if (im->multiaddr == IGMP_ALL_HOSTS)
 		return;
 
-	if (IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) {
+	printk(KERN_DEBUG "igmp_group_added: 3\n");
+	if (1 || IGMP_V1_SEEN(in_dev) || IGMP_V2_SEEN(in_dev)) {
 		spin_lock_bh(&im->lock);
 		igmp_start_timer(im, IGMP_Initial_Report_Delay);
 		spin_unlock_bh(&im->lock);
@@ -1088,6 +1100,7 @@
 	}
 	/* else, v3 */
 
+	printk(KERN_DEBUG "igmp_group_added: 4\n");
 	im->crcount = in_dev->mr_qrv ? in_dev->mr_qrv :
 		IGMP_Unsolicited_Report_Count;
 	igmp_ifc_event(in_dev);
@@ -1110,6 +1123,7 @@
 
 	ASSERT_RTNL();
 
+	printk(KERN_DEBUG "ip_mc_inc_group: %s %x\n",in_dev->dev->name,addr);
 	for (im=in_dev->mc_list; im; im=im->next) {
 		if (im->multiaddr == addr) {
 			im->users++;
@@ -1135,6 +1149,7 @@
 	im->crcount = 0;
 	atomic_set(&im->refcnt, 1);
 	spin_lock_init(&im->lock);
+	printk(KERN_DEBUG "ip_mc_inc_group: 1\n");
 #ifdef CONFIG_IP_MULTICAST
 	im->tm_running=0;
 	init_timer(&im->timer);


* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
       [not found] <OFFB53E1E3.C0B1F8BC-ON88256E14.003560C9@us.ibm.com>
@ 2004-01-07 14:08 ` Martin Knoblauch
  2004-01-08 18:03 ` Martin Knoblauch
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-07 14:08 UTC (permalink / raw)
  To: David Stevens; +Cc: linux-net, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2874 bytes --]

--- David Stevens <dlstevens@us.ibm.com> wrote:
> 
> 
> 
> 
> Martin,
>       Can you test with a recent 2.6 kernel? There have been
> a number of fixes applied there in recent months, and I
> haven't yet verified that all of those are in the 2.4 line yet.

 Ahhh. That would be a major venture. It would actually be my first
encounter with building 2.6. I am afraid I cannot do that on the
machines in question. Not now :-(

>       All multicast delivery is not broken, and the IGMP version is
> superficially irrelevant to multicast delivery on a local network,
> unless you have a "smart" switch that relies on IGMP reports
> to determine which ports to deliver multicasts
> on (and at the same time, one that doesn't understand IGMPv3). 

 No idea how smart the Cisco box is that the customer provided us with.
But it is the same switch that works with 2.4.21.

> But neither of your tcpdump traces show any IGMP traffic, meaning
> that you probably don't have "IP multicasting" turned on. That
> just means it won't do IGMP, which is fine.

 Hmm. What do you mean by "do not have IP multicasting turned on"? It
is definitely compiled in.

CONFIG_IP_MULTICAST=y is set in the kernel config file. Would there be
anything needed at runtime?

>       There were some unwanted side-effects in multicast delivery
> because of the source filtering but I'm pretty sure those fixes
> are in the 2.4 line.
> 
>  I have no idea what your application is supposed to be doing.
> Can you characterize it in some way and/or come up with a
> small test program that reproduces the failure? What are the
> receivers bound to (INADDR_ANY or the multicast address-- 
> "netstat -a" output would be helpful here)? Is the join done
> in the same process on the same socket or a different socket or
> different process?

 The best I can do is pointing you to the source of Ganglia.

http://ganglia.sourceforge.net/

 "netstat -a" from good/bad machines is appended. Not much to see.

> You earlier gave /proc/net/igmp output that showed the join was
> apparently successful. Does "netstat -s" show any "drop"
> statistics going up?

 "netstat -s" does not show anything that I would recognize as "drop
statistics". Basically everything looks OK. All counters that I would
associate with failures seem to be stable. In any case, output is
appended for the "bad" machine.

>  Also, can you run tcpdump on both machines when it is failing
> and see if packets sent from one machine are all showing up on the
> receiver machine?

 Hmm. I thought that was visible from the tcpdumps I sent?

>       And, just to be thorough, are you possibly using netfilter?
> 

 It was compiled in. I have now removed it, but it makes no difference.

Cheers
Martin

=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[-- Attachment #2: netstat-a.bad --]
[-- Type: application/octet-stream, Size: 9740 bytes --]

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:20000                 *:*                     LISTEN      
tcp        0      0 *:exec                  *:*                     LISTEN      
tcp        0      0 *:32768                 *:*                     LISTEN      
tcp        0      0 *:login                 *:*                     LISTEN      
tcp        0      0 *:32769                 *:*                     LISTEN      
tcp        0      0 *:shell                 *:*                     LISTEN      
tcp        0      0 *:644                   *:*                     LISTEN      
tcp        0      0 *:8649                  *:*                     LISTEN      
tcp        0      0 *:2381                  *:*                     LISTEN      
tcp        0      0 *:sunrpc                *:*                     LISTEN      
tcp        0      0 *:10000                 *:*                     LISTEN      
tcp        0      0 *:6001                  *:*                     LISTEN      
tcp        0      0 *:661                   *:*                     LISTEN      
tcp        0      0 *:ftp                   *:*                     LISTEN      
tcp        0      0 *:ssh                   *:*                     LISTEN      
tcp        0      0 *:telnet                *:*                     LISTEN      
tcp        0      0 *:664                   *:*                     LISTEN      
tcp        0      0 *:2301                  *:*                     LISTEN      
tcp        0      0 *:829                   *:*                     LISTEN      
tcp        0      0 localhost.localdo:33370 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdo:33371 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 lpsdm20.muc:33341       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33340       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33343       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33342       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33339       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33365       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33339       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33365       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33364       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33366       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33361       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33360       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33363       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33362       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33373       lpsdm20.muc:661         ESTABLISHED 
tcp        0      0 lpsdm20.muc:33372       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33369       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33368       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33349       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33348       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33351       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33350       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33347       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33346       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33357       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33356       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33359       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33358       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33353       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33352       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33355       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:33354       lpsdm20.muc:661         TIME_WAIT   
tcp        0      0 lpsdm20.muc:661         lpsdm20.muc:33373       ESTABLISHED 
tcp        0      0 localhost.:kerberos-adm localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdoma:756 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdoma:981 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdoma:977 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdoma:989 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 localhost.localdoma:985 localhost.localdoma:644 TIME_WAIT   
tcp        0      0 lpsdm20.muc:ssh         172.17.129.89:35238     ESTABLISHED 
udp        0      0 *:32768                 *:*                                 
udp        0      0 *:641                   *:*                                 
udp        0      0 *:32769                 *:*                                 
udp        0      0 lpsdm20i.muc:32771      239.2.11.71:8649        ESTABLISHED 
udp        0      0 *:781                   *:*                                 
udp        0      0 *:782                   *:*                                 
udp        0      0 *:783                   *:*                                 
udp        0      0 *:10000                 *:*                                 
udp        0      0 *:784                   *:*                                 
udp        0      0 *:785                   *:*                                 
udp        0      0 *:913                   *:*                                 
udp        0      0 *:786                   *:*                                 
udp        0      0 *:787                   *:*                                 
udp        0      0 *:788                   *:*                                 
udp        0      0 *:789                   *:*                                 
udp        0      0 *:661                   *:*                                 
udp        0      0 *:790                   *:*                                 
udp        0      0 *:791                   *:*                                 
udp        0      0 *:792                   *:*                                 
udp        0      0 *:793                   *:*                                 
udp        0      0 *:794                   *:*                                 
udp        0      0 *:795                   *:*                                 
udp        0      0 *:796                   *:*                                 
udp        0      0 *:797                   *:*                                 
udp        0      0 *:670                   *:*                                 
udp        0      0 *:798                   *:*                                 
udp        0      0 localhost.localdo:25375 *:*                                 
udp        0      0 *:799                   *:*                                 
udp        0      0 *:20000                 *:*                                 
udp        0      0 localhost.localdo:25376 *:*                                 
udp        0      0 *:800                   *:*                                 
udp        0      0 *:snmp                  *:*                                 
udp        0      0 localhost.localdo:25378 *:*                                 
udp        0      0 localhost.localdo:25385 *:*                                 
udp        0      0 localhost.localdo:25393 *:*                                 
udp        0      0 *:830                   *:*                                 
udp        0      0 239.2.11.71:8649        *:*                                 
udp        0      0 *:sunrpc                *:*                                 
udp        0      0 lpsdm20i.muc:ntp        *:*                                 
udp        0      0 lpsdm20.muc:ntp         *:*                                 
udp        0      0 localhost.localdoma:ntp *:*                                 
udp        0      0 *:ntp                   *:*                                 
udp        0      0 *:1022                  *:*                                 
udp        0      0 *:1023                  *:*                                 
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     2279   /dev/gpmctl
unix  2      [ ACC ]     STREAM     LISTENING     8206   /tmp/.font-unix/fs7100
unix  2      [ ACC ]     STREAM     LISTENING     2074   /tmp/.X11-unix/X1
unix  14     [ ]         DGRAM                    936    /dev/log
unix  2      [ ]         DGRAM                    17173  
unix  2      [ ]         DGRAM                    8859   
unix  2      [ ]         DGRAM                    8209   
unix  2      [ ]         DGRAM                    3797   
unix  2      [ ]         DGRAM                    3796   
unix  2      [ ]         DGRAM                    2295   
unix  2      [ ]         DGRAM                    1942   
unix  2      [ ]         DGRAM                    1477   
unix  2      [ ]         DGRAM                    1294   
unix  2      [ ]         DGRAM                    1264   
unix  2      [ ]         DGRAM                    1001   
unix  2      [ ]         DGRAM                    945    

[-- Attachment #3: netstat-s.bad --]
[-- Type: application/octet-stream, Size: 2181 bytes --]

Ip:
    47249 total packets received
    0 forwarded
    0 incoming packets discarded
    46592 incoming packets delivered
    39399 requests sent out
    594 reassemblies required
    37 packets reassembled ok
Icmp:
    61 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 18
        echo requests: 43
    61 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 18
        echo replies: 43
Tcp:
    1016 active connections openings
    977 passive connection openings
    0 failed connection attempts
    0 connection resets received
    3 connections established
    17298 segments received
    16914 segments send out
    0 segments retransmited
    0 bad segments received.
    4 resets sent
Udp:
    22405 packets received
    18 packets to unknown port received.
    0 packet receive errors
    22423 packets sent
TcpExt:
    ArpFilter: 0
    934 TCP sockets finished time wait in fast timer
    306 delayed acks sent
    Quick ack mode was activated 4 times
    90 packets directly queued to recvmsg prequeue.
    51 packets directly received from backlog
    267 packets directly received from prequeue
    3844 packets header predicted
    14 packets header predicted and directly queued to user
    TCPPureAcks: 2453
    TCPHPAcks: 5548
    TCPRenoRecovery: 0
    TCPSackRecovery: 0
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 0
    TCPLoss: 0
    TCPLostRetransmit: 0
    TCPRenoFailures: 0
    TCPSackFailures: 0
    TCPLossFailures: 0
    TCPFastRetrans: 0
    TCPForwardRetrans: 0
    TCPSlowStartRetrans: 0
    TCPTimeouts: 0
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 0
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 4
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 0
    TCPAbortOnClose: 0
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 0
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0

[-- Attachment #4: netstat-a.good --]
[-- Type: application/octet-stream, Size: 10874 bytes --]

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 *:20000                 *:*                     LISTEN      
tcp        0      0 *:exec                  *:*                     LISTEN      
tcp        0      0 *:32768                 *:*                     LISTEN      
tcp        0      0 *:login                 *:*                     LISTEN      
tcp        0      0 *:shell                 *:*                     LISTEN      
tcp        0      0 *:676                   *:*                     LISTEN      
tcp        0      0 *:8649                  *:*                     LISTEN      
tcp        0      0 *:2381                  *:*                     LISTEN      
tcp        0      0 *:sunrpc                *:*                     LISTEN      
tcp        0      0 *:10000                 *:*                     LISTEN      
tcp        0      0 *:656                   *:*                     LISTEN      
tcp        0      0 *:6001                  *:*                     LISTEN      
tcp        0      0 *:661                   *:*                     LISTEN      
tcp        0      0 *:885                   *:*                     LISTEN      
tcp        0      0 *:ftp                   *:*                     LISTEN      
tcp        0      0 *:ssh                   *:*                     LISTEN      
tcp        0      0 *:telnet                *:*                     LISTEN      
tcp        0      0 *:2301                  *:*                     LISTEN      
tcp        0      0 lpsdm16.muc:33588       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33589       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33590       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33591       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33584       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33585       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33586       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33587       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33596       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33587       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33596       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33597       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33598       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33599       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33592       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33593       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33594       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33595       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33580       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33582       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33583       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33579       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33604       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33605       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33606       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33607       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33600       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33601       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33602       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33608       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33609       lpsdm16.muc:661         TIME_WAIT   
tcp        0      0 lpsdm16.muc:33610       lpsdm16.muc:661         ESTABLISHED 
tcp        0      0 lpsdm16.muc:800         sdmfs1.muc:nfs          ESTABLISHED 
tcp        0      0 lpsdm16.muc:618         sdmfs1.muc:nfs          TIME_WAIT   
tcp        0      0 lpsdm16.muc:npmp-gui    sdmfs1.muc:nfs          TIME_WAIT   
tcp        0      0 lpsdm16.muc:604         sdmfs1.muc:nfs          TIME_WAIT   
tcp        0      0 lpsdm16.muc:661         lpsdm16.muc:33610       ESTABLISHED 
tcp        0      0 lpsdm16.muc:ssh         lpsdm20.muc:33378       ESTABLISHED 
tcp        0      0 localhost.localdoma:916 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:920 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:924 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:944 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:949 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:948 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:951 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:957 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:958 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:941 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:940 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:943 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:942 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:980 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:987 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:961 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:965 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:964 localhost.localdoma:656 TIME_WAIT   
tcp        0      0 localhost.localdoma:974 localhost.localdoma:656 TIME_WAIT   
udp        0      0 *:32768                 *:*                                 
udp        0      0 *:32769                 *:*                                 
udp        0      0 lpsdm16i.muc:32772      239.2.11.71:8649        ESTABLISHED 
udp        0      0 *:781                   *:*                                 
udp        0      0 *:653                   *:*                                 
udp        0      0 *:782                   *:*                                 
udp        0      0 *:783                   *:*                                 
udp        0      0 *:10000                 *:*                                 
udp        0      0 *:784                   *:*                                 
udp        0      0 *:785                   *:*                                 
udp        0      0 *:786                   *:*                                 
udp        0      0 *:787                   *:*                                 
udp        0      0 *:788                   *:*                                 
udp        0      0 *:789                   *:*                                 
udp        0      0 *:790                   *:*                                 
udp        0      0 *:791                   *:*                                 
udp        0      0 *:792                   *:*                                 
udp        0      0 *:793                   *:*                                 
udp        0      0 *:794                   *:*                                 
udp        0      0 *:795                   *:*                                 
udp        0      0 *:796                   *:*                                 
udp        0      0 *:797                   *:*                                 
udp        0      0 *:925                   *:*                                 
udp        0      0 *:798                   *:*                                 
udp        0      0 localhost.localdo:25375 *:*                                 
udp        0      0 *:799                   *:*                                 
udp        0      0 *:20000                 *:*                                 
udp        0      0 localhost.localdo:25376 *:*                                 
udp        0      0 *:800                   *:*                                 
udp        0      0 *:snmp                  *:*                                 
udp        0      0 *:673                   *:*                                 
udp        0      0 localhost.localdo:25378 *:*                                 
udp        0      0 localhost.localdo:25385 *:*                                 
udp        0      0 localhost.localdo:25393 *:*                                 
udp        0      0 239.2.11.71:8649        *:*                                 
udp        0      0 *:sunrpc                *:*                                 
udp        0      0 *:886                   *:*                                 
udp        0      0 lpsdm16i.muc:ntp        *:*                                 
udp        0      0 lpsdm16.muc:ntp         *:*                                 
udp        0      0 localhost.localdoma:ntp *:*                                 
udp        0      0 *:ntp                   *:*                                 
udp        0      0 *:638                   *:*                                 
udp        0      0 *:1022                  *:*                                 
udp        0      0 *:1023                  *:*                                 
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     1947   /dev/gpmctl
unix  2      [ ACC ]     STREAM     LISTENING     7079   /tmp/.font-unix/fs7100
unix  2      [ ACC ]     STREAM     LISTENING     1721   /tmp/.X11-unix/X1
unix  14     [ ]         DGRAM                    975    /dev/log
unix  2      [ ]         DGRAM                    16749  
unix  2      [ ]         DGRAM                    7183   
unix  2      [ ]         DGRAM                    7082   
unix  2      [ ]         DGRAM                    2748   
unix  2      [ ]         DGRAM                    2747   
unix  2      [ ]         DGRAM                    1968   
unix  2      [ ]         DGRAM                    1555   
unix  2      [ ]         DGRAM                    1509   
unix  2      [ ]         DGRAM                    1334   
unix  2      [ ]         DGRAM                    1299   
unix  2      [ ]         DGRAM                    1040   
unix  2      [ ]         DGRAM                    984    


* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
@ 2004-01-07 10:27 Martin Knoblauch
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-07 10:27 UTC (permalink / raw)
  To: linux-net; +Cc: linux-kernel, David Stevens

sorry, wrong subject ...

>
> To rule out further causes, I rebuilt the 2.4.21 version of tg3.o
>(V1.5) for the 2.4.22 kernel (tg3-V1.6). Unfortunately the problem did
>not go away, which points in the direction of the pretty large
>igmp/multicast changes introduced with 2.4.22. Debugging tg3 would have
>been easier ...

Next tidbit: the problem occurs when using a multicast group other than
224.0.0.1. If I change the group used by Ganglia from 239.2.11.71
to 224.0.0.1, all of a sudden the multicasts come in.
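
For anyone who wants to repeat the comparison without Ganglia, here is a
minimal sketch of a user-space receiver that joins a given group. It is
not gmond's code: the function names are made up for illustration, and it
binds to the wildcard address for simplicity, whereas gmond appears to
bind to the group address itself (see the 239.2.11.71:8649 socket in the
netstat output).

```python
import socket


def membership_request(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address, then local interface address.

    "0.0.0.0" lets the kernel choose the interface (assumption for this sketch).
    """
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def open_receiver(group: str, port: int, iface_addr: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket on `port` and join `group` on the given interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several receivers on the same port, as multicast listeners usually do.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what triggers the IGMP report and the driver filter update.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group, iface_addr),
    )
    return sock
```

Calling open_receiver("239.2.11.71", 8649) and then recvfrom(65535) on the
returned socket, once per group, should show whether datagrams for
239.2.11.71 arrive the way they do for 224.0.0.1.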

Hope this helps the specialists to track it down.

Martin


=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


* Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ? -> New Info
@ 2004-01-07  9:28 Martin Knoblauch
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Knoblauch @ 2004-01-07  9:28 UTC (permalink / raw)
  To: linux-net; +Cc: linux-kernel, David Stevens

> The kernels are 2.4.22 and 2.4.23 (now .24) with some NFS patches. In
>the case of 2.4.24 those are:
>
>01-posix_race
>02-fix_commit
>03-fix_osx
>04-fix_lockd3
>06-fix_unlink
>07_seekdir
>
>from http://www.fys.uio.no/~trondmy/src/Linux-2.4.x/2.4.23-rc1 None of
>those looks like it does something to multicasts. In the worst case I
>could try to run with plain 2.4.22/23, but that would have to wait until
>Wednesday.

 OK, here are some more hints on the problem. To rule out the NFS
patches and pinpoint when the problem was introduced, I tested my setup
with 2.4.21 vanilla and 2.4.22 vanilla. With 2.4.21 vanilla everything
works as before/expected. With 2.4.22 vanilla the Ganglia multicasts
are not seen in the system.

 To rule out further causes, I rebuilt the 2.4.21 version of tg3.o
(V1.5) for the 2.4.22 kernel (tg3-V1.6). Unfortunately the problem did
not go away, which points in the direction of the pretty large
igmp/multicast changes introduced with 2.4.22. Debugging tg3 would have
been easier ...
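
 For reference when checking a NIC's multicast filter: an IPv4 group is
delivered to an Ethernet address built from the fixed prefix 01:00:5e
plus the low 23 bits of the group (the standard mapping from RFC 1112).
The sketch below computes it; the function name is made up for
illustration and nothing in it is specific to tg3.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Return the Ethernet address that an IPv4 multicast group maps to."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Keep only the low 23 bits of the group; the upper bits are dropped.
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    # Format the 48-bit value as six colon-separated hex octets.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

With this mapping, 239.2.11.71 corresponds to 01:00:5e:02:0b:47 and
224.0.0.1 to 01:00:5e:00:00:01, so these are the entries to look for in
the interface's multicast address list.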

 As before - I am pretty keen on getting this fixed. So any hints are
appreciated.

 Just to avoid the obvious - the kernel configurations for my 2.4.21 and
2.4.22 builds are almost identical (modulo symbols that were added to or
removed from 2.4.22). The user space is identical.

Cheers
Martin

=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

