From mboxrd@z Thu Jan 1 00:00:00 1970 From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen) Subject: Re: Re: Strange connection slowdown on pcnet32 Date: Mon, 19 Feb 2007 15:11:36 -0500 Message-ID: <20070219201136.GH7585@csclub.uwaterloo.ca> References: <32943920.1119801171642884331.JavaMail.root@vms226.mailsrvcs.net> <20070216172110.GC7582@csclub.uwaterloo.ca> <20070216202300.GD7585@csclub.uwaterloo.ca> <20070216210157.GE7585@csclub.uwaterloo.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: netdev@vger.kernel.org To: pcnet32@verizon.net Return-path: Received: from caffeine.uwaterloo.ca ([129.97.134.17]:47747 "EHLO caffeine.csclub.uwaterloo.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932567AbXBSULh (ORCPT ); Mon, 19 Feb 2007 15:11:37 -0500 Content-Disposition: inline In-Reply-To: <20070216210157.GE7585@csclub.uwaterloo.ca> Sender: netdev-owner@vger.kernel.org List-Id: netdev.vger.kernel.org On Fri, Feb 16, 2007 at 04:01:57PM -0500, Lennart Sorensen wrote: > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: pcnet32_poll: pcnet32_rx() got 16 packets > eth1: base: 0x05215812 status: 0310 next->status: 0310 > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: netif_receive_skb(skb) > eth1: pcnet32_poll: pcnet32_rx() got 16 packets > eth1: base: 0x04c51812 status: ffff8000 next->status: 0310 > eth1: pcnet32_poll: pcnet32_rx() got 0 packets > eth1: interrupt csr0=0x6f3 new csr=0x33, csr3=0x0000. > eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. > eth1: base: 0x04c51812 status: ffff8000 next->status: 0310 > eth1: pcnet32_poll: pcnet32_rx() got 0 packets > eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x0000. > eth1: exiting interrupt, csr0=0x0033, csr3=0x5f00. > eth1: base: 0x04c51812 status: ffff8000 next->status: 0310 > eth1: pcnet32_poll: pcnet32_rx() got 0 packets > eth1: interrupt csr0=0x4f3 new csr=0x33, csr3=0x0000. > eth1: exiting interrupt, csr0=0x0433, csr3=0x5f00. > eth1: base: 0x04c51812 status: ffff8000 next->status: 0310 > eth1: pcnet32_poll: pcnet32_rx() got 0 packets > > So somehow it ends up that when it reads the status of the descriptor at > address 0x04c51812, it sees the status as 0x8000 (which means owned by > the MAC I believe), even though the next descriptor in the ring has a > sensible status, indicating that the descriptor is ready to be handled > by the driver. Since the descriptor isn't ready, we exit without > handling anything and NAPI reschedules is the next time we get an > interrupt, and after some random number of tries, we finally see the > right status and handle the packet, along with a bunch of other packets > waiting in the descriptor ring. Then we seem to hit the exact same > descriptor address again, with the same problem in the status we read, > and again we are stuck for a while, until finally we see the right > status, and another pile of packets get handled, and we again hit the > same descriptor address and get stuck. I have been poking at things with firescope to see if the MAC is actually writing to system memory or not. The entry that it gets stuch on is _always_ entry 0 in the rx_ring. There does not appear to be any exceptions to this. Here is my firescope (slightly modified for this purpose) dump of the rx_ring of eth1: Descriptor:Address: /--base---\ /buf\ /sta\ /-message-\ /reserved-\ : : | | |len| |tus| | length | | | RXdesc[00]:6694000: 12 18 5f 05 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[01]:6694010: 12 78 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[02]:6694020: 12 a0 52 06 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[03]:6694030: 12 f8 c2 04 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[04]:6694040: 12 70 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[05]:6694050: 12 e8 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[06]:6694060: 12 e0 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[07]:6694070: 12 e8 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[08]:6694080: 12 e0 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00 RXdesc[09]:6694090: 12 d8 d1 05 fa f9 40 03 46 00 00 00 00 00 00 00 RXdesc[10]:66940a0: 12 d0 d1 05 fa f9 40 03 4e 00 00 00 00 00 00 00 RXdesc[11]:66940b0: 12 d8 02 05 fa f9 10 03 40 00 00 00 00 00 00 00 RXdesc[12]:66940c0: 12 d0 02 05 fa f9 40 03 46 00 00 00 00 00 00 00 RXdesc[13]:66940d0: 12 38 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[14]:66940e0: 12 30 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[15]:66940f0: 12 78 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[16]:6694100: 12 a0 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[17]:6694110: 12 b0 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[18]:6694120: 12 b8 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[19]:6694130: 12 70 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[20]:6694140: 12 f8 56 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[21]:6694150: 12 c8 29 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[22]:6694160: 12 20 03 05 fa f9 00 80 ee 05 00 00 00 00 00 00 RXdesc[23]:6694170: 12 60 4c 05 fa f9 00 80 87 05 00 00 00 00 00 00 RXdesc[24]:6694180: 12 98 53 05 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[25]:6694190: 12 b0 cc 04 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[26]:66941a0: 12 a8 3f 05 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[27]:66941b0: 12 58 e8 04 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[28]:66941c0: 12 b0 4d 06 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[29]:66941d0: 12 38 ef 04 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[30]:66941e0: 12 98 1f 05 fa f9 00 80 40 00 00 00 00 00 00 00 RXdesc[31]:66941f0: 12 28 f1 04 fa f9 00 80 40 00 00 00 00 00 00 00 I only ever see entry 0 as status 0080 (0x8000 which is owned by mac), and this is while the driver is checking entry 0 every time it tries to check for any waiting packets. Running tcpdump while pinging gives the interesting result that some packets are ariving out of order making it seem like the driver is processing the packets out of order. Perhaps the driver is wrong to be looking at entry 0, and should be looking at entry 1 and is hence stuck until the whole receive ring has been filled again? 15:06:04.112812 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 1 15:06:05.119799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 2 15:06:05.120159 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 2 15:06:05.127045 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 1 15:06:06.119862 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 3 15:06:07.119921 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 4 15:06:08.119994 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 5 15:06:08.426400 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 3 15:06:08.427915 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 4 15:06:08.429033 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 5 15:06:09.120053 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 6 15:06:10.120109 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 7 15:06:10.705332 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 6 15:06:10.707258 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 7 15:06:11.120175 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 8 15:06:12.120233 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 9 15:06:13.120297 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 10 15:06:14.120359 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 11 15:06:14.120737 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 11 15:06:14.127064 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 8 15:06:14.127700 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 9 15:06:14.128268 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 10 15:06:15.120426 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 12 15:06:16.120484 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 13 15:06:16.120840 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 13 15:06:16.129476 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 12 15:06:17.120551 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 14 15:06:17.615429 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 14 15:06:18.120612 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 15 15:06:19.120672 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 16 15:06:19.748981 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 15 15:06:19.749976 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 16 15:06:20.120738 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 17 15:06:21.120799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 18 15:06:22.120859 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 19 15:06:23.120926 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 20 15:06:24.120985 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 21 15:06:24.881809 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 17 15:06:24.882287 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 18 15:06:24.882578 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 19 15:06:24.883347 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 20 15:06:24.884158 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 21 15:06:25.129659 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 22 Does anyone see any reason why the driver would get into such a state after a pile of traffic in a row? So far I have only seen it when NAPI is enabled, so I have to assume so far that it makes a difference to it. -- Len Sorensen