All of lore.kernel.org
 help / color / mirror / Atom feed
From: lsorense@csclub.uwaterloo.ca (Lennart Sorensen)
To: pcnet32@verizon.net
Cc: netdev@vger.kernel.org
Subject: Re: Re: Strange connection slowdown on pcnet32
Date: Mon, 19 Feb 2007 17:29:20 -0500	[thread overview]
Message-ID: <20070219222920.GJ7585@csclub.uwaterloo.ca> (raw)
In-Reply-To: <20070219221845.GI7585@csclub.uwaterloo.ca>

On Mon, Feb 19, 2007 at 05:18:45PM -0500, Lennart Sorensen wrote:
> On Mon, Feb 19, 2007 at 03:11:36PM -0500, Lennart Sorensen wrote:
> > I have been poking at things with firescope to see if the MAC is
> > actually writing to system memory or not.
> > 
> > The entry that it gets stuch on is _always_ entry 0 in the rx_ring.
> > There does not appear to be any exceptions to this.  
> > 
> > Here is my firescope (slightly modified for this purpose) dump of the
> > rx_ring of eth1:
> > 
> > Descriptor:Address: /--base---\ /buf\ /sta\ /-message-\ /reserved-\
> >           :       : |         | |len| |tus| | length  | |         |
> > RXdesc[00]:6694000: 12 18 5f 05 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[01]:6694010: 12 78 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[02]:6694020: 12 a0 52 06 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[03]:6694030: 12 f8 c2 04 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[04]:6694040: 12 70 15 05 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[05]:6694050: 12 e8 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[06]:6694060: 12 e0 37 05 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[07]:6694070: 12 e8 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[08]:6694080: 12 e0 d5 04 fa f9 40 03 ee 05 00 00 00 00 00 00
> > RXdesc[09]:6694090: 12 d8 d1 05 fa f9 40 03 46 00 00 00 00 00 00 00
> > RXdesc[10]:66940a0: 12 d0 d1 05 fa f9 40 03 4e 00 00 00 00 00 00 00
> > RXdesc[11]:66940b0: 12 d8 02 05 fa f9 10 03 40 00 00 00 00 00 00 00
> > RXdesc[12]:66940c0: 12 d0 02 05 fa f9 40 03 46 00 00 00 00 00 00 00
> > RXdesc[13]:66940d0: 12 38 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[14]:66940e0: 12 30 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[15]:66940f0: 12 78 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[16]:6694100: 12 a0 58 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[17]:6694110: 12 b0 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[18]:6694120: 12 b8 04 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[19]:6694130: 12 70 2c 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[20]:6694140: 12 f8 56 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[21]:6694150: 12 c8 29 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[22]:6694160: 12 20 03 05 fa f9 00 80 ee 05 00 00 00 00 00 00
> > RXdesc[23]:6694170: 12 60 4c 05 fa f9 00 80 87 05 00 00 00 00 00 00
> > RXdesc[24]:6694180: 12 98 53 05 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[25]:6694190: 12 b0 cc 04 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[26]:66941a0: 12 a8 3f 05 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[27]:66941b0: 12 58 e8 04 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[28]:66941c0: 12 b0 4d 06 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[29]:66941d0: 12 38 ef 04 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[30]:66941e0: 12 98 1f 05 fa f9 00 80 40 00 00 00 00 00 00 00
> > RXdesc[31]:66941f0: 12 28 f1 04 fa f9 00 80 40 00 00 00 00 00 00 00
> > 
> > I only ever see entry 0 as status 0080 (0x8000 which is owned by mac),
> > and this is while the driver is checking entry 0 every time it tries to
> > check for any waiting packets.
> > 
> > Running tcpdump while pinging gives the interesting result that some
> > packets are ariving out of order making it seem like the driver is
> > processing the packets out of order.  Perhaps the driver is wrong to be
> > looking at entry 0, and should be looking at entry 1 and is hence stuck
> > until the whole receive ring has been filled again?
> > 
> > 15:06:04.112812 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 1
> > 15:06:05.119799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 2
> > 15:06:05.120159 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 2
> > 15:06:05.127045 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 1
> > 15:06:06.119862 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 3
> > 15:06:07.119921 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 4
> > 15:06:08.119994 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 5
> > 15:06:08.426400 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 3
> > 15:06:08.427915 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 4
> > 15:06:08.429033 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 5
> > 15:06:09.120053 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 6
> > 15:06:10.120109 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 7
> > 15:06:10.705332 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 6
> > 15:06:10.707258 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 7
> > 15:06:11.120175 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 8
> > 15:06:12.120233 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 9
> > 15:06:13.120297 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 10
> > 15:06:14.120359 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 11
> > 15:06:14.120737 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 11
> > 15:06:14.127064 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 8
> > 15:06:14.127700 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 9
> > 15:06:14.128268 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 10
> > 15:06:15.120426 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 12
> > 15:06:16.120484 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 13
> > 15:06:16.120840 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 13
> > 15:06:16.129476 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 12
> > 15:06:17.120551 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 14
> > 15:06:17.615429 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 14
> > 15:06:18.120612 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 15
> > 15:06:19.120672 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 16
> > 15:06:19.748981 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 15
> > 15:06:19.749976 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 16
> > 15:06:20.120738 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 17
> > 15:06:21.120799 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 18
> > 15:06:22.120859 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 19
> > 15:06:23.120926 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 20
> > 15:06:24.120985 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 21
> > 15:06:24.881809 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 17
> > 15:06:24.882287 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 18
> > 15:06:24.882578 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 19
> > 15:06:24.883347 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 20
> > 15:06:24.884158 IP 10.128.10.1 > 10.128.10.254: icmp 64: echo reply seq 21
> > 15:06:25.129659 IP 10.128.10.254 > 10.128.10.1: icmp 64: echo request seq 22
> > 
> > Does anyone see any reason why the driver would get into such a state
> > after a pile of traffic in a row?  So far I have only seen it when NAPI
> > is enabled, so I have to assume so far that it makes a difference to it.
> 
> With more beating of the driver, I have managed to see it fail in places
> other than descriptor 0.  It is as if the MAC doesn't transfer the
> descriptor to the system memory, although I am not yet sure if there is
> packet loss involved involving retransmision or not.
> 
> I tried making this change as a test:
> 
>   static int pcnet32_rx(struct net_device *dev, int quota)
>   {
>           struct pcnet32_private *lp = dev->priv;
>           int entry = lp->cur_rx & lp->rx_mod_mask;
> +         int nextentry = (lp->cur_rx+1) & lp->rx_mod_mask;
>           struct pcnet32_rx_head *rxp = &lp->rx_ring[entry];
> +         struct pcnet32_rx_head *nextrxp = &lp->rx_ring[nextentry];
>           int npackets = 0;
> 
> +         /* Check for confused MAC and resync if necesary */
> +         if ((short)le16_to_cpu(rxp->status) < 0 && (short)le16_to_cpu(nextrxp->status) >= 0) {
> +                 entry = (++lp->cur_rx) & lp->rx_mod_mask;
> +                 rxp = &lp->rx_ring[entry];
> +                 printk("%s: rx_ring bugfix on entry %d\n",dev->name,entry);
> +         }
> 
>           /* If we own the next entry, it's a new packet. Send it up. */
>           while (quota > npackets && (short)le16_to_cpu(rxp->status) >= 0) {
>                   pcnet32_rx_entry(dev, lp, rxp, entry);
>                   npackets += 1;
>                   /*
>                    * The docs say that the buffer length isn't touched, but Andrew
>                    * Boyd of QNX reports that some revs of the 79C965 clear it.
>                    */
>                   rxp->buf_length = le16_to_cpu(2 - PKT_BUF_SZ);
>                   wmb();  /* Make sure owner changes after others are visible */
>                   rxp->status = le16_to_cpu(0x8000);
>                   entry = (++lp->cur_rx) & lp->rx_mod_mask;
>                   rxp = &lp->rx_ring[entry];
>           }
> 
>           return npackets;
>   }
> 
> Now this really can't be a solution.  It still occationally gets "stuck"
> for a short time, but with this change it manages to recover again but
> it does seem to be hitting by printk once in a while.  I am starting to
> wonder of the PCI transfers are failing once in a while.  I still can't
> figure out what NAPI is doing different unless some of the pcnet32_rx
> code should be run with interrupts disabled and isn't in NAPI mode.
> 
> eth1: rx_ring bugfix on entry 28
> eth1: rx_ring bugfix on entry 15
> eth1: rx_ring bugfix on entry 31
> eth1: rx_ring bugfix on entry 2
> eth1: rx_ring bugfix on entry 23
> eth1: rx_ring bugfix on entry 25
> eth1: rx_ring bugfix on entry 2
> eth1: rx_ring bugfix on entry 21
> eth1: rx_ring bugfix on entry 24
> eth1: rx_ring bugfix on entry 27
> eth1: rx_ring bugfix on entry 28
> eth1: rx_ring bugfix on entry 23
> eth1: rx_ring bugfix on entry 11
> eth1: rx_ring bugfix on entry 28
> eth1: rx_ring bugfix on entry 27
> eth1: rx_ring bugfix on entry 24
> eth1: rx_ring bugfix on entry 28
> eth1: rx_ring bugfix on entry 17

I just noticed, it seems almost all these problems occour right at the
start of transfers when the tcp window size is still being worked out
for the connection speed, and I am seeing the error count go up in
ifconfig for the port when it happens too.  Is it possible for an error
to get flagged in a receive descriptor without the owner bit being
updated?

--
Len Sorensen

  reply	other threads:[~2007-02-19 22:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-16 16:21 Re: Strange connection slowdown on pcnet32 pcnet32
2007-02-16 17:21 ` Lennart Sorensen
2007-02-16 20:23   ` Lennart Sorensen
2007-02-16 21:01     ` Lennart Sorensen
2007-02-16 22:27       ` Lennart Sorensen
2007-02-16 22:48         ` MediaGX/GeodeGX1 requires X86_OOSTORE. (Was: Re: Strange connection slowdown on pcnet32) Lennart Sorensen
2007-02-17  0:00           ` Lennart Sorensen
2007-02-17 14:11             ` MediaGX/GeodeGX1 requires X86_OOSTORE takada
2007-02-17 15:07               ` Lennart Sorensen
2007-02-19 14:55               ` Lennart Sorensen
2007-02-19 19:48                 ` Roland Dreier
2007-02-19 19:57                   ` Lennart Sorensen
2007-02-19 23:56                   ` takada
2007-02-20  0:02                     ` Lennart Sorensen
2007-02-20 11:34                       ` takada
2007-02-20 14:48                         ` Lennart Sorensen
2007-03-15  5:39                           ` takada
2007-03-15 17:31                             ` Lennart Sorensen
2007-03-17 13:08                               ` takada
2007-03-19 18:23                                 ` Lennart Sorensen
2007-02-19 20:11       ` Re: Strange connection slowdown on pcnet32 Lennart Sorensen
2007-02-19 22:18         ` Lennart Sorensen
2007-02-19 22:29           ` Lennart Sorensen [this message]
2007-02-19 23:45             ` Lennart Sorensen
2007-02-19 23:59               ` Lennart Sorensen
2007-02-20 21:44                 ` Lennart Sorensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070219222920.GJ7585@csclub.uwaterloo.ca \
    --to=lsorense@csclub.uwaterloo.ca \
    --cc=netdev@vger.kernel.org \
    --cc=pcnet32@verizon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.