linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible BUG where mac80211 fails to stop queues
@ 2009-07-26 22:52 Larry Finger
  2009-07-27  8:48 ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: Larry Finger @ 2009-07-26 22:52 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, Michael Buesch, wireless

While stress testing the newest version of the open-source firmware
for BCM43XX devices with the latest pull of wireless-testing, I ran
into a problem of DMA TX queue overrun. Initially I thought this was
due to the firmware change; however, I got the same error with the
standard firmware. I have not seen this before, but it may not be a
regression as it seems to occur only under special circumstances.

The critical code is in b43_dma_tx(), which is called by the .tx
callback routine registered with mac80211.

After the fragment is transmitted by a call to dma_tx_fragment() at
line 1353, the routine checks to see if there are sufficient free
slots (2) to transmit another fragment using the code below:

        if ((free_slots(ring) < TX_SLOTS_PER_FRAME) ||
            should_inject_overflow(ring)) {
                /* This TX ring is full. */
                ieee80211_stop_queue(dev->wl->hw,
				skb_get_queue_mapping(skb));
                ring->stopped = 1;
                if (b43_debug(dev, B43_DBG_DMAVERBOSE)) {
                        b43dbg(dev->wl, "Stopped TX ring %d\n",
			       ring->index);
                }
        }


The problem shows up at line 1340 for the next fragment:

       B43_WARN_ON(ring->stopped);

        if (unlikely(free_slots(ring) < TX_SLOTS_PER_FRAME)) {
                b43warn(dev->wl, "DMA queue overflow\n");
                err = -ENOSPC;
                goto out_unlock;
        }

The system generates the warning for ring->stopped and prints the "DMA
queue overflow" message.

My understanding is that mac80211 serializes the calls for each TX
queue, and that the TX callback should not have been entered for this
case.

If I am not understanding the way that mac80211 works, please correct
me. I would also appreciate any suggestions for further debugging.

Larry

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Possible BUG where mac80211 fails to stop queues
  2009-07-26 22:52 Possible BUG where mac80211 fails to stop queues Larry Finger
@ 2009-07-27  8:48 ` Johannes Berg
  2009-07-27 15:55   ` Larry Finger
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2009-07-27  8:48 UTC (permalink / raw)
  To: Larry Finger; +Cc: John Linville, Michael Buesch, wireless

[-- Attachment #1: Type: text/plain, Size: 2640 bytes --]

On Sun, 2009-07-26 at 17:52 -0500, Larry Finger wrote:
> While stress testing the newest version of the open-source firmware
> for BCM43XX devices with the latest pull of wireless-testing, I ran
> into a problem of DMA TX queue overrun. Initially I thought this was
> due to the firmware change; however, I got the same error with the
> standard firmware. I have not seen this before, but it may not be a
> regression as it seems to occur only under special circumstances.

I've also seen it under extreme stress on Intel hardware, cf.
http://thread.gmane.org/gmane.linux.kernel.wireless.general/36497

> The critical code is in b43_dma_tx(), which is called by the .tx
> callback routine registered with mac80211.
> 
> After the fragment is transmitted by a call to dma_tx_fragment() at
> line 1353, the routine checks to see if there are sufficient free
> slots (2) to transmit another fragment using the code below:
> 
>         if ((free_slots(ring) < TX_SLOTS_PER_FRAME) ||
>             should_inject_overflow(ring)) {
>                 /* This TX ring is full. */
>                 ieee80211_stop_queue(dev->wl->hw,
> 				skb_get_queue_mapping(skb));
>                 ring->stopped = 1;
>                 if (b43_debug(dev, B43_DBG_DMAVERBOSE)) {
>                         b43dbg(dev->wl, "Stopped TX ring %d\n",
> 			       ring->index);
>                 }
>         }
> 
> 
> The problem shows up at line 1340 for the next fragment:
> 
>        B43_WARN_ON(ring->stopped);
> 
>         if (unlikely(free_slots(ring) < TX_SLOTS_PER_FRAME)) {
>                 b43warn(dev->wl, "DMA queue overflow\n");
>                 err = -ENOSPC;
>                 goto out_unlock;
>         }
> 
> The system generates the warning for ring->stopped and prints the "DMA
> queue overflow" message.

Right. Exactly the same behaviour as I'm seeing on Intel hardware.

> My understanding is that mac80211 serializes the calls for each TX
> queue, and that the TX callback should not have been entered for this
> case.
> 
> If I am not understanding the way that mac80211 works, please correct
> me. I would also appreciate any suggestions for further debugging.

I stared at the mac80211 code for a long time and concluded that it was
a race condition and couldn't really be fixed, see my analysis in the
iwlwifi patch. I'd love to be proved wrong though.

Are you seeing this multiple times? I don't think you have fragmentation
on, do you? At least I didn't and still saw the problem, which seemed a
bit strange, but I really couldn't see any other way for it to happen.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Possible BUG where mac80211 fails to stop queues
  2009-07-27  8:48 ` Johannes Berg
@ 2009-07-27 15:55   ` Larry Finger
  2009-07-27 16:04     ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: Larry Finger @ 2009-07-27 15:55 UTC (permalink / raw)
  To: Johannes Berg; +Cc: John Linville, Michael Buesch, wireless

Johannes Berg wrote:
> On Sun, 2009-07-26 at 17:52 -0500, Larry Finger wrote:
>> While stress testing the newest version of the open-source firmware
>> for BCM43XX devices with the latest pull of wireless-testing, I ran
>> into a problem of DMA TX queue overrun. Initially I thought this was
>> due to the firmware change; however, I got the same error with the
>> standard firmware. I have not seen this before, but it may not be a
>> regression as it seems to occur only under special circumstances.
> 
> I've also seen it under extreme stress on Intel hardware, cf.
> http://thread.gmane.org/gmane.linux.kernel.wireless.general/36497

Fortunately, the b43 coding was robust enough to prevent queue
overrun, thus we just end up with a warning.
--snip--

>> The system generates the warning for ring->stopped and prints the "DMA
>> queue overflow" message.
> 
> Right. Exactly the same behaviour as I'm seeing on Intel hardware.
> 
>> My understanding is that mac80211 serializes the calls for each TX
>> queue, and that the TX callback should not have been entered for this
>> case.
>>
>> If I am not understanding the way that mac80211 works, please correct
>> me. I would also appreciate any suggestions for further debugging.
> 
> I stared at the mac80211 code for a long time and concluded that it was
> a race condition and couldn't really be fixed, see my analysis in the
> iwlwifi patch. I'd love to be proved wrong though.
> 
> Are you seeing this multiple times? I don't think you have fragmentation
> on, do you? At least I didn't and still saw the problem, which seemed a
> bit strange, but I really couldn't see any other way for it to happen.

When it occurs, I get just a single warning. Fragmentation was not on.

I will prepare a patch that acknowledges that mac80211 might send one
extra fragment after the queues are stopped and only issue a warning
if we get more than one.

Thanks,

Larry

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Possible BUG where mac80211 fails to stop queues
  2009-07-27 15:55   ` Larry Finger
@ 2009-07-27 16:04     ` Johannes Berg
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Berg @ 2009-07-27 16:04 UTC (permalink / raw)
  To: Larry Finger; +Cc: John Linville, Michael Buesch, wireless

[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]

On Mon, 2009-07-27 at 10:55 -0500, Larry Finger wrote:

> > I've also seen it under extreme stress on Intel hardware, cf.
> > http://thread.gmane.org/gmane.linux.kernel.wireless.general/36497
> 
> Fortunately, the b43 coding was robust enough to prevent queue
> overrun, thus we just end up with a warning.

Right. iwlwifi wasn't so lucky.

> > Are you seeing this multiple times? I don't think you have fragmentation
> > on, do you? At least I didn't and still saw the problem, which seemed a
> > bit strange, but I really couldn't see any other way for it to happen.
> 
> When it occurs, I get just a single warning. Fragmentation was not on.

> I will prepare a patch that acknowledges that mac80211 might send one
> extra fragment after the queues are stopped and only issue a warning
> if we get more than one.

Sounds reasonable. You could even change the threshold from 2 to 4 and
enqueue the frame anyway, if still >= 2 slots free, I guess.

However, I think I've seen this happen more than once, which really
makes me think there's a bug in mac80211 too, but I haven't been able to
find that bug if any.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-07-27 16:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-26 22:52 Possible BUG where mac80211 fails to stop queues Larry Finger
2009-07-27  8:48 ` Johannes Berg
2009-07-27 15:55   ` Larry Finger
2009-07-27 16:04     ` Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).