From: Anssi Hannula <anssi.hannula@bitwise.fi>
To: Harini Katakam <harinik@xilinx.com>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH 2/3] net: macb: fix dropped RX frames due to a race
Date: Mon, 3 Dec 2018 12:31:52 +0200
Message-ID: <f15d8b59-7e89-225b-9c52-52a61713cc05@bitwise.fi>
In-Reply-To: <CAFcVEC+_PAS7ZneaYpU-VKhEL+UAQLu8=H=Csptqg1oMSuFk9A@mail.gmail.com>

Hi,

On 3.12.2018 6:52, Harini Katakam wrote:
> Hi Anssi,
> On Fri, Nov 30, 2018 at 11:53 PM Anssi Hannula <anssi.hannula@bitwise.fi> wrote:
>> Bit RX_USED set to 0 in the address field allows the controller to write
>> data to the receive buffer descriptor.
>>
>> The driver does not ensure the ctrl field is ready (cleared) when the
>> controller sees the RX_USED=0 written by the driver. The ctrl field might
>> only be cleared after the controller has already updated it according to
>> a newly received frame, causing the frame to be discarded in gem_rx() due
>> to unexpected ctrl field contents.
>>
>> A message is logged when the above scenario occurs:
>>
>>   macb ff0b0000.ethernet eth0: not whole frame pointed by descriptor
>>
>> Fix the issue by ensuring that when the controller sees RX_USED=0 the
>> ctrl field is already cleared.
>>
>> This issue was observed on a ZynqMP based system.
>>
> Thanks for the patch.
> Could you please describe the test in which this behavior was observed?

Sure. The testcase I used for the patches is:

- PREEMPT_RT_FULL kernel,
- CPU-bound SCHED_FIFO process at RT priority 15 (with
rcutree.kthread_prio=20 to avoid RCU starvation),
- Pyropus memtester testing 3GB (the system has 4GB of memory),
- "ping -f -l 5000 -s 100" running from a PC.

The "not whole frame pointed by descriptor" issue occurs within minutes
and the RX memory corruption within an hour. I did not try to reduce the
testcase to a minimum.

Both issues were also observed under real production loads (which, of
course, do not include CPU-bound RT tasks).

> Were you able to confirm that this was because of the ctrl field being
> cleared late? This error can also be observed under stress when RX UBR
> is observed.

I observed that the issue occurred without this patch and did not occur
with this patch applied (on its own), but I did not investigate further
than that. If you have anything you'd like me to test, let me know.

> I understand it makes sense to clear ctrl field before setting RX used bit.
> But I'm trying to understand if a dmb is necessary in the receive data path.

As far as I understand it, a barrier is needed because the writes to the
ctrl and addr fields are just regular writes to memory and may otherwise
be freely reordered, potentially undoing the effect of changing their
order in the code.
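To illustrate the write ordering, here is a minimal userspace sketch in
C11 under stated assumptions: the struct rx_desc layout, the RX_USED bit
position and the function name rx_desc_give_to_hw() are hypothetical, not
the real macb definitions, and the C11 release fence only stands in for
the kernel's dma_wmb(). Strictly speaking, C11 fences define ordering
between threads, not towards a DMA-capable device, so this shows the
intended ordering rather than a drop-in driver change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout for illustration only (not the real macb
 * definitions): bit 0 of the address word is RX_USED. */
#define RX_USED 0x1u

struct rx_desc {
	_Atomic uint32_t addr; /* buffer address | RX_USED */
	_Atomic uint32_t ctrl; /* frame length/status, written by the controller */
};

/* Return a descriptor to the controller. The release fence is a
 * userspace stand-in for the kernel's dma_wmb(): it keeps the ctrl
 * store ordered before the store that clears RX_USED. */
static void rx_desc_give_to_hw(struct rx_desc *d)
{
	uint32_t addr = atomic_load_explicit(&d->addr, memory_order_relaxed);

	/* Only a descriptor currently owned by software may be handed back. */
	assert(addr & RX_USED);

	/* 1. Clear ctrl while the controller still cannot use the slot. */
	atomic_store_explicit(&d->ctrl, 0, memory_order_relaxed);

	/* 2. Order the ctrl store before the ownership hand-over. */
	atomic_thread_fence(memory_order_release);

	/* 3. Clear RX_USED: from here on the controller may write ctrl. */
	atomic_store_explicit(&d->addr, addr & ~RX_USED, memory_order_relaxed);
}
```

The point is only the sequence: ctrl is cleared, the fence is issued, and
only then is RX_USED cleared, so the controller should never see
RX_USED=0 together with stale ctrl contents.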

If you mean the third patch, the same issue applies to reads: they may be
freely reordered (e.g. by the compiler) because they are just regular reads.

But I don't claim to be an expert on these so please correct me if I'm
wrong :)

-- 
Anssi Hannula / Bitwise Oy


Thread overview: 31+ messages
2018-11-30 18:21 [PATCH 0/3] net: macb: DMA race condition fixes Anssi Hannula
2018-11-30 18:21 ` [PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA Anssi Hannula
2018-12-03  4:44   ` Harini Katakam
2018-12-05 12:37   ` Claudiu.Beznea
2018-12-05 13:58     ` Anssi Hannula
2018-12-05 20:32   ` David Miller
2018-12-06 14:16     ` Claudiu.Beznea
2018-11-30 18:21 ` [PATCH 2/3] net: macb: fix dropped RX frames due to a race Anssi Hannula
2018-12-03  4:52   ` Harini Katakam
2018-12-03 10:31     ` Anssi Hannula [this message]
2018-12-03 10:36       ` Harini Katakam
2018-12-05 12:38   ` Claudiu.Beznea
2018-11-30 18:21 ` [PATCH 3/3] net: macb: add missing barriers when reading buffers Anssi Hannula
2018-12-05 12:37   ` Claudiu.Beznea
2018-12-05 14:00     ` Anssi Hannula
2018-12-06 14:14       ` Claudiu.Beznea
2018-12-07 12:00         ` Anssi Hannula
2018-12-10 10:34           ` Claudiu.Beznea
2018-12-11 13:21             ` Anssi Hannula
2018-12-12 10:58               ` Claudiu.Beznea
2018-12-12 11:27                 ` Anssi Hannula
2018-12-13 10:48                   ` Claudiu.Beznea
2018-12-12 10:59               ` [PATCH 3/3 v2] net: macb: add missing barriers when reading descriptors Anssi Hannula
2018-12-12 23:19                 ` David Miller
2018-12-03  8:26 ` [PATCH 0/3] net: macb: DMA race condition fixes Nicolas.Ferre
2018-12-03 23:56   ` David Miller
2018-12-05 20:35 ` David Miller
2018-12-07 12:04   ` Anssi Hannula
2018-12-10 10:58     ` Nicolas.Ferre
2018-12-10 11:32       ` Claudiu.Beznea
2018-12-10 11:34         ` Claudiu.Beznea
