All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anssi Hannula <anssi.hannula@bitwise.fi>
To: Claudiu.Beznea@microchip.com
Cc: Nicolas.Ferre@microchip.com, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH 3/3] net: macb: add missing barriers when reading buffers
Date: Wed, 12 Dec 2018 13:27:14 +0200	[thread overview]
Message-ID: <70b55891-1bc4-be1a-1e45-09f073d3e021@bitwise.fi> (raw)
In-Reply-To: <0212fb04-8c2d-264c-a459-6d579a56507e@microchip.com>

On 12.12.2018 12:58, Claudiu.Beznea@microchip.com wrote:
>
> On 11.12.2018 15:21, Anssi Hannula wrote:
>> On 10.12.2018 12:34, Claudiu.Beznea@microchip.com wrote:
>>> On 07.12.2018 14:00, Anssi Hannula wrote:
>>>> On 6.12.2018 16:14, Claudiu.Beznea@microchip.com wrote:
>>>>> Hi Anssi,
>>>> Hi!
>>>>
>>>>> On 05.12.2018 16:00, Anssi Hannula wrote:
>>>>>> On 5.12.2018 14:37, Claudiu.Beznea@microchip.com wrote:
>>>>>>> On 30.11.2018 20:21, Anssi Hannula wrote:
>>>>>>>> When reading buffer descriptors on RX or on TX completion, an
>>>>>>>> RX_USED/TX_USED bit is checked first to ensure that the descriptor has
>>>>>>>> been populated. However, there are no memory barriers to ensure that the
>>>>>>>> data protected by the RX_USED/TX_USED bit is up-to-date with respect to
>>>>>>>> that bit.
>>>>>>>>
>>>>>>>> Fix that by adding DMA read memory barriers on those paths.
>>>>>>>>
>>>>>>>> I did not observe any actual issues caused by these being missing,
>>>>>>>> though.
>>>>>>>>
>>>>>>>> Tested on a ZynqMP based system.
>>>>>>>>
>>>>>>>> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
>>>>>>>> Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
>>>>>>>> Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
>>>>>>>> ---
>>>>>>>>  drivers/net/ethernet/cadence/macb_main.c | 20 ++++++++++++++++----
>>>>>>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> index 430b7a0f5436..c93baa8621d5 100644
>>>>>>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>>>>>>> @@ -861,6 +861,11 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>>>>>>>>  
>>>>>>>>  			/* First, update TX stats if needed */
>>>>>>>>  			if (skb) {
>>>>>>>> +				/* Ensure all of desc is at least as up-to-date
>>>>>>>> +				 * as ctrl (TX_USED bit)
>>>>>>>> +				 */
>>>>>>>> +				dma_rmb();
>>>>>>>> +
>>>>>>> Is this necessary? Wouldn't previous rmb() take care of this? At this time
>>>>>>> data specific to this descriptor was completed. The TX descriptors for next
>>>>>>> data to be send is updated under a locked spinlock.
>>>>>> The previous rmb() is before the TX_USED check, so my understanding is
>>>>>> that the following could happen in theory:
>>>>> We are using this IP on and ARM architecture, so, with regards to rmb(), I
>>>>> understand from [1] that dsb completes when:
>>>>> "All explicit memory accesses before this instruction complete.
>>>>> All Cache, Branch predictor and TLB maintenance operations before this
>>>>> instruction complete."
>>>>>
>>>>>> 1. rmb().
>>>>> According to [1] this should end after all previous instructions (loads,
>>>>> stores) ends.
>>>>>
>>>>>> 2. Reads are reordered so that TX timestamp is read first - no barriers
>>>>>> are crossed.
>>>>> But, as per [1], no onward instruction will be reached until all
>>>>> instruction prior to dsb ends, so, after rmb() all descriptor's members
>>>>> should be updated, right?
>>>> The descriptor that triggered the TX interrupt should be visible now, yes.
>>>> However, the controller may be writing to any other descriptors at the
>>>> same time as the loop is processing through them, as there are multiple
>>>> TX buffers.
>>> Yes, but there is the "rmb()" after "desc = macb_tx_desc(queue, tail);" at
>>> the beginning of loop intended for that... In the 2nd loop you are
>>> operating with the same descriptor which was read in the first loop.
>> I'm concerned about the 2nd iteration of the first for loop in
>> macb_tx_interrupt().
>> That operates on a different descriptor due to tail++, and that
>> different descriptor may have in-flight writes from controller as the
>> interrupt may or may not already be raised for it.
> I agree with that, there may be in-flights updates, but since ownership bit
> should be updated at the end of the TX (I expect at this moment to be
> updated all the descriptor's parts, including timestamps), if you read a
> descriptor and it hasn't been treated by the hardware the descriptor is own
> by the hardware and there is this check after ctrl is read:
>
> if (!(ctrl & MACB_BIT(TX_USED)))
> 	break;
>
> Next time a new tx interrupt will be raised this descriptor would be also
> treated.
>
> In case the above instruction doesn't break then it means the descriptor
> was transmitted. And since ownership should be updated at the end of the
> TX, this means no in-flights updates should be done on this descriptor, so,
> when you will read the timestamps these should be the good ones (even it is
> the first one, the 3rd, and so on).

Agreed with above, the only problem is if the ts1/ts2 load is reordered
before ctrl load.

> Moreover, this new barrier barrier is in the 2nd loop, you're placing a
> memory barrier after every step in the 2nd loop for the same descriptor
> read in the 1st loop (it was already loaded in a processor register).

The barrier is inside "if (skb)", and skb is only set for the last frame
as indicated by the code at the end of the loop:

            /* skb is set only for the last buffer of the frame.
             * WARNING: at this point skb has been freed by
             * macb_tx_unmap().
             */
            if (skb)
                break;


> To avoid reordering of ctrl and ts1/ts2 loads wouldn't be enough to have
> memory barrier after:
> ctrl = desc->ctrl;

Yes, it would.
My v2 moved it into the other direction, into the ts code, though, which
has the added benefit of no barrier when timestamps are disabled (which
IIRC is the default).
But I can move it to just after ctrl if you so prefer.

>> Or am I missing something?
>>
>>>>>> 3. HW writes timestamp and sets TX_USED (or they become visible).
>>>>> I expect hardware to set TX_USED and timestamp before raising TX complete
>>>>> interrupt. If so, there should be no on-flight updates of this descriptor,
>>>>> right? Hardware raised a TX_USED bit read interrupt when it reads a
>>>>> descriptor like this and hangs TX.
>>>> For the first iteration of the loop, that is correct - there should be
>>>> no in-flight writes from controller as it already raised the interrupt.
>>>> However, the following iterations of the loop process descriptors that
>>>> may or may not have the interrupt raised yet, and therefore may still
>>>> have in-flight writes.
>>> I expect the hardware to give up ownership of the descriptor just at the
>>> end of TX operation, so that the word containing TX_USED to be the last one
>>> updated. If you reads the descriptor and TX_USED is not there the first
>>> loop breaks, queue->tx_tail is updated with the last processed descriptor
>>> in the queue (see "queue->tx_tail = tail;") then the next interrupt will
>>> start processing with this last one descriptor.
>> Agreed, that is how it works. Issues only occur if we somehow operate on
>> data before ownership was transferred.
> Could this happens? Shouldn't this:
> 	if (!(ctrl & MACB_BIT(TX_USED)))
> 		break;
>
> assure that this could not happen and the fact that the ownership is passed
> to CPU at the end of descriptor update?

I don't see how that prevents desc->ctrl and desc->ts_1/ts_2 reordering,
which is the only issue that I can see.
By "data" above I meant ts_1/ts_2, AFAICS there is nothing else we read
from the descriptors aside from ctrl itself.

> If load reordering b/w desc->ctrl and desc->ts_1/ts_2 wouldn't be enough to
> have a barrier after
> ctrl = desc->ctrl instruction?

Yes, per above.

>>>>>> 4. Code checks TX_USED.
>>>>>> 5. Code operates on timestamp that is actually garbage.
>>>>>>
>>>>>> I'm not 100% sure that there isn't some lighter/cleaner way to do this
>>>>>> than dma_rmb(), though.
>>>>> If you still think this scenario could happen why not calling a dsb in
>>>>> gem_ptp_do_timestamp(). I feel like that is a proper place to call it.
>>>> OK, I will move it there. Unless we arrive at a conclusion that it is
>>>> unnecessary altogether, of course :)
>>>>
>>>>> Moreover, there is bit 32 of desc->ctrl which tells you if a valid
>>>>> timestamp was placed in the descriptor. But, again, I expect the timestamp
>>>>> and TX_USED to be set by hardware before raising TX complete interrupt.
>>>> Yes, but since my concern is that without barriers in between,
>>>> desc->ctrl might be read after ts_1/ts_2, so that bit might be seen as
>>>> set even though ts_1 is not yet an actual timestamp. And per above, all
>>>> this may occur before the TX complete interrupt is raised for the
>>>> descriptor in question.
>>> If so, why not placing the barrier when reading timestamps? From my point
>>> of view the place of "dma_rmb()" in this patch doesn't guarantees that
>>> ts1/ts2 were read after the execution of barrier (correct me if I'm wrong).
>> If the timestamp ts1/ts2 reads are done after dma_rmb(), and dev->ctrl
>> is read before the barrier, my understanding is that they are guaranteed
>> to be ordered so that dev->ctrl is read before ts1/ts2.
>>
>> Per [1], "[DMB] ensures that all explicit memory accesses that appear in
>> program order before the |DMB| instruction are observed before any
>> explicit memory accesses that appear in program order after the |DMB|
>> instruction".
>> And the compiler barrier ("memory" asm clobber) included within
>> dma_rmb() ensures that in program order, ctrl is loaded before the
>> barrier, and ts1/ts2 after it.
>>
>> But as mentioned, it makes sense to have it closer to the ts1/ts2 reads
>> so I'll make that change.
>>
>> [1]
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html
>>
>>
>>>> I agree that this TX case seems somewhat unlikely to be triggered in
>>>> real life at least at present, though, as the ts_1/ts_2 read is so far
>>>> after the desc->ctrl read, and behind function calls, so unlikely to be
>>>> reordered before desc->ctrl read.
>>>> But the similar RX cases below seem more problematic as the racy loads
>>>> in question are right after each other in code.
>>>>
[...]

-- 
Anssi Hannula / Bitwise Oy
+358 503803997

  reply	other threads:[~2018-12-12 11:27 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-30 18:21 [PATCH 0/3] net: macb: DMA race condition fixes Anssi Hannula
2018-11-30 18:21 ` [PATCH 1/3] net: macb: fix random memory corruption on RX with 64-bit DMA Anssi Hannula
2018-12-03  4:44   ` Harini Katakam
2018-12-05 12:37   ` Claudiu.Beznea
2018-12-05 13:58     ` Anssi Hannula
2018-12-05 20:32   ` David Miller
2018-12-06 14:16     ` Claudiu.Beznea
2018-11-30 18:21 ` [PATCH 2/3] net: macb: fix dropped RX frames due to a race Anssi Hannula
2018-12-03  4:52   ` Harini Katakam
2018-12-03 10:31     ` Anssi Hannula
2018-12-03 10:36       ` Harini Katakam
2018-12-05 12:38   ` Claudiu.Beznea
2018-11-30 18:21 ` [PATCH 3/3] net: macb: add missing barriers when reading buffers Anssi Hannula
2018-12-05 12:37   ` Claudiu.Beznea
2018-12-05 14:00     ` Anssi Hannula
2018-12-06 14:14       ` Claudiu.Beznea
2018-12-07 12:00         ` Anssi Hannula
2018-12-10 10:34           ` Claudiu.Beznea
2018-12-11 13:21             ` Anssi Hannula
2018-12-12 10:58               ` Claudiu.Beznea
2018-12-12 11:27                 ` Anssi Hannula [this message]
2018-12-13 10:48                   ` Claudiu.Beznea
2018-12-12 10:59               ` [PATCH 3/3 v2] net: macb: add missing barriers when reading descriptors Anssi Hannula
2018-12-12 23:19                 ` David Miller
2018-12-03  8:26 ` [PATCH 0/3] net: macb: DMA race condition fixes Nicolas.Ferre
2018-12-03 23:56   ` David Miller
2018-12-05 20:35 ` David Miller
2018-12-07 12:04   ` Anssi Hannula
2018-12-10 10:58     ` Nicolas.Ferre
2018-12-10 11:32       ` Claudiu.Beznea
2018-12-10 11:34         ` Claudiu.Beznea

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70b55891-1bc4-be1a-1e45-09f073d3e021@bitwise.fi \
    --to=anssi.hannula@bitwise.fi \
    --cc=Claudiu.Beznea@microchip.com \
    --cc=Nicolas.Ferre@microchip.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.