All of lore.kernel.org
 help / color / mirror / Atom feed
From: Simon Glass <sjg@chromium.org>
To: u-boot@lists.denx.de
Subject: [U-Boot] [PATCH v5 02/19] usb: dwc2: Use separate input and output buffers
Date: Mon, 3 Apr 2017 14:10:37 -0600	[thread overview]
Message-ID: <CAPnjgZ2Sa_Sawb1t_1fyz=+QYtF1WLxS9w5jOHbudOyQZmxDzg@mail.gmail.com> (raw)
In-Reply-To: <1839232.Ocurkfb8WW@sbruens-linux>

Hi Stefan,

On 3 April 2017 at 12:18, Brüns, Stefan <Stefan.Bruens@rwth-aachen.de> wrote:
> On Montag, 3. April 2017 17:38:40 CEST you wrote:
>> Hi Stefan,
>>
>> On 3 April 2017 at 08:26, Brüns, Stefan <Stefan.Bruens@rwth-aachen.de>
> wrote:
>> > On Montag, 3. April 2017 01:23:17 CEST you wrote:
>> >> Hi Stefan,
>> >>
>> >> On 2 April 2017 at 15:34, Stefan Bruens <stefan.bruens@rwth-aachen.de>
>> >
>> > wrote:
>> >> > On Sonntag, 2. April 2017 17:43:38 CEST Simon Glass wrote:
>> >> >> Hi Stefan,
>> >> >>
>> >> >> On 2 April 2017 at 07:10, Stefan Bruens <stefan.bruens@rwth-aachen.de>
>> >> >
>> >> > wrote:
>> >> >> > On Sonntag, 2. April 2017 05:01:41 CEST Marek Vasut wrote:
>> >> >> >> On 04/02/2017 01:40 AM, Simon Glass wrote:
>> >> >> >> > Hi Marek,
>> >> >> >> >
>> >> >> >> > On 1 April 2017 at 14:15, Marek Vasut <marex@denx.de> wrote:
>> >> >> >> >> On 04/01/2017 08:05 PM, Simon Glass wrote:
>> >> >> >> >>> On Raspberry Pi 2 and 3 a problem was noticed when enabling
>> >> >> >> >>> driver
>> >> >> >> >>> model
>> >> >> >> >>> for USB: the cache invalidate after an incoming transfer does
>> >> >> >> >>> not
>> >> >> >> >>> seem
>> >> >> >> >>> to
>> >> >> >> >>> work correctly.
>> >> >> >> >>>
>> >> >> >> >>> This may be a problem with the underlying caching
>> >> >> >> >>> implementation
>> >> >> >> >>> on
>> >> >> >> >>> armv7
>> >> >> >> >>> and armv8 but this seems very unlikely. As a work-around, use
>> >> >> >> >>> separate
>> >> >> >> >>> buffers for input and output. This ensures that the input
>> >> >> >> >>> buffer
>> >> >> >> >>> will
>> >> >> >> >>> not
>> >> >> >> >>> hold dirty cache data.
>> >> >> >> >>
>> >> >> >> >> What do you think of this patch:
>> >> >> >> >> [U-Boot] usb: dwc2: invalidate the dcache before starting the
>> >> >> >> >> DMA
>> >> >> >> >
>> >> >> >> > Yes that matches what I did as a hack. I didn't realise that the
>> >> >> >> > DMA
>> >> >> >> > would go through the cache. Thanks for the pointer.
>> >> >> >>
>> >> >> >> DMA should not go through the cache. I have yet to review that
>> >> >> >> patch,
>> >> >> >> but IMO it's relevant to this problem you observe.
>> >> >> >
>> >> >> > DMA transfers not going through the cache is probably the problem
>> >> >> > here:
>> >> >> >
>> >> >> > Assume we have the aligned_buffer at address 0xdead0000
>> >> >> >
>> >> >> > 1. The cpu writes to address 0xdead0002. This is fine, as it is the
>> >> >> > current
>> >> >> > owner of the address. The cacheline is marked dirty.
>> >> >> > 2. The cpu no longer needs the corresponding address range, and it
>> >> >> > is
>> >> >> > reallocated (i.e. freed and then allocated from dwc2) or reused
>> >> >> > (i.e.
>> >> >> > formerly out buffer, now in buffer).
>> >> >> > 3. The CPU starts the DMA transfer
>> >> >> > 4. The DMA transfer writes to e.g. 0xdead0000-0xdead0200 in memory.
>> >> >> > 5. The CPU fetches an address aliasing with 0xdead0000. The dirty
>> >> >> > cache
>> >> >> > line is evicted, and the 0xdead0000-0xdead0040 memory contents are
>> >> >> > overwritten.
>> >> >>
>> >> >> This is the part I don't understand. This should be an invalidate, not
>> >> >> a clean and invalidate, so there should be not memory write.
>> >> >>
>> >> >> Also if the CPU fetches from cached 0xdead0000 without an invalidate,
>> >> >> it will not cause a cash clean. It will simple read the data from the
>> >> >> cache and ignore what the DMA wrote.
>> >> >
>> >> > The CPU does not fetch 0xdead0000, but from an address *aliasing* with
>> >> > 0xdead000. As 0xdead0000 is *dirty* (we have neither flushed (clears
>> >> > dirty
>> >> > bit) or invalidated (implicitly clears dirty for the address)), the
>> >> > cache
>> >> > controller has to write out the 0xdead0000 cache line to memory.
>> >>
>> >> That doesn't make any sense to me. Can you explain it a bit more?
>> >>
>> >> If the CPU fetches from a cache-alias of 0xdead0000, say 0xa11a5000
>> >> then I expect the cache line would contain the data for that alias,
>> >> not for 0xdead0000.
>> >
>> > The important part is the dirty flag in the 0xdead0000 cacheline. By
>> > reading an aliasing address, you are causing eviction of the current
>> > cache line contents, and writing back its contents to memory. Reading of
>> > an address may cause write of a different address. You can see it as an
>> > dcache_flush_range done by the cache controller.
>>
>> OK so I think you are saying that reading from 0xa11a5000 causes dirty
>> data to be flushed from the cache to 0xdead0000. Makes perfect sense.
>> But why is there dirty data at 0xdead0000?
>>
>> - If we did a write last time, then it would have been dirty until we
>> flushed the cache line, which we did before telling the DMA to start
>> up.
>>
>> - If we did a read last time, then it is clean. We have read the data,
>> but not changed it.
>>
>> What am I missing?
>
> The following is a gross oversimplification, but might explain it:
>
> 1. Assume all of the 64kB of the aligned_buffer is dirty. (Likely, if the
> buffer is calloced.)
> 2. We are doing some transfers. All transfers are small, e.g. 64 byte.
> 3. In accordance with the two cases you mentioned above, the first 64 byte are
> no longer dirty, as the last out transfer has flushed the cacheline.
> 4. We are doing our first large in transfer (i.e. larger than 64 byte).
> 5. Bad Things (TM) may happen to any data at aligned_buffer[64] and beyond.
>
> If this holds, a single invalidate_dcache_range(aligned_buffer, aligned_buffer
> +65536,...) during the initialization of the controller would suffice.

OK I can test that. It makes sense.

>
>> > I think you are assuming a write-through cache here, which leads to your
>> > confusion.
>>
>> No that's a separate issue.
>>
>> >> So a later invalidate of 0xdead0000 should at most
>> >> clean the cache line and write to memory at 0xa11a5000. If it were to
>> >> write cached data intended for 0xa11a5000 to memory at 0xdead0000,
>> >> then surely this would be a bug?
>> >
>> > After the cache line for 0xdead0000 has been evicted, any flush/invalidate
>> > operations are noops for that address.
>>
>> OK good, so that's not the problem.
>>
>> >> Therefore I cannot see the situation where the CPU should write to
>> >> 0xdead0000 when that address is invalidated.
>> >
>> > It is not the invalidation which causes the write, but eviction from the
>> > cache.
>> >
>> >  > >> On armv8 we appear not to suppose invalidate in the code, so it
>> >  > >> makes
>> >> >>
>> >> >> sense for rpi_3.
>> >> >>
>> >> >> But for rpi_2 which seems to do a proper invalidate, I still don't see
>> >> >> the problem.
>> >> >
>> >> > Which part of the code is different between rpi2 and rpi3? The dwc2
>> >> > code
>> >> > is
>> >> > identical, is the memory invalidated in some other place?
>> >>
>> >> It is the invalidate_dcache_range() function.
>> >
>> > Thats for pointing that out, for anyone not having read the code:
>> >
>> > ARMv7 has different operations for flush_dcache_range and
>> > invalidate_dcache_range, the former does a writeback of dirty cache lines
>> > and sets the invalid tags for the corresponding cache lines, the latter
>> > only does the invalidate part.
>> >
>> > ARMv8 does a writeback for both operations. I assume thats what you call
>> > "improper".
>>
>> Yes, I believe it is wrong. Linux has a proper invalidate, why not U-Boot?
>
> For why it does not exist for ARMv8 U-Boot, I can not answer.
>
> The fact invalidate actually is a flush on ARMv8 makes the problem much more
> likely to happen. If the buffer, which is a member of dwc2_priv, is memset
> during the initialization it *will* be dirty. Any in transfer which is larger
> than any previous in or out transfer will cause a flush of the tail, up to
> xfer_len, of the transfer buffer.
>
> On ARMv7 the problem will be only apparent if a cache eviction happens during
> the DMA.

OK now it makes sense. With a write-back cache the window of
opportunity for this is much larger. For me, this explains why we need
to invalidate before issuing the read to the DMA engine.

>
>> > The important part is, calling flush multiple times in a row is *exactly*
>> > the same thing as calling flush once. Calling flush instead of invalidate
>> > is the same thing *if* the dirty flag is not set, as the writeback part
>> > is skipped in that case.
>>
>> Yes indeed.
>>
>> >> >> > Obviously, the dirty cache line from (1.) has to be cleared at the
>> >> >> > beginning of (3.), as Eddys patch does.
>> >> >>
>> >> >> But I still don't understand why we have to clean instead of just
>> >> >> invalidate?
>> >> >
>> >> > The patch by Eddie Cai just does an invalidate_dcache_range on the
>> >> > transfer
>> >> > buffer, nothing else. Where do you see a "clean" (whatever that refers
>> >> > to)?
>> >>
>> >> In the armv8  invalidate_dcache_range() function.
>> >
>> > The writeback does not happen, as the cacheline is not dirty. It should
>> > not
>> > even be in the cache after invalidate has been called once.
>> >
>> > We have to make sure the buffers address range is not in the cache prior
>> > to
>> > starting the DMA. We can either use invalidate_dcache_range or
>> > flush_dcache_range to guarantee this. Which one we use does not matter
>> > here, although invalidate only is typically a little bit more
>> > lightweight.
>> Yes. Just to restate my assertion. It should be possible to:
>>
>> - have some dirty data in the cache
>> - start up DMA
>> - invalidate that data
>> - read it
>>
>> in that order. It should not be necessary to move the invalidate to
>> before the DMA start-up, right?
>
> Unfortunately that won't be enough. *If* there is some dirty data in the
> cache, it can be written back to main memory *any time*. If you are lucky, the
> writeback happens before the memory location is written by the DMA controller,
> but there is no guarantee until you flush and/or invalidate (either suffices).

OK this was the missing piece for me. And it makes sense since we
probably do memset() the data at startup (e.g. malloc() does this).
Also the problem only affects the first few blocks.

I'll do some more testing at some point now that I understand the issue.

Regards,
Simon

  parent reply	other threads:[~2017-04-03 20:10 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-01 18:05 [U-Boot] [PATCH v5 00/19] arm: rpi: Enable USB, Ethernet, MMC, Video driver model on Raspberry Pi Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 01/19] net: smsc95xx: Correct free_pkt() implementation Simon Glass
2017-04-02 17:46   ` Joe Hershberger
2017-04-01 18:05 ` [U-Boot] [PATCH v5 02/19] usb: dwc2: Use separate input and output buffers Simon Glass
2017-04-01 20:15   ` Marek Vasut
2017-04-01 23:40     ` Simon Glass
2017-04-02  3:01       ` Marek Vasut
2017-04-02 13:10         ` Stefan Bruens
2017-04-02 15:43           ` Simon Glass
2017-04-02 21:34             ` Stefan Bruens
2017-04-02 23:23               ` Simon Glass
2017-04-03 14:26                 ` Brüns, Stefan
2017-04-03 15:38                   ` Simon Glass
2017-04-03 18:18                     ` Brüns, Stefan
2017-04-03 18:24                       ` Brüns, Stefan
2017-04-03 20:10                         ` Simon Glass
2017-04-03 20:10                       ` Simon Glass [this message]
2017-04-01 18:05 ` [U-Boot] [PATCH v5 03/19] dm: mmc: Set up the MMC device when controller is probed Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 04/19] dm: video: Correct line clearing code Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 05/19] string: Use memcpy() within memmove() when we can Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 06/19] arm: rpi: Drop the GPIO device addresses Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 07/19] arm: rpi: Drop CONFIG_CONS_INDEX Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 08/19] dm: arm: rpi: Move to driver model for USB Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 09/19] dm: arm: rpi: Use driver model for Ethernet Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 10/19] arm: rpi: Add a file to handle messages Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 11/19] arm: rpi: Add a function to obtain the MMC clock Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 12/19] dm: mmc: rpi: Convert Raspberry Pi to driver model for MMC Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 13/19] dm: arm: rpi: Drop CONFIG_OF_EMBED Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 14/19] video: arm: rpi: Move the video query out of the driver Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 15/19] video: arm: rpi: Move the video settings " Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 16/19] dm: video: Refactor lcd_simplefb to prepare for driver model Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 17/19] dm: video: Add driver-model support to lcd_simplefb Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 18/19] dm: video: arm: rpi: Convert to use driver model for video Simon Glass
2017-04-01 18:05 ` [U-Boot] [PATCH v5 19/19] arm: rpi: Add a TODO to move all messages into the msg handler Simon Glass
2017-04-01 19:05 ` [U-Boot] [PATCH v5 00/19] arm: rpi: Enable USB, Ethernet, MMC, Video driver model on Raspberry Pi Stefan Bruens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPnjgZ2Sa_Sawb1t_1fyz=+QYtF1WLxS9w5jOHbudOyQZmxDzg@mail.gmail.com' \
    --to=sjg@chromium.org \
    --cc=u-boot@lists.denx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.