linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King - ARM Linux <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org>
To: Alexey Brodkin <Alexey.Brodkin-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
Cc: "linux-sh-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-sh-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
	<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	"sparclinux-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<sparclinux-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"deanbo422-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<deanbo422-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"hch-jcswGhMUV9g@public.gmane.org"
	<hch-jcswGhMUV9g@public.gmane.org>,
	"linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-c6x-dev-jPsnJVOj+W6hPH1hqNUYSQ@public.gmane.org"
	<linux-c6x-dev-jPsnJVOj+W6hPH1hqNUYSQ@public.gmane.org>,
	"linux-hexagon-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-hexagon-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-snps-arc-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-snps-arc-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	"linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw@public.gmane.org"
	<linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw@public.gmane.org>,
	"Vineet.Gupta1-HKixBCOQz3hWk0Htik3J/w@public.gmane.org"
	<Vineet.Gupta1-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>,
	"linux-m68k-cunTk1MwBs8S/qaLPR03pWD2FQJk+8+b@public.gmane.org"
	<linux-m68k-cunTk1MwBs8S/qaLPR03pWD2FQJk+8+b@public.gmane.org>,
	"openrisc-cunTk1MwBs9a3B2Vnqf2dGD2FQJk+8+b@public.gmane.org"
	<openrisc-cunTk1MwBs9a3B2Vnqf2dGD2FQJk+8+b@public.gmane.org>,
	"green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	monstr@mons
Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
Date: Fri, 18 May 2018 22:33:10 +0100	[thread overview]
Message-ID: <20180518213309.GG17671@n2100.armlinux.org.uk> (raw)
In-Reply-To: <182840dedb4890a88c672b1c5d556920bf89a8fb.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>

On Fri, May 18, 2018 at 07:57:34PM +0000, Alexey Brodkin wrote:
> Hi Russel,

That's Russell.

> On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> > It's necessary.  Take a moment to think carefully about this:
> > 
> >         dma_map_single(, dir)
> > 
> >         dma_sync_single_for_cpu(, dir)
> > 
> >         dma_sync_single_for_device(, dir)
> > 
> >         dma_unmap_single(, dir)
> > 
> > In the case of a DMA-incoherent architecture, the operations done at each
> > stage depend on the direction argument:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  writeback       none            writeback       none
> > TO_CPU  invalidate      invalidate*     invalidate      invalidate*
> > BIDIR   writeback       invalidate      writeback       invalidate
> > 
> > * - only necessary if the CPU speculatively prefetches.
> 
> I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even
> if CPU doesn't preferch - what if we reuse the same buffer for multiple
> reads from DMA device?

That's fine - for non-coherent DMA, the CPU caches will only end up
containing data for that memory if:
- the CPU speculatively fetches data from that memory, or
- the CPU explicitly touches that memory

> > The multiple invalidations for the TO_CPU case handles different
> > conditions that can result in data corruption, and for some CPUs, all
> > four are necessary.
> 
> I would agree that map()/unmap() a quite a special cases and so depending
> on direction we need to execute in them either for_cpu() or for_device()
> call-backs depending on direction.
> 
> As for invalidation in case of for_device(TO_CPU) I still don't see
> a rationale behind it. Would be interesting to see a real example where
> we benefit from this.

Yes, you could avoid that, but depending how you structure the
architecture implementation, it can turn out to be a corner case.
The above table is precisely how 32-bit ARM is implemented, because
the way we implement them is based on who owns the memory - the "map"
and "for_device" operations translate internally to a cpu-to-device
ownership transition of the buffer.  Similar for "unmap" and "to_cpu".
It basically avoids having to add additional functions at the lower
implementation levels.

> > Things get more interesting if the implementation behind the DMA API has
> > to copy data between the buffer supplied to the mapping and some DMA
> > accessible buffer:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  copy to dma     none            copy to dma     none
> > TO_CPU  none            copy to cpu     none            copy to cpu
> > BIDIR   copy to dma     copy to cpu     copy to dma     copy to cpu
> > 
> > So, in both cases, the value of the direction argument defines what you
> > need to do in each call.
> 
> Interesting enough in your seond table (which describes more complicated
> case indeed) you set "none" for for_device(TO_CPU) which looks logical
> to me.
> 
> So IMHO that's what make sense:
> ---------------------------->8-----------------------------
>         map             for_cpu         for_device      unmap
> TO_DEV  writeback       none            writeback       none
> TO_CPU  none            invalidate      none            invalidate*
> BIDIR   writeback       invalidate      writeback       invalidate*
> ---------------------------->8-----------------------------

That doesn't make sense for the TO_CPU case.  If the caches contain
dirty cache lines, and you're DMAing from the device to the system
RAM, other system activity can cause the dirty cache lines to be
evicted (written back) to memory which the DMA has already overwritten.
The result is data corruption.  So, you really can't have "none" in
the "map" case there.

Given that, the "for_cpu" case becomes dependent on whether the CPU
speculatively prefetches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

WARNING: multiple messages have this Message-ID (diff)
From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: "deanbo422@gmail.com" <deanbo422@gmail.com>,
	"linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"nios2-dev@lists.rocketboards.org"
	<nios2-dev@lists.rocketboards.org>,
	"linux-xtensa@linux-xtensa.org" <linux-xtensa@linux-xtensa.org>,
	"linux-m68k@lists.linux-m68k.org"
	<linux-m68k@lists.linux-m68k.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-hexagon@vger.kernel.org" <linux-hexagon@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"linux-alpha@vger.kernel.org" <linux-alpha@vger.kernel.org>,
	"linux-snps-arc@lists.infradead.org"
	<linux-snps-arc@lists.infradead.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"green.hu@gmail.com" <green.hu@gmail.com>,
	"Vineet.Gupta1@synopsys.com" <Vineet.Gupta1@synopsys.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"openrisc@lists.librecores.org" <openrisc@lists.librecores.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"monstr@monstr.eu" <monstr@monstr.eu>,
	"linux-parisc@vger.kernel.org" <linux-parisc@vger.kernel.org>,
	"linux-c6x-dev@linux-c6x.org" <linux-c6x-dev@linux-c6x.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>
Subject: Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
Date: Fri, 18 May 2018 22:33:10 +0100	[thread overview]
Message-ID: <20180518213309.GG17671@n2100.armlinux.org.uk> (raw)
Message-ID: <20180518213310.dCtFdGOu0gFb5OkOSG0gygspVsWU94r_NUP1Xsx83O0@z> (raw)
In-Reply-To: <182840dedb4890a88c672b1c5d556920bf89a8fb.camel@synopsys.com>

On Fri, May 18, 2018 at 07:57:34PM +0000, Alexey Brodkin wrote:
> Hi Russel,

That's Russell.

> On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> > It's necessary.  Take a moment to think carefully about this:
> > 
> >         dma_map_single(, dir)
> > 
> >         dma_sync_single_for_cpu(, dir)
> > 
> >         dma_sync_single_for_device(, dir)
> > 
> >         dma_unmap_single(, dir)
> > 
> > In the case of a DMA-incoherent architecture, the operations done at each
> > stage depend on the direction argument:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  writeback       none            writeback       none
> > TO_CPU  invalidate      invalidate*     invalidate      invalidate*
> > BIDIR   writeback       invalidate      writeback       invalidate
> > 
> > * - only necessary if the CPU speculatively prefetches.
> 
> I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even
> if CPU doesn't preferch - what if we reuse the same buffer for multiple
> reads from DMA device?

That's fine - for non-coherent DMA, the CPU caches will only end up
containing data for that memory if:
- the CPU speculatively fetches data from that memory, or
- the CPU explicitly touches that memory

> > The multiple invalidations for the TO_CPU case handles different
> > conditions that can result in data corruption, and for some CPUs, all
> > four are necessary.
> 
> I would agree that map()/unmap() a quite a special cases and so depending
> on direction we need to execute in them either for_cpu() or for_device()
> call-backs depending on direction.
> 
> As for invalidation in case of for_device(TO_CPU) I still don't see
> a rationale behind it. Would be interesting to see a real example where
> we benefit from this.

Yes, you could avoid that, but depending how you structure the
architecture implementation, it can turn out to be a corner case.
The above table is precisely how 32-bit ARM is implemented, because
the way we implement them is based on who owns the memory - the "map"
and "for_device" operations translate internally to a cpu-to-device
ownership transition of the buffer.  Similar for "unmap" and "to_cpu".
It basically avoids having to add additional functions at the lower
implementation levels.

> > Things get more interesting if the implementation behind the DMA API has
> > to copy data between the buffer supplied to the mapping and some DMA
> > accessible buffer:
> > 
> >         map             for_cpu         for_device      unmap
> > TO_DEV  copy to dma     none            copy to dma     none
> > TO_CPU  none            copy to cpu     none            copy to cpu
> > BIDIR   copy to dma     copy to cpu     copy to dma     copy to cpu
> > 
> > So, in both cases, the value of the direction argument defines what you
> > need to do in each call.
> 
> Interesting enough in your seond table (which describes more complicated
> case indeed) you set "none" for for_device(TO_CPU) which looks logical
> to me.
> 
> So IMHO that's what make sense:
> ---------------------------->8-----------------------------
>         map             for_cpu         for_device      unmap
> TO_DEV  writeback       none            writeback       none
> TO_CPU  none            invalidate      none            invalidate*
> BIDIR   writeback       invalidate      writeback       invalidate*
> ---------------------------->8-----------------------------

That doesn't make sense for the TO_CPU case.  If the caches contain
dirty cache lines, and you're DMAing from the device to the system
RAM, other system activity can cause the dirty cache lines to be
evicted (written back) to memory which the DMA has already overwritten.
The result is data corruption.  So, you really can't have "none" in
the "map" case there.

Given that, the "for_cpu" case becomes dependent on whether the CPU
speculatively prefetches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

  parent reply	other threads:[~2018-05-18 21:33 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-11  7:59 common non-cache coherent direct dma mapping ops Christoph Hellwig
2018-05-11  7:59 ` Christoph Hellwig
2018-05-11  7:59 ` [PATCH 01/20] dma-mapping: simplify Kconfig dependencies Christoph Hellwig
2018-05-11  7:59   ` Christoph Hellwig
     [not found] ` <20180511075945.16548-1-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11  7:59   ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-3-hch-jcswGhMUV9g@public.gmane.org>
2018-05-18 13:03       ` Alexey Brodkin
2018-05-18 13:03         ` Alexey Brodkin
     [not found]         ` <bad125dff49f6e49c895e818c9d1abb346a46e8e.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
2018-05-18 13:27           ` hch-jcswGhMUV9g
2018-05-18 13:27             ` hch
     [not found]             ` <20180518132731.GA31125-jcswGhMUV9g@public.gmane.org>
2018-05-18 14:13               ` Alexey Brodkin
2018-05-18 14:13                 ` Alexey Brodkin
2018-05-18 17:28               ` Vineet Gupta
2018-05-18 17:28                 ` Vineet Gupta
2018-05-18 17:20           ` dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation) Vineet Gupta
2018-05-18 17:20             ` Vineet Gupta
2018-05-18 17:50             ` Russell King - ARM Linux
2018-05-18 17:50               ` Russell King - ARM Linux
     [not found]               ` <20180518175004.GF17671-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2018-05-18 19:57                 ` Alexey Brodkin
2018-05-18 19:57                   ` Alexey Brodkin
     [not found]                   ` <182840dedb4890a88c672b1c5d556920bf89a8fb.camel-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>
2018-05-18 21:33                     ` Russell King - ARM Linux [this message]
2018-05-18 21:33                       ` Russell King - ARM Linux
2018-05-18 20:35                 ` Vineet Gupta
2018-05-18 20:35                   ` Vineet Gupta
2018-05-18 21:55                   ` Russell King - ARM Linux
2018-05-18 21:55                     ` Russell King - ARM Linux
2018-05-18 20:05         ` [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation Helge Deller
2018-05-18 20:05           ` Helge Deller
     [not found]           ` <0c5d27e9-2799-eb38-8b09-47a04c48b5c7-Mmb7MZpHnFY@public.gmane.org>
2018-05-19  6:38             ` hch-jcswGhMUV9g
2018-05-19  6:38               ` hch
2018-05-11  7:59   ` [PATCH 03/20] arc: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-4-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11 12:44       ` Alexey Brodkin
2018-05-11 12:44         ` Alexey Brodkin
2018-05-11  7:59   ` [PATCH 04/20] arm-nommu: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-5-hch-jcswGhMUV9g@public.gmane.org>
2018-05-11  9:11       ` Russell King - ARM Linux
2018-05-11  9:11         ` Russell King - ARM Linux
     [not found]         ` <20180511091114.GA16141-l+eeeJia6m9URfEZ8mYm6t73F7V6hmMc@public.gmane.org>
2018-05-22 11:53           ` Christoph Hellwig
2018-05-22 11:53             ` Christoph Hellwig
2018-05-11 13:56       ` John Garry
2018-05-11 13:56         ` John Garry
2018-05-11  7:59   ` [PATCH 05/20] c6x: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
     [not found]     ` <20180511075945.16548-6-hch-jcswGhMUV9g@public.gmane.org>
2018-05-15  0:25       ` [Linux-c6x-dev] " Mark Salter
2018-05-15  0:25         ` Mark Salter
2018-05-11  7:59   ` [PATCH 06/20] hexagon: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 07/20] m68k: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 08/20] microblaze: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 09/20] microblaze: remove the consistent_sync and consistent_sync_page Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 10/20] nds32: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 11/20] nios2: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 12/20] openrisc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 13/20] sh: simplify get_arch_dma_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 14/20] sh: introduce a sh_cacheop_vaddr helper Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 15/20] sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 16/20] mm: split arch/sh/mm/consistent.c Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 17/20] sh: use generic dma_noncoherent_ops Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 18/20] xtensa: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 19/20] sparc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-11  7:59   ` [PATCH 20/20] parisc: " Christoph Hellwig
2018-05-11  7:59     ` Christoph Hellwig
2018-05-13 13:26 ` common non-cache coherent direct dma mapping ops Helge Deller
2018-05-13 13:26   ` Helge Deller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180518213309.GG17671@n2100.armlinux.org.uk \
    --to=linux-i+ivw8tiwo2tmtq+vha3yw@public.gmane.org \
    --cc=Alexey.Brodkin-HKixBCOQz3hWk0Htik3J/w@public.gmane.org \
    --cc=Vineet.Gupta1-HKixBCOQz3hWk0Htik3J/w@public.gmane.org \
    --cc=deanbo422-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=hch-jcswGhMUV9g@public.gmane.org \
    --cc=linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-c6x-dev-jPsnJVOj+W6hPH1hqNUYSQ@public.gmane.org \
    --cc=linux-hexagon-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-m68k-cunTk1MwBs8S/qaLPR03pWD2FQJk+8+b@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=linux-sh-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-snps-arc-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw@public.gmane.org \
    --cc=monstr@mons \
    --cc=openrisc-cunTk1MwBs9a3B2Vnqf2dGD2FQJk+8+b@public.gmane.org \
    --cc=sparclinux-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).