From: Jerome Glisse <j.glisse@gmail.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>,
	linaro-mm-sig@lists.linaro.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-kernel@vger.kernel.org,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [Linaro-mm-sig] [RFC] ARM DMA mapping TODO, v1
Date: Thu, 28 Apr 2011 15:37:00 -0400
Message-ID: <BANLkTinhm7ar1mf1D-dSMiLtw5hRNY36RA@mail.gmail.com>
In-Reply-To: <20110428143440.GP17290@n2100.arm.linux.org.uk>

On Thu, Apr 28, 2011 at 10:34 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Apr 28, 2011 at 04:29:52PM +0200, Arnd Bergmann wrote:
>> Given that people still want to have an interface that does what I
>> though this one did, I guess we have two options:
>>
>> * Kill off dma_cache_sync and replace it with calls to dma_sync_*
>>   so we can start using dma_alloc_noncoherent on ARM
>
> I don't think this is an option as dma_sync_*() is part of the streaming
> DMA mapping API (dma_map_*) which participates in the idea of buffer
> ownership, which the noncoherent API doesn't appear to.

Sorry to jump in like this, but it seems to me that this whole
discussion is heading toward making the cache-attribute decision
inside the dma_* functions, so that a driver asking for uncached memory
might get cached memory if an IOMMU or some other component provides
cache coherency.

As Jesse already pointed out, for performance reasons it is a lot better
to let the driver decide, even if you have an IOMMU capable of
handling coherency for you. My understanding is that each time
coherency is requested it triggers bus activity of some kind (I think
snoop is the term used for PCI), and this traffic can slow down both the
CPU and the device. In graphics drivers we have a lot of buffers that are
written once and then used (once or more), and it makes a lot of sense to
allocate those buffers from uncached memory so we can tell the device (in
the case of a DRM driver) that there is no need to trigger snoop activity
for coherency. So I believe the decision should ultimately be on the
driver side.

Jesse also pointed out space exhaustion inside the IOMMU, and I believe
this should also be considered. This is why I believe the dma_* API is
not well suited. In DRM/TTM we use pci_dma_mapping* and we also play
with page attributes through set_page*_uc|wc|wb.

So I believe a better API might look like:

- struct dma_alloc_unit {
      bool contiguous;
      uint dmamask;
  };
  struct dma_buffer {
      struct dma_alloc_unit unit;
  };
  contiguous tells whether this DMA unit needs contiguous allocation.
  If it does and there is an IOMMU, the allocator might allocate
  non-contiguous pages/memory and later program the IOMMU so that
  things look contiguous to the device.
  If contiguous == false, the allocator may allocate one page at a time
  but should rather try to allocate a bunch of contiguous pages, to allow
  optimizations that minimize TLB misses if the device allows such things
  (maybe adding a flag here would make sense).
- dma_buffer dma_alloc_(uc|wc|wb)(dma_alloc_unit, size): allocate memory
  according to the constraints defined by dma_alloc_unit.
- dma_buffer_update(dma_buffer, offset, size): lets dmabounce & swiotlb
  know what needs to be updated.
- dma_bus_map(dma_buffer): map the buffer onto the bus. For dmabounce
  that would mean copying to the bounce buffer, for an IOMMU that
  would mean binding it, and with no IOMMU it does nothing.
- dma_bus_unmap(dma_buffer): the implementation might not necessarily
  unmap the buffer if there is plenty of room in the IOMMU.
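
To make the "contiguous" case concrete, here is a rough userspace sketch
of how an allocator could bind scattered pages into consecutive IOMMU
slots so the device sees one contiguous range. Everything in it
(struct toy_iommu, toy_iommu_map, toy_iommu_translate, the 16-slot table,
4 KiB pages) is made up for illustration and is not an existing kernel
interface. It also shows the space-exhaustion case: the map call fails
when no run of free slots is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_SIZE   (1u << TOY_PAGE_SHIFT)
#define TOY_IOMMU_SLOTS 16

/* Hypothetical IOMMU table: slot i translates bus page i to a physical page. */
struct toy_iommu {
    uint64_t phys_page[TOY_IOMMU_SLOTS];
    int used[TOY_IOMMU_SLOTS];
};

/*
 * Bind `count` scattered, page-aligned physical pages into consecutive
 * slots. Returns the bus base address, or UINT64_MAX when the IOMMU has
 * no free run of that length (space exhaustion).
 */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, const uint64_t *phys,
                              size_t count)
{
    for (size_t base = 0; base + count <= TOY_IOMMU_SLOTS; base++) {
        size_t i;

        /* Check that slots base .. base+count-1 are all free. */
        for (i = 0; i < count && !mmu->used[base + i]; i++)
            ;
        if (i < count)
            continue;

        for (i = 0; i < count; i++) {
            mmu->phys_page[base + i] = phys[i];
            mmu->used[base + i] = 1;
        }
        return (uint64_t)base << TOY_PAGE_SHIFT;
    }
    return UINT64_MAX;
}

/* Translate a bus address to the physical address behind it. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t bus)
{
    size_t slot = (size_t)(bus >> TOY_PAGE_SHIFT);

    assert(slot < TOY_IOMMU_SLOTS && mmu->used[slot]);
    return mmu->phys_page[slot] + (bus & (TOY_PAGE_SIZE - 1));
}
```

With pages at 0x7000, 0x2000 and 0x9000 mapped in that order, the device
walks bus addresses 0x0000 to 0x2fff linearly while the accesses land on
the three unrelated physical pages.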

So usage would look like:
mydma_buffer = dma_alloc_uc(mydma_unit, N);
cpuptr = dma_cpu_ptr(mydma_buffer);
// write to the buffer
// tell dma which data needs to be updated, depending on the platform
// (iommu, dmabounce, cache flushing ...)
dma_buffer_update(mydma_buffer, offset, size);
dma_bus_map(mydma_buffer);
// let the device use the buffer
...
// the buffer isn't used anymore by the device
dma_bus_unmap(mydma_buffer);
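
For what it is worth, the flow above can be mocked in plain userspace C.
This is only a sketch under assumptions: it models the dmabounce case,
where dma_bus_map() copies the range reported by dma_buffer_update() into
a second buffer standing in for what the device sees. The function names
follow the proposal, but the struct fields, return values and
dma_buffer_free() are invented here, and none of this is an existing
kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct dma_alloc_unit {
    bool contiguous;
    unsigned long dmamask;
};

struct dma_buffer {
    struct dma_alloc_unit unit;
    size_t size;
    unsigned char *cpu;            /* memory the CPU writes to */
    unsigned char *bounce;         /* stand-in for what the device sees */
    size_t dirty_start, dirty_end; /* equal values mean nothing is dirty */
    bool mapped;
};

static struct dma_buffer *dma_alloc_uc(struct dma_alloc_unit unit, size_t size)
{
    struct dma_buffer *buf = calloc(1, sizeof(*buf));

    if (!buf)
        return NULL;
    buf->cpu = calloc(1, size);
    buf->bounce = calloc(1, size);
    if (!buf->cpu || !buf->bounce) {
        free(buf->cpu);
        free(buf->bounce);
        free(buf);
        return NULL;
    }
    buf->unit = unit;
    buf->size = size;
    return buf;
}

static unsigned char *dma_cpu_ptr(struct dma_buffer *buf)
{
    return buf->cpu;
}

/* Record which bytes changed; returns -1 if the range is out of bounds. */
static int dma_buffer_update(struct dma_buffer *buf, size_t offset, size_t size)
{
    if (offset > buf->size || size > buf->size - offset)
        return -1;
    if (buf->dirty_start == buf->dirty_end) {
        buf->dirty_start = offset;
        buf->dirty_end = offset + size;
    } else {
        if (offset < buf->dirty_start)
            buf->dirty_start = offset;
        if (offset + size > buf->dirty_end)
            buf->dirty_end = offset + size;
    }
    return 0;
}

/* dmabounce flavour: copy only the dirty range to the bounce buffer. */
static void dma_bus_map(struct dma_buffer *buf)
{
    memcpy(buf->bounce + buf->dirty_start, buf->cpu + buf->dirty_start,
           buf->dirty_end - buf->dirty_start);
    buf->dirty_start = buf->dirty_end = 0;
    buf->mapped = true;
}

static void dma_bus_unmap(struct dma_buffer *buf)
{
    buf->mapped = false;
}

static void dma_buffer_free(struct dma_buffer *buf)
{
    free(buf->cpu);
    free(buf->bounce);
    free(buf);
}
```

On a platform with an IOMMU, dma_bus_map() would bind pages instead of
copying, and with neither it would do nothing; the driver-side sequence
stays the same in all three cases.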

It hides things like the IOMMU or dmabounce from the device driver but
still allows the device driver to ask for the most optimal path. A
platform may decide not to support dma_alloc_uc|wc (i.e. non-coherent) if it
has an IOMMU that can handle coherency, or some other way to handle it
such as flushing. But if a platform wants better performance it should try
to provide non-coherent allocation (through highmem or by changing kernel
mapping properties ...).

Maybe I am completely missing the point.

Cheers,
Jerome
