From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	Russell King - ARM Linux <linux@arm.linux.org.uk>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-kernel@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [Linaro-mm-sig] [RFC] ARM DMA mapping TODO, v1
Date: Sat, 30 Apr 2011 08:46:54 +1000
Message-ID: <1304117214.2513.262.camel@pasglop>
In-Reply-To: <20110429092712.5bbd6948@jbarnes-desktop>

On Fri, 2011-04-29 at 09:27 -0700, Jesse Barnes wrote:

> You must be making it sound worse than it really is, otherwise how
> would an embedded platform like the above deal with a display engine
> that needed a large, contiguous chunk of uncached memory for the
> display buffer?  If the CPU is actively speculating into it and
> overwriting blits etc it would never work...  Or do you do such
> reservations up front at 1G granularity??

Such embedded platforms have not been used with GPUs so far and our only
implementation of 64-bit BookE is fortunately also completely cache
coherent :-)

The good thing on ppc is that so far there is no new design coming from
us or FSL that isn't cache coherent. The bad thing is that people still
seem to try to pump out things using the old 44x, which isn't, and seem
to also want to use GPUs on them :-)

The 44x is a case where I have a small (64 entries) SW loaded TLB and I
bolt the first 768M of the linear mapping (lowmem) using 3x256M entries.
What "saves" it is that it's also an ancient design with essentially a
busted prefetch engine that will thus cope with aliases as long as we
don't explicitly access the cached and non-cached aliases
simultaneously.
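
To make the arithmetic above concrete, here is a minimal sketch (not
kernel code; `bolted_entries`, `free_entries` and the macros are
hypothetical names chosen for illustration) of how many 256M entries it
takes to cover a given amount of lowmem, and how many slots of a
64-entry TLB that leaves for everything else:

```c
#include <assert.h>

#define SZ_1M          (1024UL * 1024UL)
#define BOLT_ENTRY_SZ  (256UL * SZ_1M)  /* size of one bolted TLB entry */
#define TLB_ENTRIES    64U              /* total software-loaded TLB entries */

/* Number of bolted entries needed to cover lowmem_bytes of linear mapping. */
static unsigned int bolted_entries(unsigned long lowmem_bytes)
{
	return (unsigned int)((lowmem_bytes + BOLT_ENTRY_SZ - 1) / BOLT_ENTRY_SZ);
}

/* TLB slots left over once the linear mapping has been bolted. */
static unsigned int free_entries(unsigned long lowmem_bytes)
{
	unsigned int bolted = bolted_entries(lowmem_bytes);

	assert(bolted <= TLB_ENTRIES);
	return TLB_ENTRIES - bolted;
}
```

With 768M of lowmem, `bolted_entries(768UL * SZ_1M)` gives 3 and
`free_entries(768UL * SZ_1M)` gives 61. The point of the sketch is only
that the bolted entries are coarse: a cached alias of any page in lowmem
always exists at 256M granularity.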

The nasty cases I have never really dealt with properly are the Apple
machines and their non-coherent AGP. Those processors were really not
designed with the idea that one would do non-coherent DMA, especially
the 970 (G5), and our Linux code really doesn't like it.

Things tend to "work" with DRI 1 because we allocate the AGP memory once
in one big chunk (it's pages but they are allocated together and thus
tend to be contiguous) so the possible issues with prefetch are so rare
that I think we end up being lucky. With DRI 2 dynamically mapping things
in/out, we have a bigger problem and I don't know how to solve it other
than forcing the DRM to allocate graphics objects in reserved areas of
memory made of 16M pools that I unmap from the linear mapping (since
I use 16M pages to map the linear mapping).
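
As a rough illustration of why the pools would have to be 16M (a sketch
only, not code from the DRM or the powerpc tree; `pool_base` and
`pools_spanned` are hypothetical helpers): any buffer that must lose its
cached alias drags the whole 16M linear-mapping page it sits in along
with it, so the number of pages to unmap depends on alignment as well as
size.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_SIZE (16ULL << 20)  /* 16M, the linear-mapping page size */

/* Start of the 16M page that contains physical address phys. */
static uint64_t pool_base(uint64_t phys)
{
	return phys & ~(POOL_SIZE - 1);
}

/* Number of 16M pages touched by the range [phys, phys + len). */
static uint64_t pools_spanned(uint64_t phys, uint64_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = pool_base(phys);
	last = pool_base(phys + len - 1);
	return (last - first) / POOL_SIZE + 1;
}
```

For example, an 8K buffer starting at 0x00FFF000 straddles a 16M
boundary, so `pools_spanned(0x00FFF000, 0x2000)` is 2 even though the
buffer itself is tiny. Allocating graphics objects only from reserved,
16M-aligned pools avoids that kind of waste.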

For ppc32 laptops it's even worse as I use 256MB BATs (block address
translation, a kind of special register used to create large static
mappings) to map the linear mapping, which brings me back to the 44x
case to some extent. I can't really do without them at the moment; at
the very least I require the kernel text / data / bss to be covered by
BATs.

> > Right. We should still shoot HW designers who give up coherency for the
> > sake of 3D benchmarks. It's insanely stupid.
> 
> Ah if it were that simple. :)  There are big costs to implementing full
> coherency for all your devices, as you well know, so it's just not a
> question of benchmark optimization.

But it -is- that simple.

You do have to deal with coherency anyways for your PHB unless you start
advocating that we should make everything else non-coherent as well. So
you have the logic. Just make your GPU operate on the same protocol.

It's really only a perf tradeoff I believe. And a bad one.

Cheers,
Ben.

