All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Starkey <Brian.Starkey@arm.com>
To: Liam Mark <lmark@codeaurora.org>
Cc: nd <nd@arm.com>, Sumit Semwal <sumit.semwal@linaro.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
	Martijn Coenen <maco@android.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	John Stultz <john.stultz@linaro.org>,
	Todd Kjos <tkjos@android.com>, Arve Hjonnevag <arve@android.com>,
	"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	Laura Abbott <labbott@redhat.com>
Subject: Re: [RFC PATCH v2] android: ion: How to properly clean caches for uncached allocations
Date: Wed, 28 Nov 2018 11:10:53 +0000	[thread overview]
Message-ID: <20181128111052.q2xokwcpuucynoft@DESKTOP-E1NTVVP.localdomain> (raw)
In-Reply-To: <alpine.DEB.2.10.1811272231460.20916@lmark-linux.qualcomm.com>

Hi Liam,

On Tue, Nov 27, 2018 at 10:46:07PM -0800, Liam Mark wrote:
> On Tue, 27 Nov 2018, Brian Starkey wrote:
> 
> > Hi Liam,
> > 
> > On Mon, Nov 26, 2018 at 08:59:44PM -0800, Liam Mark wrote:
> > > On Tue, 20 Nov 2018, Brian Starkey wrote:
> > > 
> > > > Hi Liam,
> > > > 
> > > > I'm missing a bit of context here, but I did read the v1 thread.
> > > > Please accept my apologies if I'm re-treading trodden ground.
> > > > 
> > > > I do know we're chasing nebulous ion "problems" on our end, which
> > > > certainly seem to be related to what you're trying to fix here.
> > > > 
> > > > On Thu, Nov 01, 2018 at 03:15:06PM -0700, Liam Mark wrote:
> > > > >Based on the suggestions from Laura I created a first draft for a change
> > > > >which will attempt to ensure that uncached mappings are only applied to
> > > > >ION memory who's cache lines have been cleaned.
> > > > >It does this by providing cached mappings (for uncached ION allocations)
> > > > >until the ION buffer is dma mapped and successfully cleaned, then it
> > > > drops
> > > > >the userspace mappings and when pages are accessed they are faulted back
> > > > >in and uncached mappings are created.
> > > > 
> > > > If I understand right, there's no way to portably clean the cache of
> > > > the kernel mapping before we map the pages into userspace. Is that
> > > > right?
> > > > 
> > > 
> > > Yes, it isn't always possible to clean the caches for an uncached mapping 
> > > because a device is required by the DMA APIs to do cache maintenance and 
> > > there isn't necessarily a device available (dma_buf_attach may not yet 
> > > have been called).
> > > 
> > > > Alternatively, can we just make ion refuse to give userspace a
> > > > non-cached mapping for pages which are mapped in the kernel as cached?
> > > 
> > > These pages will all be mapped as cached in the kernel for 64 bit (kernel 
> > > logical addresses) so you would always be refusing to create a non-cached mapping.
> > 
> > And that might be the sane thing to do, no?
> > 
> > AFAIK there are still pages which aren't ever mapped as cached (e.g.
> > dma_declare_coherent_memory(), anything under /reserved-memory marked
> > as no-map). If those are exposed as an ion heap, then non-cached
> > mappings would be fine, and permitted.
> > 
> 
> Sounds like you are suggesting using carveouts to support uncached?
> 

No, I'm just saying that ion can't give out uncached _CPU_ mappings
for pages which are already mapped on the CPU as cached.

> We have many multimedia use cases which use very large amounts of uncached
> memory, uncached memory is used as a performance optimization because CPU
> access won't happen so it allows us to skip cache maintenance for all the
> dma map and dma unmap calls. To create carveouts large enough to support
> to support the worst case scenarios could result in very large carveouts.
> 
> Large carveouts like this would likely result in poor memory utilizations
> (since they are tuned for worst case) which would likely have significant
> performance impacts (more limited memory causes more frequent memory
> reclaim ect...).
> 
> Also estimating for worst case could be difficult since the amount of
> uncached memory could be app dependent.
> Unfortunately I don't think this would make for a very scalable solution.
> 

Sure, I understand the desire not to use carveouts. I'm not suggesting
carveouts are a viable alternative.

> > > 
> > > > Would userspace using the dma-buf sync ioctl around its accesses do
> > > > the "right thing" in that case?
> > > > 
> > > 
> > > I don't think so, the dma-buf sync ioctl require a device to peform cache 
> > > maintenance, but as mentioned above a device may not be available.
> > > 
> > 
> > If a device didn't attach yet, then no cache maintenance is
> > necessary. The only thing accessing the memory is the CPU, via a
> > cached mapping, which should work just fine. So far so good.
> > 
> 
> Unfortunately not.
> Scenario:
> - Client allocates uncached memory.
> - Client calls the DMA_BUF_IOCTL_SYNC IOCT IOCTL with flags
> DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since there
> isn't any device)
> - Client mmap the memory (ION creates uncached mapping)
> - Client reads from that uncached mapping

I think I maybe wasn't clear with my proposal. The sequence should be
like this:

 - Client allocates memory
   - If this is from a region which the CPU has mapped as cached, then
     that's not "uncached" memory - it's cached memory - and you have
     to treat it as such.
 - Client calls the DMA_BUF_IOCTL_SYNC IOCTL with flags
   DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since
   there isn't any device)
 - Client mmaps the memory
   - ion creates a _cached_ mapping into the userspace process. ion
     *must not* create an uncached mapping.
 - Client reads from that cached mapping
   - It sees zeroes, as expected.

This proposal ensures that everyone will *always* see correct data if
they use the DMA APIs properly (device accesses via
dma_buf_{map,unmap}, CPU access via {begin,end}_cpu_access).

> 
> Because memory has not been cleaned (we haven't had a device yet) the
> zeros that were written to this memory could  still be in the cache (since
> they were written with a cached mapping), this means that the unprivilived
> userpace client is now potentially reading sensitive kernel data....
> 

This is precisely why you can't just "pretend" that those pages
are uncached. You can't have the same memory mapped with different
attributes and get consistent behaviour.

> > If there are already attachments, then ion_dma_buf_begin_cpu_access()
> > will sync for CPU access against all of the attached devices, and
> > again the CPU should see the right thing.
> > 
> > In the other direction, ion_dma_buf_end_cpu_access() will sync for
> > device access for all currently attached devices. If there's no
> > attached devices yet, then there's nothing to do until there is (only
> > thing accessing is CPU via a CPU-cached mapping).
> > 
> > When the first (or another) device attaches, then when it maps the
> > buffer, the map_dma_buf callback should do whatever sync-ing is needed
> > for that device.
> > 
> > I might be way off with my understanding of the various DMA APIs, but
> > this is how I think they're meant to work.
> > 
> > > > Given that as you pointed out, the kernel does still have a cached
> > > > mapping to these pages, trying to give the CPU a non-cached mapping of
> > > > those same pages while preserving consistency seems fraught. Wouldn't
> > > > it be better to make sure all CPU mappings are cached, and have CPU
> > > > clients use the dma_buf_{begin,end}_cpu_access() hooks to get
> > > > consistency where needed?
> > > > 
> > > 
> > > It is fraught, but unfortunately you can't rely on 
> > > dma_buf_{begin,end}_cpu_access() to do cache maintenance as these calls 
> > > require a device, and a device is not always available.
> > 
> > As above, if there's really no device, then no syncing is needed
> > because only the CPU is accessing the buffer, and only ever via cached
> > mappings.
> > 
> 
> Sure you can use cached mappings, but with cached memory to ensure cache 
> coherency you would always need to do cache maintenance at dma map and dma 
> unmap (since you can't rely on their being a device when 
> dma_buf_{begin,end}_cpu_access() hooks are called).

As you've said below, you can't skip cache maintenance in the general
case - the first time a device maps the buffer, you need to clean the
cache to make sure the memset(0) is seen by the device.

> But with this cached memory you get poor performance because you are 
> frequently doing cache mainteance uncessarily because there *could* be CPU access.
> 
> The reason we want to use uncached allocations, with uncached mappings, is 
> to avoid all this uncessary cache maintenance.
> 

OK I think this is the key - you don't actually care whether the
mappings are non-cached, you just don't want to pay a sync penalty if
the CPU never touched the buffer.

In that case, then to me the right thing to do is make ion use
dma_map_sg_attrs(..., DMA_ATTR_SKIP_CPU_SYNC) in ion_map_dma_buf(), if
it knows that the CPU hasn't touched the buffer (which it does - from
{begin,end}_cpu_access).

That seems to be exactly what it's there for:

 /*
  * DMA_ATTR_SKIP_CPU_SYNC: Allows platform code to skip synchronization of
  * the CPU cache for the given buffer assuming that it has been already
  * transferred to 'device' domain.
  */

The very first time you map the buffer on a device, you have to sync
(transfer to 'device' domain). After that, if you never touch the
buffer on the CPU, then you'll never pay the CPU cache maintenance
penalty.

> > > 
> > > > >
> > > > >This change has the following potential disadvantages:
> > > > >- It assumes that userpace clients won't attempt to access the buffer
> > > > >while it is being mapped as we are removing the userpspace mappings at
> > > > >this point (though it is okay for them to have it mapped)
> > > > >- It assumes that kernel clients won't hold a kernel mapping to the
> > > > buffer
> > > > >(ie dma_buf_kmap) while it is being dma-mapped. What should we do if
> > > > there
> > > > >is a kernel mapping at the time of dma mapping, fail the mapping, warn?
> > > > >- There may be a performance penalty as a result of having to fault in
> > > > the
> > > > >pages after removing the userspace mappings.
> > > > 
> > > > I wonder if the dma-buf sync ioctl might provide a way for userspace
> > > > to opt-in to when the zap/fault happens. Zap on (DMA_BUF_SYNC_WRITE |
> > > > DMA_BUF_SYNC_WRITE_END) and fault on (DMA_BUF_SYNC_READ |
> > > > DMA_BUF_SYNC_START)
> > > > 
> > > 
> > > Not sure I understand, can you elaborate. 
> > > Are you also adding a requirment that ION pages can't be mmaped during a
> > > call to dma_buf_map_attachment?
> > 
> > I was only suggesting that zapping the mappings "at random" (from
> > userspace's perspective), and then faulting them back in (also "at
> > random"), might cause unexpected and not-controllable stalls in the
> > app. We could use the ioctl hooks as an explicit indication from the
> > app that now is a good time to zap the mapping and/or fault back in
> > the whole buffer. begin_cpu_access is allowed to be a "slow"
> > operation, so apps should already be expecting to get stalled on the
> > sync ioctl.
> > 
> 
> I think we have to do the zapping when have a device with which we can
> then immediately clean the caches for the memory.
> 
> The dma_buf_map_attachement seems like a logical time to do this, we have
> a device and the user should not be doing CPU access at this time.
> There is no guarantee you will ever have a device attached when the ioctl
> hooks are called so this could mean you never get a chance to switch to
> actual uncached mappings if you only try to do this from the ioctl hooks.
> 

You can always zap in the ioctl. You just might end up having to
create a cached mapping for userspace again if a device doesn't attach
before the next time it calls the SYNC_START ioctl.

So yeah, with your approach of trying to switch userspace over to
non-cached mappings, I think map_attachment is the best place to do
the whole shebang, to avoid unnecessary work.

> The one-of hit of having to fault the pages back in is unfortunate but I
> can't seem to find a better time to do it.

That part you really could do in the SYNC_START ioctl, it's just not
symmetric.

Thanks,
-Brian

> 
> Liam
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

WARNING: multiple messages have this Message-ID (diff)
From: Brian.Starkey@arm.com (Brian Starkey)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH v2] android: ion: How to properly clean caches for uncached allocations
Date: Wed, 28 Nov 2018 11:10:53 +0000	[thread overview]
Message-ID: <20181128111052.q2xokwcpuucynoft@DESKTOP-E1NTVVP.localdomain> (raw)
In-Reply-To: <alpine.DEB.2.10.1811272231460.20916@lmark-linux.qualcomm.com>

Hi Liam,

On Tue, Nov 27, 2018 at 10:46:07PM -0800, Liam Mark wrote:
> On Tue, 27 Nov 2018, Brian Starkey wrote:
> 
> > Hi Liam,
> > 
> > On Mon, Nov 26, 2018 at 08:59:44PM -0800, Liam Mark wrote:
> > > On Tue, 20 Nov 2018, Brian Starkey wrote:
> > > 
> > > > Hi Liam,
> > > > 
> > > > I'm missing a bit of context here, but I did read the v1 thread.
> > > > Please accept my apologies if I'm re-treading trodden ground.
> > > > 
> > > > I do know we're chasing nebulous ion "problems" on our end, which
> > > > certainly seem to be related to what you're trying to fix here.
> > > > 
> > > > On Thu, Nov 01, 2018 at 03:15:06PM -0700, Liam Mark wrote:
> > > > >Based on the suggestions from Laura I created a first draft for a change
> > > > >which will attempt to ensure that uncached mappings are only applied to
> > > > >ION memory who's cache lines have been cleaned.
> > > > >It does this by providing cached mappings (for uncached ION allocations)
> > > > >until the ION buffer is dma mapped and successfully cleaned, then it
> > > > drops
> > > > >the userspace mappings and when pages are accessed they are faulted back
> > > > >in and uncached mappings are created.
> > > > 
> > > > If I understand right, there's no way to portably clean the cache of
> > > > the kernel mapping before we map the pages into userspace. Is that
> > > > right?
> > > > 
> > > 
> > > Yes, it isn't always possible to clean the caches for an uncached mapping 
> > > because a device is required by the DMA APIs to do cache maintenance and 
> > > there isn't necessarily a device available (dma_buf_attach may not yet 
> > > have been called).
> > > 
> > > > Alternatively, can we just make ion refuse to give userspace a
> > > > non-cached mapping for pages which are mapped in the kernel as cached?
> > > 
> > > These pages will all be mapped as cached in the kernel for 64 bit (kernel 
> > > logical addresses) so you would always be refusing to create a non-cached mapping.
> > 
> > And that might be the sane thing to do, no?
> > 
> > AFAIK there are still pages which aren't ever mapped as cached (e.g.
> > dma_declare_coherent_memory(), anything under /reserved-memory marked
> > as no-map). If those are exposed as an ion heap, then non-cached
> > mappings would be fine, and permitted.
> > 
> 
> Sounds like you are suggesting using carveouts to support uncached?
> 

No, I'm just saying that ion can't give out uncached _CPU_ mappings
for pages which are already mapped on the CPU as cached.

> We have many multimedia use cases which use very large amounts of uncached
> memory, uncached memory is used as a performance optimization because CPU
> access won't happen so it allows us to skip cache maintenance for all the
> dma map and dma unmap calls. To create carveouts large enough to support
> to support the worst case scenarios could result in very large carveouts.
> 
> Large carveouts like this would likely result in poor memory utilizations
> (since they are tuned for worst case) which would likely have significant
> performance impacts (more limited memory causes more frequent memory
> reclaim ect...).
> 
> Also estimating for worst case could be difficult since the amount of
> uncached memory could be app dependent.
> Unfortunately I don't think this would make for a very scalable solution.
> 

Sure, I understand the desire not to use carveouts. I'm not suggesting
carveouts are a viable alternative.

> > > 
> > > > Would userspace using the dma-buf sync ioctl around its accesses do
> > > > the "right thing" in that case?
> > > > 
> > > 
> > > I don't think so, the dma-buf sync ioctl require a device to peform cache 
> > > maintenance, but as mentioned above a device may not be available.
> > > 
> > 
> > If a device didn't attach yet, then no cache maintenance is
> > necessary. The only thing accessing the memory is the CPU, via a
> > cached mapping, which should work just fine. So far so good.
> > 
> 
> Unfortunately not.
> Scenario:
> - Client allocates uncached memory.
> - Client calls the DMA_BUF_IOCTL_SYNC IOCT IOCTL with flags
> DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since there
> isn't any device)
> - Client mmap the memory (ION creates uncached mapping)
> - Client reads from that uncached mapping

I think I maybe wasn't clear with my proposal. The sequence should be
like this:

 - Client allocates memory
   - If this is from a region which the CPU has mapped as cached, then
     that's not "uncached" memory - it's cached memory - and you have
     to treat it as such.
 - Client calls the DMA_BUF_IOCTL_SYNC IOCTL with flags
   DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since
   there isn't any device)
 - Client mmaps the memory
   - ion creates a _cached_ mapping into the userspace process. ion
     *must not* create an uncached mapping.
 - Client reads from that cached mapping
   - It sees zeroes, as expected.

This proposal ensures that everyone will *always* see correct data if
they use the DMA APIs properly (device accesses via
dma_buf_{map,unmap}, CPU access via {begin,end}_cpu_access).

> 
> Because memory has not been cleaned (we haven't had a device yet) the
> zeros that were written to this memory could  still be in the cache (since
> they were written with a cached mapping), this means that the unprivilived
> userpace client is now potentially reading sensitive kernel data....
> 

This is precisely why you can't just "pretend" that those pages
are uncached. You can't have the same memory mapped with different
attributes and get consistent behaviour.

> > If there are already attachments, then ion_dma_buf_begin_cpu_access()
> > will sync for CPU access against all of the attached devices, and
> > again the CPU should see the right thing.
> > 
> > In the other direction, ion_dma_buf_end_cpu_access() will sync for
> > device access for all currently attached devices. If there's no
> > attached devices yet, then there's nothing to do until there is (only
> > thing accessing is CPU via a CPU-cached mapping).
> > 
> > When the first (or another) device attaches, then when it maps the
> > buffer, the map_dma_buf callback should do whatever sync-ing is needed
> > for that device.
> > 
> > I might be way off with my understanding of the various DMA APIs, but
> > this is how I think they're meant to work.
> > 
> > > > Given that as you pointed out, the kernel does still have a cached
> > > > mapping to these pages, trying to give the CPU a non-cached mapping of
> > > > those same pages while preserving consistency seems fraught. Wouldn't
> > > > it be better to make sure all CPU mappings are cached, and have CPU
> > > > clients use the dma_buf_{begin,end}_cpu_access() hooks to get
> > > > consistency where needed?
> > > > 
> > > 
> > > It is fraught, but unfortunately you can't rely on 
> > > dma_buf_{begin,end}_cpu_access() to do cache maintenance as these calls 
> > > require a device, and a device is not always available.
> > 
> > As above, if there's really no device, then no syncing is needed
> > because only the CPU is accessing the buffer, and only ever via cached
> > mappings.
> > 
> 
> Sure you can use cached mappings, but with cached memory to ensure cache 
> coherency you would always need to do cache maintenance at dma map and dma 
> unmap (since you can't rely on their being a device when 
> dma_buf_{begin,end}_cpu_access() hooks are called).

As you've said below, you can't skip cache maintenance in the general
case - the first time a device maps the buffer, you need to clean the
cache to make sure the memset(0) is seen by the device.

> But with this cached memory you get poor performance because you are 
> frequently doing cache mainteance uncessarily because there *could* be CPU access.
> 
> The reason we want to use uncached allocations, with uncached mappings, is 
> to avoid all this uncessary cache maintenance.
> 

OK I think this is the key - you don't actually care whether the
mappings are non-cached, you just don't want to pay a sync penalty if
the CPU never touched the buffer.

In that case, then to me the right thing to do is make ion use
dma_map_sg_attrs(..., DMA_ATTR_SKIP_CPU_SYNC) in ion_map_dma_buf(), if
it knows that the CPU hasn't touched the buffer (which it does - from
{begin,end}_cpu_access).

That seems to be exactly what it's there for:

 /*
  * DMA_ATTR_SKIP_CPU_SYNC: Allows platform code to skip synchronization of
  * the CPU cache for the given buffer assuming that it has been already
  * transferred to 'device' domain.
  */

The very first time you map the buffer on a device, you have to sync
(transfer to 'device' domain). After that, if you never touch the
buffer on the CPU, then you'll never pay the CPU cache maintenance
penalty.

> > > 
> > > > >
> > > > >This change has the following potential disadvantages:
> > > > >- It assumes that userpace clients won't attempt to access the buffer
> > > > >while it is being mapped as we are removing the userpspace mappings at
> > > > >this point (though it is okay for them to have it mapped)
> > > > >- It assumes that kernel clients won't hold a kernel mapping to the
> > > > buffer
> > > > >(ie dma_buf_kmap) while it is being dma-mapped. What should we do if
> > > > there
> > > > >is a kernel mapping at the time of dma mapping, fail the mapping, warn?
> > > > >- There may be a performance penalty as a result of having to fault in
> > > > the
> > > > >pages after removing the userspace mappings.
> > > > 
> > > > I wonder if the dma-buf sync ioctl might provide a way for userspace
> > > > to opt-in to when the zap/fault happens. Zap on (DMA_BUF_SYNC_WRITE |
> > > > DMA_BUF_SYNC_WRITE_END) and fault on (DMA_BUF_SYNC_READ |
> > > > DMA_BUF_SYNC_START)
> > > > 
> > > 
> > > Not sure I understand, can you elaborate. 
> > > Are you also adding a requirment that ION pages can't be mmaped during a
> > > call to dma_buf_map_attachment?
> > 
> > I was only suggesting that zapping the mappings "at random" (from
> > userspace's perspective), and then faulting them back in (also "at
> > random"), might cause unexpected and not-controllable stalls in the
> > app. We could use the ioctl hooks as an explicit indication from the
> > app that now is a good time to zap the mapping and/or fault back in
> > the whole buffer. begin_cpu_access is allowed to be a "slow"
> > operation, so apps should already be expecting to get stalled on the
> > sync ioctl.
> > 
> 
> I think we have to do the zapping when have a device with which we can
> then immediately clean the caches for the memory.
> 
> The dma_buf_map_attachement seems like a logical time to do this, we have
> a device and the user should not be doing CPU access at this time.
> There is no guarantee you will ever have a device attached when the ioctl
> hooks are called so this could mean you never get a chance to switch to
> actual uncached mappings if you only try to do this from the ioctl hooks.
> 

You can always zap in the ioctl. You just might end up having to
create a cached mapping for userspace again if a device doesn't attach
before the next time it calls the SYNC_START ioctl.

So yeah, with your approach of trying to switch userspace over to
non-cached mappings, I think map_attachment is the best place to do
the whole shebang, to avoid unnecessary work.

> The one-of hit of having to fault the pages back in is unfortunate but I
> can't seem to find a better time to do it.

That part you really could do in the SYNC_START ioctl, it's just not
symmetric.

Thanks,
-Brian

> 
> Liam
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project

WARNING: multiple messages have this Message-ID (diff)
From: Brian Starkey <Brian.Starkey@arm.com>
To: Liam Mark <lmark@codeaurora.org>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
	Todd Kjos <tkjos@android.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	Arve Hjonnevag <arve@android.com>, nd <nd@arm.com>,
	Martijn Coenen <maco@android.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH v2] android: ion: How to properly clean caches for uncached allocations
Date: Wed, 28 Nov 2018 11:10:53 +0000	[thread overview]
Message-ID: <20181128111052.q2xokwcpuucynoft@DESKTOP-E1NTVVP.localdomain> (raw)
In-Reply-To: <alpine.DEB.2.10.1811272231460.20916@lmark-linux.qualcomm.com>

Hi Liam,

On Tue, Nov 27, 2018 at 10:46:07PM -0800, Liam Mark wrote:
> On Tue, 27 Nov 2018, Brian Starkey wrote:
> 
> > Hi Liam,
> > 
> > On Mon, Nov 26, 2018 at 08:59:44PM -0800, Liam Mark wrote:
> > > On Tue, 20 Nov 2018, Brian Starkey wrote:
> > > 
> > > > Hi Liam,
> > > > 
> > > > I'm missing a bit of context here, but I did read the v1 thread.
> > > > Please accept my apologies if I'm re-treading trodden ground.
> > > > 
> > > > I do know we're chasing nebulous ion "problems" on our end, which
> > > > certainly seem to be related to what you're trying to fix here.
> > > > 
> > > > On Thu, Nov 01, 2018 at 03:15:06PM -0700, Liam Mark wrote:
> > > > >Based on the suggestions from Laura I created a first draft for a change
> > > > >which will attempt to ensure that uncached mappings are only applied to
> > > > >ION memory who's cache lines have been cleaned.
> > > > >It does this by providing cached mappings (for uncached ION allocations)
> > > > >until the ION buffer is dma mapped and successfully cleaned, then it
> > > > drops
> > > > >the userspace mappings and when pages are accessed they are faulted back
> > > > >in and uncached mappings are created.
> > > > 
> > > > If I understand right, there's no way to portably clean the cache of
> > > > the kernel mapping before we map the pages into userspace. Is that
> > > > right?
> > > > 
> > > 
> > > Yes, it isn't always possible to clean the caches for an uncached mapping 
> > > because a device is required by the DMA APIs to do cache maintenance and 
> > > there isn't necessarily a device available (dma_buf_attach may not yet 
> > > have been called).
> > > 
> > > > Alternatively, can we just make ion refuse to give userspace a
> > > > non-cached mapping for pages which are mapped in the kernel as cached?
> > > 
> > > These pages will all be mapped as cached in the kernel for 64 bit (kernel 
> > > logical addresses) so you would always be refusing to create a non-cached mapping.
> > 
> > And that might be the sane thing to do, no?
> > 
> > AFAIK there are still pages which aren't ever mapped as cached (e.g.
> > dma_declare_coherent_memory(), anything under /reserved-memory marked
> > as no-map). If those are exposed as an ion heap, then non-cached
> > mappings would be fine, and permitted.
> > 
> 
> Sounds like you are suggesting using carveouts to support uncached?
> 

No, I'm just saying that ion can't give out uncached _CPU_ mappings
for pages which are already mapped on the CPU as cached.

> We have many multimedia use cases which use very large amounts of uncached
> memory, uncached memory is used as a performance optimization because CPU
> access won't happen so it allows us to skip cache maintenance for all the
> dma map and dma unmap calls. To create carveouts large enough to support
> to support the worst case scenarios could result in very large carveouts.
> 
> Large carveouts like this would likely result in poor memory utilizations
> (since they are tuned for worst case) which would likely have significant
> performance impacts (more limited memory causes more frequent memory
> reclaim ect...).
> 
> Also estimating for worst case could be difficult since the amount of
> uncached memory could be app dependent.
> Unfortunately I don't think this would make for a very scalable solution.
> 

Sure, I understand the desire not to use carveouts. I'm not suggesting
carveouts are a viable alternative.

> > > 
> > > > Would userspace using the dma-buf sync ioctl around its accesses do
> > > > the "right thing" in that case?
> > > > 
> > > 
> > > I don't think so, the dma-buf sync ioctl require a device to peform cache 
> > > maintenance, but as mentioned above a device may not be available.
> > > 
> > 
> > If a device didn't attach yet, then no cache maintenance is
> > necessary. The only thing accessing the memory is the CPU, via a
> > cached mapping, which should work just fine. So far so good.
> > 
> 
> Unfortunately not.
> Scenario:
> - Client allocates uncached memory.
> - Client calls the DMA_BUF_IOCTL_SYNC IOCT IOCTL with flags
> DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since there
> isn't any device)
> - Client mmap the memory (ION creates uncached mapping)
> - Client reads from that uncached mapping

I think I maybe wasn't clear with my proposal. The sequence should be
like this:

 - Client allocates memory
   - If this is from a region which the CPU has mapped as cached, then
     that's not "uncached" memory - it's cached memory - and you have
     to treat it as such.
 - Client calls the DMA_BUF_IOCTL_SYNC IOCTL with flags
   DMA_BUF_SYNC_START (but this doesn't do any cache maintenance since
   there isn't any device)
 - Client mmaps the memory
   - ion creates a _cached_ mapping into the userspace process. ion
     *must not* create an uncached mapping.
 - Client reads from that cached mapping
   - It sees zeroes, as expected.

This proposal ensures that everyone will *always* see correct data if
they use the DMA APIs properly (device accesses via
dma_buf_{map,unmap}, CPU access via {begin,end}_cpu_access).

> 
> Because memory has not been cleaned (we haven't had a device yet) the
> zeros that were written to this memory could  still be in the cache (since
> they were written with a cached mapping), this means that the unprivilived
> userpace client is now potentially reading sensitive kernel data....
> 

This is precisely why you can't just "pretend" that those pages
are uncached. You can't have the same memory mapped with different
attributes and get consistent behaviour.

> > If there are already attachments, then ion_dma_buf_begin_cpu_access()
> > will sync for CPU access against all of the attached devices, and
> > again the CPU should see the right thing.
> > 
> > In the other direction, ion_dma_buf_end_cpu_access() will sync for
> > device access for all currently attached devices. If there's no
> > attached devices yet, then there's nothing to do until there is (only
> > thing accessing is CPU via a CPU-cached mapping).
> > 
> > When the first (or another) device attaches, then when it maps the
> > buffer, the map_dma_buf callback should do whatever sync-ing is needed
> > for that device.
> > 
> > I might be way off with my understanding of the various DMA APIs, but
> > this is how I think they're meant to work.
> > 
> > > > Given that as you pointed out, the kernel does still have a cached
> > > > mapping to these pages, trying to give the CPU a non-cached mapping of
> > > > those same pages while preserving consistency seems fraught. Wouldn't
> > > > it be better to make sure all CPU mappings are cached, and have CPU
> > > > clients use the dma_buf_{begin,end}_cpu_access() hooks to get
> > > > consistency where needed?
> > > > 
> > > 
> > > It is fraught, but unfortunately you can't rely on 
> > > dma_buf_{begin,end}_cpu_access() to do cache maintenance as these calls 
> > > require a device, and a device is not always available.
> > 
> > As above, if there's really no device, then no syncing is needed
> > because only the CPU is accessing the buffer, and only ever via cached
> > mappings.
> > 
> 
> Sure you can use cached mappings, but with cached memory to ensure cache 
> coherency you would always need to do cache maintenance at dma map and dma 
> unmap (since you can't rely on their being a device when 
> dma_buf_{begin,end}_cpu_access() hooks are called).

As you've said below, you can't skip cache maintenance in the general
case - the first time a device maps the buffer, you need to clean the
cache to make sure the memset(0) is seen by the device.

> But with this cached memory you get poor performance because you are 
> frequently doing cache mainteance uncessarily because there *could* be CPU access.
> 
> The reason we want to use uncached allocations, with uncached mappings, is 
> to avoid all this uncessary cache maintenance.
> 

OK I think this is the key - you don't actually care whether the
mappings are non-cached, you just don't want to pay a sync penalty if
the CPU never touched the buffer.

In that case, then to me the right thing to do is make ion use
dma_map_sg_attrs(..., DMA_ATTR_SKIP_CPU_SYNC) in ion_map_dma_buf(), if
it knows that the CPU hasn't touched the buffer (which it does - from
{begin,end}_cpu_access).

That seems to be exactly what it's there for:

 /*
  * DMA_ATTR_SKIP_CPU_SYNC: Allows platform code to skip synchronization of
  * the CPU cache for the given buffer assuming that it has been already
  * transferred to 'device' domain.
  */

The very first time you map the buffer on a device, you have to sync
(transfer to 'device' domain). After that, if you never touch the
buffer on the CPU, then you'll never pay the CPU cache maintenance
penalty.

> > > 
> > > > >
> > > > >This change has the following potential disadvantages:
> > > > >- It assumes that userpace clients won't attempt to access the buffer
> > > > >while it is being mapped as we are removing the userpspace mappings at
> > > > >this point (though it is okay for them to have it mapped)
> > > > >- It assumes that kernel clients won't hold a kernel mapping to the
> > > > buffer
> > > > >(ie dma_buf_kmap) while it is being dma-mapped. What should we do if
> > > > there
> > > > >is a kernel mapping at the time of dma mapping, fail the mapping, warn?
> > > > >- There may be a performance penalty as a result of having to fault in
> > > > the
> > > > >pages after removing the userspace mappings.
> > > > 
> > > > I wonder if the dma-buf sync ioctl might provide a way for userspace
> > > > to opt-in to when the zap/fault happens. Zap on (DMA_BUF_SYNC_WRITE |
> > > > DMA_BUF_SYNC_WRITE_END) and fault on (DMA_BUF_SYNC_READ |
> > > > DMA_BUF_SYNC_START)
> > > > 
> > > 
> > > Not sure I understand, can you elaborate. 
> > > Are you also adding a requirment that ION pages can't be mmaped during a
> > > call to dma_buf_map_attachment?
> > 
> > I was only suggesting that zapping the mappings "at random" (from
> > userspace's perspective), and then faulting them back in (also "at
> > random"), might cause unexpected and not-controllable stalls in the
> > app. We could use the ioctl hooks as an explicit indication from the
> > app that now is a good time to zap the mapping and/or fault back in
> > the whole buffer. begin_cpu_access is allowed to be a "slow"
> > operation, so apps should already be expecting to get stalled on the
> > sync ioctl.
> > 
> 
> I think we have to do the zapping when have a device with which we can
> then immediately clean the caches for the memory.
> 
> The dma_buf_map_attachement seems like a logical time to do this, we have
> a device and the user should not be doing CPU access at this time.
> There is no guarantee you will ever have a device attached when the ioctl
> hooks are called so this could mean you never get a chance to switch to
> actual uncached mappings if you only try to do this from the ioctl hooks.
> 

You can always zap in the ioctl. You just might end up having to
create a cached mapping for userspace again if a device doesn't attach
before the next time it calls the SYNC_START ioctl.

So yeah, with your approach of trying to switch userspace over to
non-cached mappings, I think map_attachment is the best place to do
the whole shebang, to avoid unnecessary work.

> The one-of hit of having to fault the pages back in is unfortunate but I
> can't seem to find a better time to do it.

That part you really could do in the SYNC_START ioctl, it's just not
symmetric.

Thanks,
-Brian

> 
> Liam
> 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2018-11-28 11:11 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-01  5:18 [RFC] android: ion: How to properly clean caches for uncached allocations Liam Mark
2018-03-01  5:18 ` Liam Mark
2018-03-07 23:15 ` Laura Abbott
2018-03-07 23:15   ` Laura Abbott
2018-03-09  0:45   ` Liam Mark
2018-03-09  0:45     ` Liam Mark
2018-03-09  2:01     ` Laura Abbott
2018-03-09  2:01       ` Laura Abbott
2018-11-01 22:15     ` [RFC PATCH v2] " Liam Mark
2018-11-01 22:15       ` Liam Mark
2018-11-01 22:15       ` Liam Mark
2018-11-02 19:01       ` John Stultz
2018-11-02 19:01         ` John Stultz
2018-11-02 19:01         ` John Stultz
2018-11-06 21:20         ` Liam Mark
2018-11-06 21:20           ` Liam Mark
2018-11-20 16:46       ` Brian Starkey
2018-11-20 16:46         ` Brian Starkey
2018-11-20 16:46         ` Brian Starkey
2018-11-27  4:59         ` Liam Mark
2018-11-27  4:59           ` Liam Mark
2018-11-27  4:59           ` Liam Mark
2018-11-27 10:35           ` Brian Starkey
2018-11-27 10:35             ` Brian Starkey
2018-11-27 10:35             ` Brian Starkey
2018-11-28  6:46             ` Liam Mark
2018-11-28  6:46               ` Liam Mark
2018-11-28  6:46               ` Liam Mark
2018-11-28 11:10               ` Brian Starkey [this message]
2018-11-28 11:10                 ` Brian Starkey
2018-11-28 11:10                 ` Brian Starkey
2018-11-29  7:03                 ` Liam Mark
2018-11-29  7:03                   ` Liam Mark
2018-11-29  7:03                   ` Liam Mark
2018-11-29 14:14                   ` Brian Starkey
2018-11-29 14:14                     ` Brian Starkey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181128111052.q2xokwcpuucynoft@DESKTOP-E1NTVVP.localdomain \
    --to=brian.starkey@arm.com \
    --cc=arve@android.com \
    --cc=devel@driverdev.osuosl.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=john.stultz@linaro.org \
    --cc=labbott@redhat.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lmark@codeaurora.org \
    --cc=maco@android.com \
    --cc=nd@arm.com \
    --cc=sumit.semwal@linaro.org \
    --cc=tkjos@android.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.