* IIO: Ensuring DMA safe buffers.
@ 2022-04-19 11:12 Jonathan Cameron
  2022-04-19 11:54 ` Catalin Marinas
  2022-04-20 12:30 ` Nuno Sá
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Cameron @ 2022-04-19 11:12 UTC (permalink / raw)
  To: linux-iio; +Cc: lars, catalin.marinas

Hi All,

For a long time IIO has been making the assumption that ____cacheline_aligned
was sufficient to ensure buffers were in their own cacheline and hence
DMA safe.  We generally needed this for all SPI device drivers as many
SPI ABI calls can pass the buffer directly through to be used for DMA.
(Note that regmap doesn't currently do this, but it might in future, and did
in the past.)  I can't remember the history of this well enough to know
why we did it that way. I don't remember the pain of debugging random
corruption caused by getting it wrong, however...

However, it turns out via
https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/
"[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the cache line size"
and the surrounding discussion that there are platforms where this isn't true:
they need 128 byte alignment despite ____cacheline_aligned aligning to
64 bytes on all ARM64 platforms.

(I should probably have known that as the platform I support in my day
job has 128 byte cachelines in L3 - however it's DMA coherent so
that's not a problem).

The above series also highlights that we can do much better anyway on some
platforms using the new ARCH_DMA_MINALIGN (currently it's only defined for
some archs, but after that patch everyone gets it).  We should be safe to use
that everywhere we currently force ____cacheline_aligned, and waste a little
less space on padding, which is always nice ;)

Given we have no reports of a problem on 128 byte non-DMA-coherent platforms,
I don't propose to 'fix' this until we can make use of the new define
in the above patch set.  That is going to make a mess of backporting the
fix, however.  I wish we had done what crypto did and had a
subsystem-specific define for this, but such is life.  We will also want to
be careful to cover the rather odd-sounding case of ARCH_DMA_MINALIGN < 8,
given there are a few drivers that rely on >= 8 to ensure we can do aligned
puts of 64 bit timestamps.

I'm definitely open to opinions on how we handle this and wanted people to
have this on their radar.

+CC Catalin for info.  If you can sneak the first patch in your series
in for next cycle that would be great even if the rest takes a while longer.

Thanks,

Jonathan



* Re: IIO: Ensuring DMA safe buffers.
  2022-04-19 11:12 IIO: Ensuring DMA safe buffers Jonathan Cameron
@ 2022-04-19 11:54 ` Catalin Marinas
  2022-04-20 12:30 ` Nuno Sá
  1 sibling, 0 replies; 6+ messages in thread
From: Catalin Marinas @ 2022-04-19 11:54 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio, lars

Hi Jonathan,

On Tue, Apr 19, 2022 at 12:12:41PM +0100, Jonathan Cameron wrote:
> For a long time IIO has been making the assumption that ____cacheline_aligned
> was sufficient to ensure buffers were in their own cacheline and hence
> DMA safe.

As you noticed, it's not. __cacheline_aligned is meant to align to
SMP_CACHE_BYTES which is L1_CACHE_BYTES, typically smaller than
ARCH_DMA_MINALIGN.

> The above series also highlights that we can do much better anyway on some platforms
> using the new ARCH_DMA_MINALIGN (currently it's only defined for some archs but
> after that patch everyone gets it).  We should be safe to use that everywhere
> we currently force ____cacheline_aligned and waste a little less space on padding
> which is always nice ;)

I guess in the meantime you can just use ARCH_KMALLOC_MINALIGN; it gives
you the DMA guarantees. Any series decoupling the two will have to
update the use. I prefer to use ARCH_DMA_MINALIGN for DMA alignment, but
we are still discussing the best way to address the crypto code changes.

> Given we have no reports of a problem with 128 byte non DMA coherent platforms
> I don't propose to 'fix' this until we can make use of the new define
> in the above patch set.  That is going to make a mess of backporting the
> fix however.  I'm wishing we did what crypto has done and had a subsystem
> specific define for this but such is life. 

Maybe we should add a generic __dma_aligned.

> We will also want to be careful
> that we cover the rather odd sounding case of ARCH_DMA_MINALIGN < 8 given
> there are a few drivers that rely on >= 8 to ensure we can do aligned puts
> of 64 bit timestamps.

I don't think ARCH_DMA_MINALIGN should go below sizeof(long long);
that's what ARCH_KMALLOC_MINALIGN is by default, and going lower would
break other things.

> +CC Catalin for info.  If you can sneak the first patch in your series
> in for next cycle that would be great even if the rest takes a while longer.

I'll try but we have to agree on the crypto change because keeping
CRYPTO_MINALIGN as the smaller ARCH_KMALLOC_MINALIGN risks breaking DMA
into those structures.

-- 
Catalin


* Re: IIO: Ensuring DMA safe buffers.
  2022-04-19 11:12 IIO: Ensuring DMA safe buffers Jonathan Cameron
  2022-04-19 11:54 ` Catalin Marinas
@ 2022-04-20 12:30 ` Nuno Sá
  2022-04-21  9:05   ` Jonathan Cameron
  1 sibling, 1 reply; 6+ messages in thread
From: Nuno Sá @ 2022-04-20 12:30 UTC (permalink / raw)
  To: Jonathan Cameron, linux-iio; +Cc: lars, catalin.marinas

On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> Hi All,
> 
> For a long time IIO has been making the assumption that
> ____cacheline_aligned
> was sufficient to ensure buffers were in their own cacheline and
> hence
> DMA safe.  We generally needed this for all SPI device drivers as
> many
> SPI ABI calls can pass the buffer directly through to be used for
> DMA.
> Note that regmap doesn't currently do this but it might in future (and
> did
> in the past).  I can't remember the history of this well enough to
> know
> why we did it that way. I don't remember the pain of debugging random
> corruption caused by getting it wrong however...
> 
> However it turns out via
> https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/
>  
> "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the
> cache line size"
> discussion that there are platforms where this isn't true and need
> 128 byte alignment despite a ____cacheline_aligned aligning to 64 bytes
> on
> all ARM64 platforms.
> 

Oh boy... Here are my 5 cents on this.

> (I should probably have known that as the platform I support in my
> day
> job has 128 byte cachelines in L3 - however it's DMA coherent so
> that's not a problem).
> 
> The above series also highlights that we can do much better anyway on
> some platforms
> using the new ARCH_DMA_MINALIGN (currently it's only defined for some
> archs but
> after that patch everyone gets it).  We should be safe to use that
> everywhere
> we currently force ____cacheline_aligned and waste a little less space
> on padding
> which is always nice ;)
> 

The above series actually made me go and try to get some more
understanding of kmalloc. I might be completely wrong, but AFAIU, for
our typical case, this won't matter much. Typically we have something
like:

struct something {
    //typically we do have more parameters
    int val;
    u8 buffer[8] ____cacheline_aligned;
};

I think the idea is to replace this with the compile time annotation
ARCH_DMA_MINALIGN, which is 128 (for arm64). That means that
kmalloc(sizeof(struct something)) will already put us in the
kmalloc-256 cache (right?) and then we will just get 256-byte aligned
objects, as this is a power of 2. The point is that, as we normally do
the compile time annotation, we should still have the padding bytes as
given by sizeof().

If I understood things correctly, the above series is way more
beneficial for smaller kmalloc calls (without annotation) that don't
need the 128 byte alignment constraint, or for structures with zero
length arrays (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls
end up in the kmalloc-192 cache.

Does the above make any sense?

That said, I think we also need to be careful in
iio_device_alloc(). Ideally, to save some space, we would only align to a
real cache line (for the padding between the iio and private structs) and
not to ARCH_DMA_MINALIGN. Would that be enough?

> Given we have no reports of a problem with 128 byte non DMA coherent
> platforms
> I don't propose to 'fix' this until we can make use of the new define
> in the above patch set.  That is going to make a mess of backporting
> the
> fix however.  I'm wishing we did what crypto has done and had a
> subsystem
> specific define for this but such is life.  We will also want to be

We do have IIO_ALIGN, but as we just found out it is wrongly defined and
was never enforced. Maybe now is a good time to enforce it :)
> 

- Nuno Sá


* Re: IIO: Ensuring DMA safe buffers.
  2022-04-20 12:30 ` Nuno Sá
@ 2022-04-21  9:05   ` Jonathan Cameron
  2022-04-21 10:35     ` Nuno Sá
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Cameron @ 2022-04-21  9:05 UTC (permalink / raw)
  To: Nuno Sá; +Cc: linux-iio, lars, catalin.marinas

On Wed, 20 Apr 2022 14:30:36 +0200
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> > Hi All,
> > 
> > For a long time IIO has been making the assumption that
> > ____cacheline_aligned
> > was sufficient to ensure buffers were in their own cacheline and
> > hence
> > DMA safe.  We generally needed this for all SPI device drivers as
> > many
> > SPI ABI calls can pass the buffer directly through to be used for
> > DMA.
> > Note that regmap doesn't currently do this but it might in future (and
> > did
> > in the past).  I can't remember the history of this well enough to
> > know
> > why we did it that way. I don't remember the pain of debugging random
> > corruption caused by getting it wrong however...
> > 
> > However it turns out via
> > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/
> >  
> > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the
> > cache line size"
> > discussion that there are platforms where this isn't true and need
> > 128 byte alignment despite a ____cacheline_aligned aligning to 64 bytes
> > on
> > all ARM64 platforms.
> >   
> 
> Oh boy... Here it goes my 5 cents on this

:)

> 
> > (I should probably have known that as the platform I support in my
> > day
> > job has 128 byte cachelines in L3 - however it's DMA coherent so
> > that's not a problem).
> > 
> > The above series also highlights that we can do much better anyway on
> > some platforms
> > using the new ARCH_DMA_MINALIGN (currently it's only defined for some
> > archs but
> > after that patch everyone gets it).  We should be safe to use that
> > everywhere
> > we currently force ____cacheline_aligned and waste a little less space
> > on padding
> > which is always nice ;)
> >   
> 
> The above series actually made me go and try to get some more
> understanding on kmalloc. I might be completely wrong but AFAIU, for
> our typical case, this won't matter much. Typically we have something
> like:
> 
> struct something {
>     //typically we do have more parameters
>     int val;
>     u8 buffer[8] ____cacheline_aligned;
> };
> 
> I think the idea is to replace this with the compile time annotation
> ARCH_DMA_MINALIGN which is 128 (for arm64). That means that
> kmalloc(sizeof(struct something)) will put us already in the kmalloc-
> 256 cache (right?) and then we will just get 256 aligned objects as
> this is a power of 2. The point is that as we normally do the compile
> time annotation we should still have the padding bytes as given by
> sizeof().
> 
> If I understood things correctly, the above series is way more
> beneficial for smaller kmalloc calls (without annotation) that won't
> need the 128 align constrain or for structures with zero length arrays
> (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up in the
> kmalloc-192 cache.
> 
> Does the above make any sense?
If we keep the padding between iio_dev and iio_priv at ARCH_DMA_MINALIGN
then we should be fine, because we get

0   start of iio_dev

128 start of iio_priv

256 start of aligned region (after changing to ARCH_DMA_MINALIGN as you say
above)
383 end of potential DMA-safe region - so we need to not have anything else
in this region.

and I think we get into the kmalloc-384 cache if it exists.  Which is fine.
> 
> That said, I think that we also need to be careful in
> iio_device_alloc(). Ideally, to save some space, we only align to a
> real cache line (padding between iio and private struct) and not
> ARCH_DMA_MINALIGN. Would it be enough?

For now I suggest we stick to ARCH_DMA_MINALIGN, but in future we may be able
to relax that.  We do however run into the same issue crypto does
of giving misleading information to the compiler (even if it's
a non-issue in reality): the structures at iio_priv() will
still have compile time alignment markings of ARCH_DMA_MINALIGN,
and if the padding of struct iio_dev doesn't bring us to that boundary
then the alignment isn't what the compiler thinks it is.
I'm not immediately sure how else to get that alignment, as we don't
want to be hand-adding padding to each driver...  We could do something
magic like introducing a second region with iio_priv2(), but that
is horrible.

Walking through this: if KMALLOC_MINALIGN
falls to a small enough number then we 'could get'

0                start of iio_dev

x + delta        start of iio_priv
x + delta + 128  start of aligned region
x + delta + 256  end of aligned region
-- any issue would occur because the back end of this shares the minimum
DMA-aligned region with something else.

So at worst we get into the 384 size kmalloc cache and should be
fine anyway.

Note that we don't care that much about potential savings as there
is only one of these structures per driver instance so it'll be very
minor.

> 
> > Given we have no reports of a problem with 128 byte non DMA coherent
> > platforms
> > I don't propose to 'fix' this until we can make use of the new define
> > in the above patch set.  That is going to make a mess of backporting
> > the
> > fix however.  I'm wishing we did what crypto has done and had a
> > subsystem
> > specific define for this but such is life.  We will also want to be  
> 
> We do have IIO_ALIGN but as we just found out is wrongly defined and
> was never enforced. Maybe now is a good time for enforcing it :)

Agreed. I'll spin a series to change that to be in line with what crypto
does and use it throughout the drivers.  Then we can move forwards from
there as Catalin's series moves forwards.

What fun :)

Jonathan

> >   
> 
> - Nuno Sá



* Re: IIO: Ensuring DMA safe buffers.
  2022-04-21  9:05   ` Jonathan Cameron
@ 2022-04-21 10:35     ` Nuno Sá
  2022-04-22  7:13       ` Nuno Sá
  0 siblings, 1 reply; 6+ messages in thread
From: Nuno Sá @ 2022-04-21 10:35 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio, lars, catalin.marinas

On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote:
> On Wed, 20 Apr 2022 14:30:36 +0200
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> > > Hi All,
> > > 
> > > For a long time IIO has been making the assumption that
> > > ____cacheline_aligned
> > > was sufficient to ensure buffers were in their own cacheline and
> > > hence
> > > DMA safe.  We generally needed this for all SPI device drivers as
> > > many
> > > SPI ABI calls can pass the buffer directly through to be used for
> > > DMA.
> > > Note that regmap doesn't currently do this but it might in future
> > > (and
> > > did
> > > in the past).  I can't remember the history of this well enough
> > > to
> > > know
> > > why we did it that way. I don't remember the pain of debugging
> > > random
> > > corruption caused by getting it wrong however...
> > > 
> > > However it turns out via
> > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/
> > >  
> > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below the
> > > cache line size"
> > > discussion that there are platforms where this isn't true and
> > > need
> > > 128 byte alignment despite a ____cacheline_aligned aligning to 64
> > > bytes
> > > on
> > > all ARM64 platforms.
> > >   
> > 
> > Oh boy... Here it goes my 5 cents on this
> 
> :)
> 
> > 
> > > (I should probably have known that as the platform I support in
> > > my
> > > day
> > > job has 128 byte cachelines in L3 - however it's DMA coherent so
> > > that's not a problem).
> > > 
> > > The above series also highlights that we can do much better
> > > anyway on
> > > some platforms
> > > using the new ARCH_DMA_MINALIGN (currently it's only defined for
> > > some
> > > archs but
> > > after that patch everyone gets it).  We should be safe to use
> > > that
> > > everywhere
> > > we currently force ____cacheline_aligned and waste a little less
> > > space
> > > on padding
> > > which is always nice ;)
> > >   
> > 
> > The above series actually made me go and try to get some more
> > understanding on kmalloc. I might be completely wrong but AFAIU,
> > for
> > our typical case, this won't matter much. Typically we have
> > something
> > like:
> > 
> > struct something {
> >     //typically we do have more parameters
> >     int val;
> >     u8 buffer[8] ____cacheline_aligned;
> > };
> > 
> > I think the idea is to replace this with the compile time annotation
> > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that
> > kmalloc(sizeof(struct something)) will put us already in the
> > kmalloc-
> > 256 cache (right?) and then we will just get 256 aligned objects as
> > this is a power of 2. The point is that as we normally do the
> > compile
> > time annotation we should still have the padding bytes as given by
> > sizeof().
> > 
> > If I understood things correctly, the above series is way more
> > beneficial for smaller kmalloc calls (without annotation) that
> > won't
> > need the 128 align constrain or for structures with zero length
> > arrays
> > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up in
> > the
> > kmalloc-192 cache.
> > 
> > Does the above make any sense?
> If we keep the padding between iio_dev and iio_priv at
> ARCH_DMA_MINALIGN
> then we should be fine because we get
> 

Sure... I was just thinking of a way of saving some space. But
ARCH_DMA_MINALIGN is probably the safest way to go if we want to make
sure things are right. For the record, I was not even thinking of
KMALLOC_MINALIGN but of using something like cache_line_size() to get the
__real__ size. Even so, I think KMALLOC_MINALIGN would work even
though our cache line might be 128. That will already be the case for
objects with KMALLOC_MINALIGN alignment and with buffers annotated with
ARCH_DMA_MINALIGN.

> 0  iio_dev
> 
> 128 start of iio_priv
> 
> 256 start of aligned region (after changing to ARCH_DMA_MINALIGN as
> you say above.
> 383 end of potential DMA Safe region - so need to not have anything
> else
> before this.
> 
> and I think we get in kmalloc-384 cache if it exists.  Which is fine.

Probably we end up in kmalloc-512, which is also fine :)

> > 
> > That said, I think that we also need to be careful in
> > iio_device_alloc(). Ideally, to save some space, we only align to a
> > real cache line (padding between iio and private struct) and not
> > ARCH_DMA_MINALIGN. Would it be enough?
> 
> For now I suggest stick to ARCH_DMA_MINALIGN but in future we may be
> able
> to relax that.  We do however run into the same issue crypto does
> of giving misleading information to the compiler (even if it's
> a non issue in reality) because the structures at iio_priv() will
> still have compile time alignment markings to ARCH_DMA_MINALIGN
> and if the padding of struct iio_dev doesn't bring us to that
> boundary
> then the alignment isn't what the compiler thinks it is.

I think in our case this might not be a problem, as I think we will
always end up in a kmalloc cache where the iio_dev alignment will
always be a multiple of ARCH_DMA_MINALIGN, so iio_priv()
should always start at an ARCH_DMA_MINALIGN-aligned address (not sure
though). If I understood your point correctly, this is already what will
happen to other objects after that series, because we can mark a DMA
buffer with 128 byte alignment while the container object might end up
with 64. Apparently no issue has been observed so far...

> I'm not immediately sure how else to get that alignment as we don't
> want to be hand adding padding to each driver..  We could do
> something
> magic like introducing a second region with iio_priv2() but that
> is horrible.
> 
> Walking through this.  If KMALLOC_MINALIGN
> falls to a small enough number then we 'could get'
> 
> 0                iio_dev
> 
> x + delta        start of iio_priv
> x + delta + 128  start of aligned region
> x + delta + 256  end of aligned region.
> -- any issue will occur because back end of this shares the min
> dma aligned region with something else.
> 
> So at worst we get into the 384 size kmalloc cache and should be
> fine anyway.
> 
> Note that we don't care that much about potential savings as there
> is only one of these structures per driver instance so it'll be very
> minor.
> 

Given the above point, I agree with using ARCH_DMA_MINALIGN (it's also
safer, as I'm not 100% sure about what I have been saying :))


> > 
> > > Given we have no reports of a problem with 128 byte non DMA
> > > coherent
> > > platforms
> > > I don't propose to 'fix' this until we can make use of the new
> > > define
> > > in the above patch set.  That is going to make a mess of
> > > backporting
> > > the
> > > fix however.  I'm wishing we did what crypto has done and had a
> > > subsystem
> > > specific define for this but such is life.  We will also want to
> > > be  
> > 
> > We do have IIO_ALIGN but as we just found out is wrongly defined
> > and
> > was never enforced. Maybe now is a good time for enforcing it :)
> 
> Agreed. I'll spin a series to change that to be in line with what
> crypto
> does and use throughout the drivers.  Then we can move forwards from
> there as Catalin's series moves forwards.
> 
> What fun :)
> 

Cool, I'll help with reviewing...

- Nuno Sá 




* Re: IIO: Ensuring DMA safe buffers.
  2022-04-21 10:35     ` Nuno Sá
@ 2022-04-22  7:13       ` Nuno Sá
  0 siblings, 0 replies; 6+ messages in thread
From: Nuno Sá @ 2022-04-22  7:13 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio, lars, catalin.marinas

On Thu, 2022-04-21 at 12:35 +0200, Nuno Sá wrote:
> On Thu, 2022-04-21 at 10:05 +0100, Jonathan Cameron wrote:
> > On Wed, 20 Apr 2022 14:30:36 +0200
> > Nuno Sá <noname.nuno@gmail.com> wrote:
> > 
> > > On Tue, 2022-04-19 at 12:12 +0100, Jonathan Cameron wrote:
> > > > Hi All,
> > > > 
> > > > For a long time IIO has been making the assumption that
> > > > ____cacheline_aligned
> > > > was sufficient to ensure buffers were in their own cacheline
> > > > and
> > > > hence
> > > > DMA safe.  We generally needed this for all SPI device drivers
> > > > as
> > > > many
> > > > SPI ABI calls can pass the buffer directly through to be used
> > > > for
> > > > DMA.
> > > > Note that regmap doesn't currently do this but it might in
> > > > future
> > > > (and
> > > > did
> > > > in the past).  I can't remember the history of this well enough
> > > > to
> > > > know
> > > > why we did it that way. I don't remember the pain of debugging
> > > > random
> > > > corruption caused by getting it wrong however...
> > > > 
> > > > However it turns out via
> > > > https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/
> > > >  
> > > > "[PATCH 00/10] mm, arm64: Reduce ARCH_KMALLOC_MINALIGN below
> > > > the
> > > > cache line size"
> > > > discussion that there are platforms where this isn't true and
> > > > need
> > > > 128 byte alignment despite a ____cacheline_aligned aligning to 64
> > > > bytes
> > > > on
> > > > all ARM64 platforms.
> > > >   
> > > 
> > > Oh boy... Here it goes my 5 cents on this
> > 
> > :)
> > 
> > > 
> > > > (I should probably have known that as the platform I support in
> > > > my
> > > > day
> > > > job has 128 byte cachelines in L3 - however it's DMA coherent
> > > > so
> > > > that's not a problem).
> > > > 
> > > > The above series also highlights that we can do much better
> > > > anyway on
> > > > some platforms
> > > > using the new ARCH_DMA_MINALIGN (currently it's only defined
> > > > for
> > > > some
> > > > archs but
> > > > after that patch everyone gets it).  We should be safe to use
> > > > that
> > > > everywhere
> > > > we currently force ____cacheline_aligned and waste a little less
> > > > space
> > > > on padding
> > > > which is always nice ;)
> > > >   
> > > 
> > > The above series actually made me go and try to get some more
> > > understanding on kmalloc. I might be completely wrong but AFAIU,
> > > for
> > > our typical case, this won't matter much. Typically we have
> > > something
> > > like:
> > > 
> > > struct something {
> > >     //typically we do have more parameters
> > >     int val;
> > >     u8 buffer[8] ____cacheline_aligned;
> > > };
> > > 
> > > I think the idea is to replace this with the compile time
> > > annotation
> > > ARCH_DMA_MINALIGN which is 128 (for arm64). That means that
> > > kmalloc(sizeof(struct something)) will put us already in the
> > > kmalloc-
> > > 256 cache (right?) and then we will just get 256 aligned objects
> > > as
> > > this is a power of 2. The point is that as we normally do the
> > > compile
> > > time annotation we should still have the padding bytes as given
> > > by
> > > sizeof().
> > > 
> > > If I understood things correctly, the above series is way more
> > > beneficial for smaller kmalloc calls (without annotation) that
> > > won't
> > > need the 128 align constrain or for structures with zero length
> > > arrays
> > > (annotated with ARCH_DMA_MINALIGN) where kmalloc() calls end up
> > > in
> > > the
> > > kmalloc-192 cache.
> > > 
> > > Does the above make any sense?
> > If we keep the padding between iio_dev and iio_priv at
> > ARCH_DMA_MINALIGN
> > then we should be fine because we get
> > 
> 
> Sure... I was just thinking in a way of saving some space. But
> ARCH_DMA_MINALIGN is probably the safest way to go if we want to make
> sure things are right. For the record I was not even thinking in
> KMALLOC_MINALIGN but using something like cache_line_size() to get
> the
> __real__ size. Even though I think KMALLOC_MINALIGN would work even
> though our cache line might be 128. That will be already the case for

Well, this is total nonsense from me... KMALLOC_MINALIGN will fail
miserably if our cache line is bigger than it. For arm64 that would be
the case with KMALLOC_MINALIGN==64 and a cache line of 128. That said,
aligning iio_priv() to a cache line would be safe, and we could save some
space in case the line is < ARCH_DMA_MINALIGN (which it typically is).

Anyway, as Jonathan pointed out, we typically have just one IIO object
per driver, so the hassle is probably not worth it.

- Nuno Sá




end of thread, other threads:[~2022-04-22  7:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-19 11:12 IIO: Ensuring DMA safe buffers Jonathan Cameron
2022-04-19 11:54 ` Catalin Marinas
2022-04-20 12:30 ` Nuno Sá
2022-04-21  9:05   ` Jonathan Cameron
2022-04-21 10:35     ` Nuno Sá
2022-04-22  7:13       ` Nuno Sá
