* [Question] devm_kmalloc() for DMA ?
@ 2017-03-08 10:59 ` Masahiro Yamada
  0 siblings, 0 replies; 22+ messages in thread
From: Masahiro Yamada @ 2017-03-08 10:59 UTC (permalink / raw)
  To: dmaengine, linux-arm-kernel
  Cc: Russell King - ARM Linux, Arnd Bergmann,
	Linux Kernel Mailing List, Tejun Heo, James E.J. Bottomley,
	David S. Miller, masahiroy

Hi experts,

I have a question about
how to allocate a DMA-safe buffer.


In my understanding, kmalloc() returns
memory with DMA-safe alignment
in order to avoid cache-line sharing problems when the memory is used for DMA.

The alignment is decided by ARCH_DMA_MINALIGN.
For example, on modern 32-bit ARM boards, this value is typically 64,
so memory returned by kmalloc() has
at least 64-byte alignment.


On the other hand, devm_kmalloc() does not return
sufficiently aligned memory.


On my board (32-bit ARM), devm_kmalloc() returns
(an ARCH_DMA_MINALIGN-aligned address) + 0x10.



The reason for the 0x10 offset is obvious:

struct devres {
        struct devres_node node;
        /* -- 3 pointers */
        unsigned long long data[]; /* guarantee ull alignment */
};


Management data is located at the top of struct devres,
and devm_kmalloc() returns dr->data.

The "unsigned long long" data[] leaves the
returned memory with only 0x10 alignment,
but I think this may not be enough for DMA.

I noticed this when I was looking at drivers/mtd/nand/denali.c.

The code looks as follows:


        denali->buf.buf = devm_kzalloc(denali->dev,
                             mtd->writesize + mtd->oobsize,
                             GFP_KERNEL);
        if (!denali->buf.buf) {
                ret = -ENOMEM;
                goto failed_req_irq;
        }

        /* Is 32-bit DMA supported? */
        ret = dma_set_mask(denali->dev, DMA_BIT_MASK(32));
        if (ret) {
                dev_err(denali->dev, "No usable DMA configuration\n");
                goto failed_req_irq;
        }

        denali->buf.dma_buf = dma_map_single(denali->dev, denali->buf.buf,
                             mtd->writesize + mtd->oobsize,
                             DMA_BIDIRECTIONAL);




The memory buffer is allocated by devm_kzalloc() and then
passed to dma_map_single().



Could this be a potential problem in general?

Is devm_kmalloc() not recommended
for buffers that will be DMA-mapped?


Any advice is appreciated.


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 10:59 ` Masahiro Yamada
@ 2017-03-08 11:15   ` Robin Murphy
  -1 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2017-03-08 11:15 UTC (permalink / raw)
  To: Masahiro Yamada, dmaengine, linux-arm-kernel
  Cc: Russell King - ARM Linux, Arnd Bergmann, masahiroy,
	Linux Kernel Mailing List, James E.J. Bottomley, Tejun Heo,
	David S. Miller

On 08/03/17 10:59, Masahiro Yamada wrote:
> Hi experts,
> 
> I have a question about
> how to allocate DMA-safe buffer.
> 
> 
> In my understanding, kmalloc() returns
> memory with DMA safe alignment
> in order to avoid cache-sharing problem when used for DMA.
> 
> The alignment is decided by ARCH_DMA_MINALIGN.
> For example, on modern ARM 32bit boards, this value is typically 64.
> So, memory returned by kmalloc() has
> at least 64 byte alignment.
> 
> 
> On the other hand, devm_kmalloc() does not return
> enough-aligned memory.

How so? If anything returned by kmalloc() is guaranteed to occupy some
multiple of ARCH_DMA_MINALIGN bytes in order to avoid two allocations
falling into the same cache line, I don't see how stealing the first 16
bytes *of a single allocation* could make it start sharing cache lines
with another? :/

If a particular device has a problem with:

p = kmalloc(...);
d = dma_map_single(p + 0x10, ...);
do_something_with(d);

that's a separate issue altogether.

Robin.

> On my board (ARM 32bit), devm_kmalloc() returns
> (ARCH_DMA_MINALIGN aligned address) + 0x10.
> 
> 
> 
> The reason of the offset 0x10 is obvious.
> 
> struct devres {
>         struct devres_node node;
>         /* -- 3 pointers */
>         unsigned long long data[]; /* guarantee ull alignment */
> };
> 
> 
> Management data is located at the top of struct devres.
> Then, devm_kmalloc() returns dr->data.
> 
> The "unsigned long long" guarantees
> the returned memory has 0x10 alignment,
> but I think this may not be enough for DMA.
> 
> I noticed this when I was seeing drivers/mtd/nand/denali.c
> 
> The code looks as follows:
> 
> 
>         denali->buf.buf = devm_kzalloc(denali->dev,
>                              mtd->writesize + mtd->oobsize,
>                              GFP_KERNEL);
>         if (!denali->buf.buf) {
>                 ret = -ENOMEM;
>                 goto failed_req_irq;
>         }
> 
>         /* Is 32-bit DMA supported? */
>         ret = dma_set_mask(denali->dev, DMA_BIT_MASK(32));
>         if (ret) {
>                 dev_err(denali->dev, "No usable DMA configuration\n");
>                 goto failed_req_irq;
>         }
> 
>         denali->buf.dma_buf = dma_map_single(denali->dev, denali->buf.buf,
>                              mtd->writesize + mtd->oobsize,
>                              DMA_BIDIRECTIONAL);
> 
> 
> 
> 
> Memory buffer is allocated by devm_kzalloc(), then
> passed to dma_map_single().
> 
> 
> 
> Could this be a potential problem in general?
> 
> Is devm_kmalloc() not recommended
> for buffer that can be DMA-mapped?
> 
> 
> Any advice is appreciated.
> 
> 


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 11:15   ` Robin Murphy
@ 2017-03-08 18:06     ` Masahiro Yamada
  -1 siblings, 0 replies; 22+ messages in thread
From: Masahiro Yamada @ 2017-03-08 18:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: dmaengine, linux-arm-kernel, Russell King - ARM Linux,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

Hi Robin,


2017-03-08 20:15 GMT+09:00 Robin Murphy <robin.murphy@arm.com>:
> On 08/03/17 10:59, Masahiro Yamada wrote:
>> Hi experts,
>>
>> I have a question about
>> how to allocate DMA-safe buffer.
>>
>>
>> In my understanding, kmalloc() returns
>> memory with DMA safe alignment
>> in order to avoid cache-sharing problem when used for DMA.
>>
>> The alignment is decided by ARCH_DMA_MINALIGN.
>> For example, on modern ARM 32bit boards, this value is typically 64.
>> So, memory returned by kmalloc() has
>> at least 64 byte alignment.
>>
>>
>> On the other hand, devm_kmalloc() does not return
>> enough-aligned memory.
>
> How so? If anything returned by kmalloc() is guaranteed to occupy some
> multiple of ARCH_DMA_MINALIGN bytes in order to avoid two allocations
> falling into the same cache line, I don't see how stealing the first 16
> bytes *of a single allocation* could make it start sharing cache lines
> with another? :/

I was thinking of a traversal of the devres_node linked list
on a different thread, but that should not happen.

Please forget my stupid question.


> If a particular device has a problem with:
>
> p = kmalloc(...);
> d = dma_map_single(p + 0x10, ...);
> do_something_with(d);
>
> that's a separate issue altogether.

Right.
Each device has its own requirements for DMA alignment.

Thanks!


-- 
Best Regards
Masahiro Yamada


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 18:06     ` Masahiro Yamada
@ 2017-03-08 19:48       ` Lars-Peter Clausen
  -1 siblings, 0 replies; 22+ messages in thread
From: Lars-Peter Clausen @ 2017-03-08 19:48 UTC (permalink / raw)
  To: Masahiro Yamada, Robin Murphy
  Cc: dmaengine, linux-arm-kernel, Russell King - ARM Linux,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On 03/08/2017 07:06 PM, Masahiro Yamada wrote:
> Hi Robin,
> 
> 
> 2017-03-08 20:15 GMT+09:00 Robin Murphy <robin.murphy@arm.com>:
>> On 08/03/17 10:59, Masahiro Yamada wrote:
>>> Hi experts,
>>>
>>> I have a question about
>>> how to allocate DMA-safe buffer.
>>>
>>>
>>> In my understanding, kmalloc() returns
>>> memory with DMA safe alignment
>>> in order to avoid cache-sharing problem when used for DMA.
>>>
>>> The alignment is decided by ARCH_DMA_MINALIGN.
>>> For example, on modern ARM 32bit boards, this value is typically 64.
>>> So, memory returned by kmalloc() has
>>> at least 64 byte alignment.
>>>
>>>
>>> On the other hand, devm_kmalloc() does not return
>>> enough-aligned memory.
>>
>> How so? If anything returned by kmalloc() is guaranteed to occupy some
>> multiple of ARCH_DMA_MINALIGN bytes in order to avoid two allocations
>> falling into the same cache line, I don't see how stealing the first 16
>> bytes *of a single allocation* could make it start sharing cache lines
>> with another? :/
> 
> I just thought of traverse of the linked list of devres_node
> on a different thread, but it should not happen.

When the DMA memory is mapped for reading from the device, the associated
cachelines are invalidated without writeback. There is no guarantee that the
changes made to the devres_node have made it to main memory yet, or is
there? So those updates could be lost and you'd get data structure corruption.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 19:48       ` Lars-Peter Clausen
@ 2017-03-08 19:59         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-03-08 19:59 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Masahiro Yamada, Robin Murphy, dmaengine, linux-arm-kernel,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On Wed, Mar 08, 2017 at 08:48:31PM +0100, Lars-Peter Clausen wrote:
> When the DMA memory is mapped for reading from the device the associated
> cachelines are invalidated without writeback. There is no guarantee that
> the changes made to the devres_node have made it to main memory yet, or
> is there?

That is incorrect.

Overlapping cache lines are always written back on transitions from CPU
to device ownership of the buffer (e.g. dma_map_*()).

Updates that are made by the CPU on overlapping cache lines while the
memory is mapped for DMA may end up doing one of two things: either
overwriting the newly DMA'd data, or being lost altogether.

In the case of devm_* list manipulations, these should only ever happen
during device probe and tear down, and if DMA is active at those times,
the driver is seriously buggy.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 19:59         ` Russell King - ARM Linux
@ 2017-03-08 20:44           ` Lars-Peter Clausen
  -1 siblings, 0 replies; 22+ messages in thread
From: Lars-Peter Clausen @ 2017-03-08 20:44 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Masahiro Yamada, Robin Murphy, dmaengine, linux-arm-kernel,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On 03/08/2017 08:59 PM, Russell King - ARM Linux wrote:
> On Wed, Mar 08, 2017 at 08:48:31PM +0100, Lars-Peter Clausen wrote:
>> When the DMA memory is mapped for reading from the device the associated
>> cachelines are invalidated without writeback. There is no guarantee that
>> the changes made to the devres_node have made it to main memory yet, or
>> is there?
> 
> That is incorrect.
> 
> Overlapping cache lines are always written back on transitions from CPU
> to device ownership of the buffer (eg, dma_map_*().)

On ARM. But my understanding is that this is not a universal requirement
according to DMA-API.txt. It says that mappings must be cache line aligned
and otherwise behavior is undefined.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 20:44           ` Lars-Peter Clausen
@ 2017-03-08 21:19             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-03-08 21:19 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: Masahiro Yamada, Robin Murphy, dmaengine, linux-arm-kernel,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On Wed, Mar 08, 2017 at 09:44:17PM +0100, Lars-Peter Clausen wrote:
> On 03/08/2017 08:59 PM, Russell King - ARM Linux wrote:
> > On Wed, Mar 08, 2017 at 08:48:31PM +0100, Lars-Peter Clausen wrote:
> >> When the DMA memory is mapped for reading from the device the associated
> >> cachelines are invalidated without writeback. There is no guarantee that
> >> the changes made to the devres_node have made it to main memory yet, or
> >> is there?
> > 
> > That is incorrect.
> > 
> > Overlapping cache lines are always written back on transitions from CPU
> > to device ownership of the buffer (eg, dma_map_*().)
> 
> On ARM. But my understanding is that this is not a universal requirement
> according to DMA-API.txt. It says that mappings must be cache line aligned
> and otherwise behavior is undefined.

There is no use of the term "undefined" in the document you refer to.

There is the recommendation that regions are cache line aligned, but
there is quite a bit of history in the kernel where DMA has been to
regions that are not cache line aligned, and where the DMA region
overlaps with data that has recent accesses made to it.

The situation is improving (in that DMA buffers are being allocated
separately, rather than being part of some other structure) but that
doesn't mean that it's safe to assume that overlapping cache lines can
be invalidated.

In any case, DMA with devm allocated buffers really is not a good idea.
The lifetime of devm allocated buffers is between their allocation and
the point at which the driver is unbound (either via probe failure or
removal).  If that turns out to be shorter than the DMA activity, you
end up scribbling over freed memory.

Moreover, any memory that is dma_map'd must be dma_unmap'd before
being freed.

So, unless we're going to get devm_* stuff for DMA API and for ensuring
that DMA is disabled, I don't think using devm_k*alloc() with DMA is
really a good idea.

Take the fact that it returns memory that is not cache line aligned to
be a big clue that it shouldn't be used for this purpose.
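A pattern consistent with these lifetime rules would pair the allocation and mapping explicitly. The calls below are the real DMA API, but the fragment is an illustrative sketch of the ordering, not the denali driver's actual code:

```c
/* Sketch: probe() pairs kmalloc() + dma_map_single(), and the unwind
 * path (and remove()) pairs dma_unmap_single() + kfree() in reverse
 * order, so the mapping never outlives the buffer. */
buf = kmalloc(bufsize, GFP_KERNEL);	/* ARCH_DMA_MINALIGN aligned */
if (!buf)
	return -ENOMEM;

dma_handle = dma_map_single(dev, buf, bufsize, DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, dma_handle)) {
	kfree(buf);
	return -EIO;
}

/* ... DMA may run only between these two points ... */

dma_unmap_single(dev, dma_handle, bufsize, DMA_BIDIRECTIONAL);
kfree(buf);
```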

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 21:19             ` Russell King - ARM Linux
@ 2017-03-08 21:33               ` Lars-Peter Clausen
  -1 siblings, 0 replies; 22+ messages in thread
From: Lars-Peter Clausen @ 2017-03-08 21:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Masahiro Yamada, Robin Murphy, dmaengine, linux-arm-kernel,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On 03/08/2017 10:19 PM, Russell King - ARM Linux wrote:
> On Wed, Mar 08, 2017 at 09:44:17PM +0100, Lars-Peter Clausen wrote:
>> On 03/08/2017 08:59 PM, Russell King - ARM Linux wrote:
>>> On Wed, Mar 08, 2017 at 08:48:31PM +0100, Lars-Peter Clausen wrote:
>>>> When the DMA memory is mapped for reading from the device the associated
>>>> cachelines are invalidated without writeback. There is no guarantee that
>>>> the changes made to the devres_node have made it to main memory yet, or
>>>> is there?
>>>
>>> That is incorrect.
>>>
>>> Overlapping cache lines are always written back on transitions from CPU
>>> to device ownership of the buffer (eg, dma_map_*().)
>>
>> On ARM. But my understanding is that this is not a universal requirement
>> according to DMA-API.txt. It says that mappings must be cache line aligned
>> and otherwise behavior is undefined.
> 
> There is no use of the term "undefined" in the document you refer to.
> 
> There is the recommendation that regions are cache line aligned, but
> there is quite a bit of history in the kernel where DMA has been to
> regions that are not cache line aligned, and where the DMA region
> overlaps with data that has recent accesses made to it.

It says: "Warnings:  Memory coherency operates at a granularity called the
cache line width.  In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one." That doesn't sound like a recommendation
to me. "should" usually implies a recommendation, while "must" indicates a
hard requirement.

I believe e.g. MIPS will align the address by masking the lower bits off,
without any flushes. Wouldn't be surprised if other architectures do the same.

> 
> The situation is improving (in that DMA buffers are being allocated
> separately, rather than being part of some other structure) but that
> doesn't mean that it's safe to assume that overlapping cache lines can
> be invalidated.
> 
> In any case, DMA with devm allocated buffers really is not a good idea.

I very much agree with that part.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 21:33               ` Lars-Peter Clausen
@ 2017-03-09  3:25                 ` Masahiro Yamada
  -1 siblings, 0 replies; 22+ messages in thread
From: Masahiro Yamada @ 2017-03-09  3:25 UTC (permalink / raw)
  To: Lars-Peter Clausen, Russell King - ARM Linux
  Cc: Robin Murphy, dmaengine, linux-arm-kernel, Arnd Bergmann,
	Linux Kernel Mailing List, James E.J. Bottomley, Tejun Heo,
	David S. Miller

Hi Russell, Lars-Peter,

Thanks for your expert comments.


2017-03-09 6:33 GMT+09:00 Lars-Peter Clausen <lars@metafoo.de>:
> On 03/08/2017 10:19 PM, Russell King - ARM Linux wrote:
>> On Wed, Mar 08, 2017 at 09:44:17PM +0100, Lars-Peter Clausen wrote:
>>> On 03/08/2017 08:59 PM, Russell King - ARM Linux wrote:
>>>> On Wed, Mar 08, 2017 at 08:48:31PM +0100, Lars-Peter Clausen wrote:
>>>>> When the DMA memory is mapped for reading from the device the associated
>>>>> cachelines are invalidated without writeback. There is no guarantee that
>>>>> the changes made to the devres_node have made it to main memory yet, or
>>>>> is there?
>>>>
>>>> That is incorrect.
>>>>
>>>> Overlapping cache lines are always written back on transitions from CPU
>>>> to device ownership of the buffer (eg, dma_map_*().)
>>>
>>> On ARM. But my understanding is that this is not a universal requirement
>>> according to DMA-API.txt. It says that mappings must be cache line aligned
>>> and otherwise behavior is undefined.
>>
>> There is no use of the term "undefined" in the document you refer to.
>>
>> There is the recommendation that regions are cache line aligned, but
>> there is quite a bit of history in the kernel where DMA has been to
>> regions that are not cache line aligned, and where the DMA region
>> overlaps with data that has recent accesses made to it.
>
> I says: "Warnings:  Memory coherency operates at a granularity called the
> cache line width.  In order for memory mapped by this API to operate
> correctly, the mapped region must begin exactly on a cache line
> boundary and end exactly on one." That doesn't sound like a recommendation
> to me. "should" usually implies a recommendation while "must" indicates a
> hard requirement.
>
> I believe e.g. MIPS will align the address by masking the lower bits off,
> without any flushes. Wouldn't be surprised if other architectures do the same.
>
>>
>> The situation is improving (in that DMA buffers are being allocated
>> separately, rather than being part of some other structure) but that
>> doesn't mean that it's safe to assume that overlapping cache lines can
>> be invalidated.
>>
>> In any case, DMA with devm allocated buffers really is not a good idea.
>
> I very much agree with that part.


I now understand that devm_k*alloc() is not good for DMA.

I am working on improving drivers/mtd/nand/denali.c
and I'd like to make the situation better.


I have thought of three options.

Is any of them recommended, or is there one that should be avoided?


(a)  Change devm_kmalloc() to return an address aligned to ARCH_DMA_MINALIGN.



(b)  Align the pointer in your driver if you still want to use devm_*.

     denali->buf = devm_kmalloc(dev, bufsize + ARCH_DMA_MINALIGN,
                                GFP_KERNEL | GFP_DMA);
     if (!denali->buf) {
             (error_handling)
     }

     denali->buf = PTR_ALIGN(denali->buf, ARCH_DMA_MINALIGN);



(c) Use kmalloc() and kfree().   (being careful to avoid memory leaks)




Thanks!


-- 
Best Regards
Masahiro Yamada


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-09  3:25                 ` Masahiro Yamada
@ 2017-03-09  9:20                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2017-03-09  9:20 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Lars-Peter Clausen, Robin Murphy, dmaengine, linux-arm-kernel,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On Thu, Mar 09, 2017 at 12:25:07PM +0900, Masahiro Yamada wrote:
> (c) Use kmalloc() and kfree().   (be careful for memory leak)

This is quite simple.  For the first one, it doesn't seem that it's
DMA'd into, so there's no need to use GFP_DMA.

-	/* allocate a temporary buffer for nand_scan_ident() */
-	denali->buf.buf = devm_kzalloc(denali->dev, PAGE_SIZE,
-					GFP_DMA | GFP_KERNEL);
-	if (!denali->buf.buf)
-		return -ENOMEM;

...

+	denali->buf.buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!denali->buf.buf)
+		return -ENOMEM;
+
	/*
	 * scan for NAND devices attached to the controller
	 * this is the first stage in a two step process to register
	 * with the nand subsystem
	 */
	ret = nand_scan_ident(mtd, denali->max_banks, NULL);
+	kfree(denali->buf.buf);
+
	if (ret)
		goto failed_req_irq;

-	/* allocate the right size buffer now */
-	devm_kfree(denali->dev, denali->buf.buf);

For the second one, I think the first thing to do is to move the
dma_set_mask() to the very beginning of the probe function - if that
fails, then we can't use DMA, and it's not something that requires
any cleanup.

With that gone, convert the other devm_kzalloc() there for buf.buf to
kzalloc(), and ensure that it's appropriately freed.  Note that this
driver is _already_ buggy in that if:

        } else if (mtd->oobsize < (denali->bbtskipbytes +
                        ECC_8BITS * (mtd->writesize /
                        ECC_SECTOR_SIZE))) {
                pr_err("Your NAND chip OOB is not large enough to contain 8bit ECC correction codes");
                goto failed_req_irq;

fails, or these:

        ret = nand_scan_tail(mtd);
        if (ret)
                goto failed_req_irq;

        ret = mtd_device_register(mtd, NULL, 0);
        if (ret) {
                dev_err(denali->dev, "Failed to register MTD: %d\n", ret);
                goto failed_req_irq;
        }

it doesn't unmap the buffer.  So, the driver is already broken.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Re: [Question] devm_kmalloc() for DMA ?
  2017-03-08 18:06     ` Masahiro Yamada
@ 2017-03-09 11:19       ` Robin Murphy
  -1 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2017-03-09 11:19 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: dmaengine, linux-arm-kernel, Russell King - ARM Linux,
	Arnd Bergmann, Linux Kernel Mailing List, James E.J. Bottomley,
	Tejun Heo, David S. Miller

On 08/03/17 18:06, Masahiro Yamada wrote:
> Hi Robin,
> 
> 
> 2017-03-08 20:15 GMT+09:00 Robin Murphy <robin.murphy@arm.com>:
>> On 08/03/17 10:59, Masahiro Yamada wrote:
>>> Hi experts,
>>>
>>> I have a question about
>>> how to allocate DMA-safe buffer.
>>>
>>>
>>> In my understanding, kmalloc() returns
>>> memory with DMA safe alignment
>>> in order to avoid cache-sharing problem when used for DMA.
>>>
>>> The alignment is decided by ARCH_DMA_MINALIGN.
>>> For example, on modern ARM 32bit boards, this value is typically 64.
>>> So, memory returned by kmalloc() has
>>> at least 64 byte alignment.
>>>
>>>
>>> On the other hand, devm_kmalloc() does not return
>>> enough-aligned memory.
>>
>> How so? If anything returned by kmalloc() is guaranteed to occupy some
>> multiple of ARCH_DMA_MINALIGN bytes in order to avoid two allocations
>> falling into the same cache line, I don't see how stealing the first 16
>> bytes *of a single allocation* could make it start sharing cache lines
>> with another? :/
> 
> I just thought of traverse of the linked list of devres_node
> on a different thread, but it should not happen.
> 
> Please forget my stupid question.

On the contrary, my apologies for overlooking the subtlety here and
speaking too soon. It's not strictly the alignment of the allocation
that's at issue, it's more that the streaming DMA notion of "ownership"
of that particular allocation can be unwittingly violated by other agents.

Another thread traversing the list isn't actually a problem, as it's
effectively no different from the cache itself doing a speculative
prefetch, which we have to accommodate anyway. However, there could in
theory be a potential data loss if the devres node is *modified* (e.g.
an adjacent entry is added or removed) whilst the buffer is mapped for
DMA_FROM_DEVICE.

Now, as Russell says, doing streaming DMA on a managed allocation tied
to the device's lifetime doesn't make a great deal of sense, and if
you're allocating or freeing device-lifetime resources *while DMA is
active* you're almost certainly doing something wrong, so thankfully I
don't think we should need to worry about this in practice.

Looking at the Denali driver specifically, it does indeed look a bit
bogus. At least it's not doing DMA with a uint8-aligned array in the
middle of a live structure as it apparently once did, but in terms of
how it's actually using that buffer I think something along the lines of
the below would be appropriate (if anyone wants to try spinning a proper
patch out of it, feel free - I have no way of testing and am not
familiar with MTD in general).

Robin.

----->8-----
diff --git a/drivers/mtd/nand/denali.c b/drivers/mtd/nand/denali.c
index 73b9d4e2dca0..b2b4650a3923 100644
--- a/drivers/mtd/nand/denali.c
+++ b/drivers/mtd/nand/denali.c
@@ -1046,8 +1046,6 @@ static int write_page(struct mtd_info *mtd, struct
nand_chip *chip,
 			mtd->oobsize);
 	}

-	dma_sync_single_for_device(denali->dev, addr, size, DMA_TO_DEVICE);
-
 	clear_interrupts(denali);
 	denali_enable_dma(denali, true);

@@ -1063,7 +1061,6 @@ static int write_page(struct mtd_info *mtd, struct
nand_chip *chip,
 	}

 	denali_enable_dma(denali, false);
-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_TO_DEVICE);

 	return 0;
 }
@@ -1139,7 +1136,6 @@ static int denali_read_page(struct mtd_info *mtd,
struct nand_chip *chip,
 	setup_ecc_for_xfer(denali, true, false);

 	denali_enable_dma(denali, true);
-	dma_sync_single_for_device(denali->dev, addr, size, DMA_FROM_DEVICE);

 	clear_interrupts(denali);
 	denali_setup_dma(denali, DENALI_READ);
@@ -1147,8 +1143,6 @@ static int denali_read_page(struct mtd_info *mtd,
struct nand_chip *chip,
 	/* wait for operation to complete */
 	irq_status = wait_for_irq(denali, irq_mask);

-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	memcpy(buf, denali->buf.buf, mtd->writesize);

 	check_erased_page = handle_ecc(denali, buf, irq_status, &max_bitflips);
@@ -1186,16 +1180,12 @@ static int denali_read_page_raw(struct mtd_info
*mtd, struct nand_chip *chip,
 	setup_ecc_for_xfer(denali, false, true);
 	denali_enable_dma(denali, true);

-	dma_sync_single_for_device(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	clear_interrupts(denali);
 	denali_setup_dma(denali, DENALI_READ);

 	/* wait for operation to complete */
 	wait_for_irq(denali, irq_mask);

-	dma_sync_single_for_cpu(denali->dev, addr, size, DMA_FROM_DEVICE);
-
 	denali_enable_dma(denali, false);

 	memcpy(buf, denali->buf.buf, mtd->writesize);
@@ -1466,16 +1456,6 @@ int denali_init(struct denali_nand_info *denali)
 	if (ret)
 		goto failed_req_irq;

-	/* allocate the right size buffer now */
-	devm_kfree(denali->dev, denali->buf.buf);
-	denali->buf.buf = devm_kzalloc(denali->dev,
-			     mtd->writesize + mtd->oobsize,
-			     GFP_KERNEL);
-	if (!denali->buf.buf) {
-		ret = -ENOMEM;
-		goto failed_req_irq;
-	}
-
 	/* Is 32-bit DMA supported? */
 	ret = dma_set_mask(denali->dev, DMA_BIT_MASK(32));
 	if (ret) {
@@ -1483,12 +1463,14 @@ int denali_init(struct denali_nand_info *denali)
 		goto failed_req_irq;
 	}

-	denali->buf.dma_buf = dma_map_single(denali->dev, denali->buf.buf,
-			     mtd->writesize + mtd->oobsize,
-			     DMA_BIDIRECTIONAL);
-	if (dma_mapping_error(denali->dev, denali->buf.dma_buf)) {
-		dev_err(denali->dev, "Failed to map DMA buffer\n");
-		ret = -EIO;
+	/* allocate the right size buffer now */
+	devm_kfree(denali->dev, denali->buf.buf);
+	denali->buf.buf = dmam_alloc_coherent(denali->dev,
+					      mtd->writesize + mtd->oobsize,
+					      &denali->buf.dma_buf,
+					      GFP_KERNEL);
+	if (!denali->buf.buf) {
+		ret = -ENOMEM;
 		goto failed_req_irq;
 	}

@@ -1598,7 +1580,5 @@ void denali_remove(struct denali_nand_info *denali)

 	nand_release(mtd);
 	denali_irq_cleanup(denali->irq, denali);
-	dma_unmap_single(denali->dev, denali->buf.dma_buf, bufsize,
-			 DMA_BIDIRECTIONAL);
 }
 EXPORT_SYMBOL(denali_remove);


Thread overview: 22+ messages
2017-03-08 10:59 [Question] devm_kmalloc() for DMA ? Masahiro Yamada
2017-03-08 11:15 ` Robin Murphy
2017-03-08 18:06   ` Masahiro Yamada
2017-03-08 19:48     ` Lars-Peter Clausen
2017-03-08 19:59       ` Russell King - ARM Linux
2017-03-08 20:44         ` Lars-Peter Clausen
2017-03-08 21:19           ` Russell King - ARM Linux
2017-03-08 21:33             ` Lars-Peter Clausen
2017-03-09  3:25               ` Masahiro Yamada
2017-03-09  9:20                 ` Russell King - ARM Linux
2017-03-09 11:19     ` Robin Murphy
