* kmalloc and uncached memory
@ 2014-04-16 18:11 ` Lin Ming
  0 siblings, 0 replies; 14+ messages in thread
From: Lin Ming @ 2014-04-16 18:11 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-mm, linux-arm-kernel

Hi Peter,

I have a performance problem (on an ARM board): the CPU is very busy with
cache invalidation.
So I'm trying to allocate uncached memory to eliminate the cache invalidation.

But I also have a problem with dma_alloc_coherent().
If I don't use dma_alloc_coherent(), is it OK to use the code below to
allocate uncached memory?

struct page *page;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
void *cpu_addr;
dma_addr_t dma_addr;
unsigned int vaddr;

cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
vaddr = (unsigned int)uncached->cpu_addr;
pgd = pgd_offset_k(vaddr);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
pte = pte_offset_kernel(pmd, vaddr);
page = virt_to_page(vaddr);
set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);

/* This kmalloc memory won't be freed  */

Thanks,
Ming

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: kmalloc and uncached memory
  2014-04-16 18:11 ` Lin Ming
@ 2014-04-16 18:33   ` Laura Abbott
  -1 siblings, 0 replies; 14+ messages in thread
From: Laura Abbott @ 2014-04-16 18:33 UTC (permalink / raw)
  To: Lin Ming, Peter Zijlstra; +Cc: linux-mm, linux-arm-kernel

On 4/16/2014 11:11 AM, Lin Ming wrote:
> Hi Peter,
> 
> I have a performance problem (on an ARM board): the CPU is very busy with
> cache invalidation.
> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
> 
> But I also have problem with dma_alloc_coherent().
> If I don't use dma_alloc_coherent(), is it OK to use below code to
> alloc uncached memory?
> 
> struct page *page;
> pgd_t *pgd;
> pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
> void *cpu_addr;
> dma_addr_t dma_addr;
> unsigned int vaddr;
> 
> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
> vaddr = (unsigned int)uncached->cpu_addr;
> pgd = pgd_offset_k(vaddr);
> pud = pud_offset(pgd, vaddr);
> pmd = pmd_offset(pud, vaddr);
> pte = pte_offset_kernel(pmd, vaddr);
> page = virt_to_page(vaddr);
> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
> 
> /* This kmalloc memory won't be freed  */
> 

No, that will not work. Lowmem pages are mapped underneath with 1MB sections,
which cannot (easily) be changed at runtime. You really want to be using
dma_alloc_coherent() here.
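
To illustrate the point about sections, here is a minimal user-space sketch.
It assumes the classic ARM short-descriptor layout (1 MB sections, 4 KB small
pages); the model_* names are made up for this example and are not kernel
APIs. It only shows the arithmetic: one first-level entry covers 256 pages,
so changing the attributes of a single kmalloc'd page would also affect (or
require splitting the mapping of) its 255 neighbours.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: ARM short-descriptor format, 1 MB sections, 4 KB pages. */
#define MODEL_SECTION_SHIFT 20u
#define MODEL_PAGE_SHIFT    12u

/* Index of the first-level (section) entry that maps vaddr. */
static uint32_t model_section_index(uint32_t vaddr)
{
    return vaddr >> MODEL_SECTION_SHIFT;
}

/* Number of 4 KB pages that share one section entry. */
static uint32_t model_pages_per_section(void)
{
    return 1u << (MODEL_SECTION_SHIFT - MODEL_PAGE_SHIFT);
}

/* Position of the page containing vaddr within its section (0..255). */
static uint32_t model_page_in_section(uint32_t vaddr)
{
    return (vaddr >> MODEL_PAGE_SHIFT) & (model_pages_per_section() - 1u);
}

/* Usage example: two addresses in the same 1 MB region share one entry. */
static void model_section_example(void)
{
    assert(model_pages_per_section() == 256u);
    assert(model_section_index(0xC0100000u) ==
           model_section_index(0xC01FF000u));
}
```

The example addresses are arbitrary; any two addresses that differ only in
their low 20 bits land in the same section entry.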

Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

* Re: kmalloc and uncached memory
  2014-04-16 18:33   ` Laura Abbott
@ 2014-04-16 18:50     ` Lin Ming
  -1 siblings, 0 replies; 14+ messages in thread
From: Lin Ming @ 2014-04-16 18:50 UTC (permalink / raw)
  To: Laura Abbott; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
> On 4/16/2014 11:11 AM, Lin Ming wrote:
>> Hi Peter,
>>
>> I have a performance problem (on an ARM board): the CPU is very busy with
>> cache invalidation.
>> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
>>
>> But I also have problem with dma_alloc_coherent().
>> If I don't use dma_alloc_coherent(), is it OK to use below code to
>> alloc uncached memory?
>>
>> struct page *page;
>> pgd_t *pgd;
>> pud_t *pud;
>> pmd_t *pmd;
>> pte_t *pte;
>> void *cpu_addr;
>> dma_addr_t dma_addr;
>> unsigned int vaddr;
>>
>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>> vaddr = (unsigned int)uncached->cpu_addr;
>> pgd = pgd_offset_k(vaddr);
>> pud = pud_offset(pgd, vaddr);
>> pmd = pmd_offset(pud, vaddr);
>> pte = pte_offset_kernel(pmd, vaddr);
>> page = virt_to_page(vaddr);
>> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
>>
>> /* This kmalloc memory won't be freed  */
>>
>
> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> which cannot be (easily) changed at runtime. You really want to be using
> dma_alloc_coherent here.

By "lowmem pages", do you mean the first 16 MB of physical memory?
What if I only use highmem pages (>16 MB)?

Thanks.

>
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation


* Re: kmalloc and uncached memory
  2014-04-16 18:50     ` Lin Ming
@ 2014-04-16 19:03       ` Laura Abbott
  -1 siblings, 0 replies; 14+ messages in thread
From: Laura Abbott @ 2014-04-16 19:03 UTC (permalink / raw)
  To: Lin Ming; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On 4/16/2014 11:50 AM, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>>> Hi Peter,
>>>
>>> I have a performance problem (on an ARM board): the CPU is very busy with
>>> cache invalidation.
>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
>>>
>>> But I also have problem with dma_alloc_coherent().
>>> If I don't use dma_alloc_coherent(), is it OK to use below code to
>>> alloc uncached memory?
>>>
>>> struct page *page;
>>> pgd_t *pgd;
>>> pud_t *pud;
>>> pmd_t *pmd;
>>> pte_t *pte;
>>> void *cpu_addr;
>>> dma_addr_t dma_addr;
>>> unsigned int vaddr;
>>>
>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>>> vaddr = (unsigned int)uncached->cpu_addr;
>>> pgd = pgd_offset_k(vaddr);
>>> pud = pud_offset(pgd, vaddr);
>>> pmd = pmd_offset(pud, vaddr);
>>> pte = pte_offset_kernel(pmd, vaddr);
>>> page = virt_to_page(vaddr);
>>> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
>>>
>>> /* This kmalloc memory won't be freed  */
>>>
>>
>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> which cannot be (easily) changed at runtime. You really want to be using
>> dma_alloc_coherent here.
> 
> For "lowmem pages", do you mean the first 16M physical memory?
> How about that if I only use highmem pages(>16M)?
> 

By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
to pages which do not have a permanent mapping in the kernel address space. If
you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
region.

What's the reason you can't use dma_alloc_coherent?

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

* Re: kmalloc and uncached memory
  2014-04-16 19:03       ` Laura Abbott
@ 2014-04-16 21:28         ` Lin Ming
  -1 siblings, 0 replies; 14+ messages in thread
From: Lin Ming @ 2014-04-16 21:28 UTC (permalink / raw)
  To: Laura Abbott; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
> On 4/16/2014 11:50 AM, Lin Ming wrote:
>> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
>>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>>>> Hi Peter,
>>>>
>>>> I have a performance problem (on an ARM board): the CPU is very busy with
>>>> cache invalidation.
>>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
>>>>
>>>> But I also have problem with dma_alloc_coherent().
>>>> If I don't use dma_alloc_coherent(), is it OK to use below code to
>>>> alloc uncached memory?
>>>>
>>>> struct page *page;
>>>> pgd_t *pgd;
>>>> pud_t *pud;
>>>> pmd_t *pmd;
>>>> pte_t *pte;
>>>> void *cpu_addr;
>>>> dma_addr_t dma_addr;
>>>> unsigned int vaddr;
>>>>
>>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>>>> vaddr = (unsigned int)uncached->cpu_addr;
>>>> pgd = pgd_offset_k(vaddr);
>>>> pud = pud_offset(pgd, vaddr);
>>>> pmd = pmd_offset(pud, vaddr);
>>>> pte = pte_offset_kernel(pmd, vaddr);
>>>> page = virt_to_page(vaddr);
>>>> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
>>>>
>>>> /* This kmalloc memory won't be freed  */
>>>>
>>>
>>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>>> which cannot be (easily) changed at runtime. You really want to be using
>>> dma_alloc_coherent here.
>>
>> For "lowmem pages", do you mean the first 16M physical memory?
>> How about that if I only use highmem pages(>16M)?
>>
>
> By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
> to pages which do not have a permanent mapping in the kernel address space. If
> you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> region.

Thanks for the explanation.

>
> What's the reason you can't use dma_alloc_coherent?

I'm actually testing WiFi RX performance on an ARM-based AP.
The traffic goes from WiFi to Ethernet: the WiFi driver receives packets and
the Ethernet driver then transmits them.

I used dma_alloc_coherent() to allocate an uncached buffer in the WiFi driver
to receive packets.
But then the Ethernet driver can't send the packets successfully.

If I use kmalloc() to allocate the buffers in the WiFi driver, everything is OK.

I know this is a very platform- and driver-specific problem, but any
suggestions would be appreciated.

Thanks.

>
> Thanks,
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

* Re: kmalloc and uncached memory
  2014-04-16 21:28         ` Lin Ming
@ 2014-04-16 22:43           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 14+ messages in thread
From: Russell King - ARM Linux @ 2014-04-16 22:43 UTC (permalink / raw)
  To: Lin Ming; +Cc: Laura Abbott, Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
> > On 4/16/2014 11:50 AM, Lin Ming wrote:
> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
> >>>> Hi Peter,
> >>>>
> >>>> I have a performance problem (on an ARM board): the CPU is very busy with
> >>>> cache invalidation.
> >>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
> >>>>
> >>>> But I also have problem with dma_alloc_coherent().
> >>>> If I don't use dma_alloc_coherent(), is it OK to use below code to
> >>>> alloc uncached memory?
> >>>>
> >>>> struct page *page;
> >>>> pgd_t *pgd;
> >>>> pud_t *pud;
> >>>> pmd_t *pmd;
> >>>> pte_t *pte;
> >>>> void *cpu_addr;
> >>>> dma_addr_t dma_addr;
> >>>> unsigned int vaddr;
> >>>>
> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
> >>>> vaddr = (unsigned int)uncached->cpu_addr;
> >>>> pgd = pgd_offset_k(vaddr);
> >>>> pud = pud_offset(pgd, vaddr);
> >>>> pmd = pmd_offset(pud, vaddr);
> >>>> pte = pte_offset_kernel(pmd, vaddr);
> >>>> page = virt_to_page(vaddr);
> >>>> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
> >>>>
> >>>> /* This kmalloc memory won't be freed  */
> >>>>
> >>>
> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> >>> which cannot be (easily) changed at runtime. You really want to be using
> >>> dma_alloc_coherent here.
> >>
> >> For "lowmem pages", do you mean the first 16M physical memory?
> >> How about that if I only use highmem pages(>16M)?
> >>
> >
> > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
> > to pages which do not have a permanent mapping in the kernel address space. If
> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> > region.
> 
> Thanks for the explanation.
> 
> >
> > What's the reason you can't use dma_alloc_coherent?
> 
> I'm actually testing WIFI RX performance on a ARM based AP.
> WIFI to Ethernet traffic, that is WIFI driver RX packets and then
> Ethernet driver TX packets.
> 
> I used dma_alloc_coherent() to allocate uncached buffer in WIFI driver
> to receive packets.
> But then Ethernet driver can't send packets successfully.
> 
> If I used kmalloc() to allocate buffers in WIFI driver, then everything is OK.
> 
> I know this is too platform/drivers specific problem, but any
> suggestion would be appreciated.

So why are you trying to map the memory into userspace?

Given your fragment above, what you're doing there will be no different
from using dma_alloc_coherent() - think about what type of mapping you
end up with.

You have two options on ARM:

1. Use dma_alloc_coherent() - recommended for data which both the CPU and
   DMA can update simultaneously - eg, descriptor ring buffers typically
   found on ethernet devices.

2. Use dma_map_page/dma_map_single() for what we call streaming support,
   which can use kmalloc memory.  *But* there is only exactly *one* owner
   of the buffer at any one time - either the CPU owns it *or* the DMA
   device owns it.  *Only* the current owner may access the buffer.
   Such mappings must be unmapped before they are freed.

Since there's the requirement for ownership in (2), these are not really
suitable to be mapped into userspace while DMA is happening - accesses to
the buffer while DMA is in progress /can/ corrupt the data.
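
The single-owner rule in (2) can be pictured with a small user-space model.
This is only a sketch under stated assumptions: it does not call the kernel
DMA API, and every model_* name is hypothetical. The hand-over functions
stand in for dma_map_single()/dma_sync_single_for_device() and
dma_unmap_single()/dma_sync_single_for_cpu(); the model simply refuses CPU
access while the "device" owns the buffer, whereas real hardware would not
refuse and could silently corrupt the data instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Who may currently touch the buffer (hypothetical model, not kernel code). */
enum model_owner { MODEL_OWNER_CPU, MODEL_OWNER_DEVICE };

struct model_buf {
    unsigned char data[64];
    enum model_owner owner;
};

/* Stands in for mapping/syncing for the device: the CPU gives the buffer up. */
static int model_give_to_device(struct model_buf *b)
{
    if (b->owner != MODEL_OWNER_CPU)
        return -1;                  /* already owned by the device */
    b->owner = MODEL_OWNER_DEVICE;
    return 0;
}

/* Stands in for unmapping/syncing for the CPU: the CPU takes the buffer back. */
static int model_give_to_cpu(struct model_buf *b)
{
    if (b->owner != MODEL_OWNER_DEVICE)
        return -1;                  /* nothing to take back */
    b->owner = MODEL_OWNER_CPU;
    return 0;
}

/* A CPU access is only legal while the CPU is the owner. */
static int model_cpu_write(struct model_buf *b, const void *src, size_t len)
{
    if (b->owner != MODEL_OWNER_CPU || len > sizeof(b->data))
        return -1;
    memcpy(b->data, src, len);
    return 0;
}

/* Usage example: fill, hand over, get refused, take back, write again. */
static void model_ownership_example(void)
{
    struct model_buf b = { .owner = MODEL_OWNER_CPU };

    assert(model_cpu_write(&b, "pkt", 3) == 0);  /* CPU owns it: allowed */
    assert(model_give_to_device(&b) == 0);       /* "map": device owns it */
    assert(model_cpu_write(&b, "x", 1) == -1);   /* CPU must not touch it */
    assert(model_give_to_cpu(&b) == 0);          /* "unmap": CPU owns it */
    assert(model_cpu_write(&b, "ok", 2) == 0);
}
```

Returning an error on a wrong-owner access is a choice made for the model
only, to make the rule visible; in a real driver the ownership discipline has
to be kept by the code itself.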

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

* Re: kmalloc and uncached memory
  2014-04-16 22:43           ` Russell King - ARM Linux
@ 2014-04-16 23:16             ` Lin Ming
  -1 siblings, 0 replies; 14+ messages in thread
From: Lin Ming @ 2014-04-16 23:16 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Laura Abbott, Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 3:43 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
>> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> > On 4/16/2014 11:50 AM, Lin Ming wrote:
>> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>> >>>> Hi Peter,
>> >>>>
>> >>>> I have a performance problem (on an ARM board): the CPU is very busy with
>> >>>> cache invalidation.
>> >>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation.
>> >>>>
>> >>>> But I also have problem with dma_alloc_coherent().
>> >>>> If I don't use dma_alloc_coherent(), is it OK to use below code to
>> >>>> alloc uncached memory?
>> >>>>
>> >>>> struct page *page;
>> >>>> pgd_t *pgd;
>> >>>> pud_t *pud;
>> >>>> pmd_t *pmd;
>> >>>> pte_t *pte;
>> >>>> void *cpu_addr;
>> >>>> dma_addr_t dma_addr;
>> >>>> unsigned int vaddr;
>> >>>>
>> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>> >>>> vaddr = (unsigned int)uncached->cpu_addr;
>> >>>> pgd = pgd_offset_k(vaddr);
>> >>>> pud = pud_offset(pgd, vaddr);
>> >>>> pmd = pmd_offset(pud, vaddr);
>> >>>> pte = pte_offset_kernel(pmd, vaddr);
>> >>>> page = virt_to_page(vaddr);
>> >>>> set_pte_ext(pte, mk_pte(page,  pgprot_dmacoherent(pgprot_kernel)), 0);
>> >>>>
>> >>>> /* This kmalloc memory won't be freed  */
>> >>>>
>> >>>
>> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> >>> which cannot be (easily) changed at runtime. You really want to be using
>> >>> dma_alloc_coherent here.
>> >>
>> >> For "lowmem pages", do you mean the first 16M physical memory?
>> >> How about that if I only use highmem pages(>16M)?
>> >>
>> >
>> > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
>> > to pages which do not have a permanent mapping in the kernel address space. If
>> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
>> > region.
>>
>> Thanks for the explanation.
>>
>> >
>> > What's the reason you can't use dma_alloc_coherent?
>>
>> I'm actually testing WIFI RX performance on a ARM based AP.
>> WIFI to Ethernet traffic, that is WIFI driver RX packets and then
>> Ethernet driver TX packets.
>>
>> I used dma_alloc_coherent() to allocate uncached buffer in WIFI driver
>> to receive packets.
>> But then Ethernet driver can't send packets successfully.
>>
>> If I used kmalloc() to allocate buffers in WIFI driver, then everything is OK.
>>
>> I know this is too platform/drivers specific problem, but any
>> suggestion would be appreciated.
>
> So why are you trying to map the memory into userspace?

I didn't map the memory into userspace.
Or am I missing something obvious?

>
> Given your fragment above, what you're doing there will be no different
> from using dma_alloc_coherent() - think about what type of mapping you
> end up with.
>
> You have two options on ARM:
>
> 1. Use dma_alloc_coherent() - recommended for data which both the CPU and
>    DMA can update simultaneously - eg, descriptor ring buffers typically
>    found on ethernet devices.
>
> 2. Use dma_map_page/dma_map_single() for what we call streaming support,
>    which can use kmalloc memory.  *But* there is only exactly *one* owner
>    of the buffer at any one time - either the CPU owns it *or* the DMA
>    device owns it.  *Only* the current owner may access the buffer.
>    Such mappings must be unmapped before they are freed.

My WIFI RX driver does (2).
Here is an excerpt from the perf log.
The bottleneck seems to be the CPU cache invalidate operation.

    33.86%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
            |
            --- v7_dma_inv_range
               |
               |--51.46%-- ___dma_page_cpu_to_dev
               |          skb2rbd_attach
               |          vmac_rx_poll
               |          net_rx_action
               |          __do_softirq
               |          run_ksoftirqd
               |          kthread
               |          kernel_thread_exit
               |
                --48.54%-- ___dma_page_dev_to_cpu
                          vmac_rx_poll
                          net_rx_action
                          __do_softirq
                          run_ksoftirqd
                          kthread
                          kernel_thread_exit
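
To read the log above in absolute terms, the two call paths can be converted from shares of v7_dma_inv_range into shares of all samples. This is only a sketch: it assumes the percentages in the call graph are relative to the parent entry, and the helper names absolute_share() and print_breakdown() are made up for illustration.

```c
#include <stdio.h>

/* Convert a share that perf reports relative to its parent entry into a
 * share of all samples. Both arguments and the result are percentages. */
static double absolute_share(double parent_pct, double relative_pct)
{
    return parent_pct * relative_pct / 100.0;
}

/* Print the breakdown for the two call paths shown in the log above. */
static void print_breakdown(void)
{
    const double inv_range = 33.86; /* v7_dma_inv_range, share of all samples */

    printf("cpu_to_dev: %.2f%%\n", absolute_share(inv_range, 51.46));
    printf("dev_to_cpu: %.2f%%\n", absolute_share(inv_range, 48.54));
}
```

Under that assumption, invalidation on the map side and on the unmap side each account for roughly 17.4% and 16.4% of all samples, so neither direction dominates.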

So I tried (1): using dma_alloc_coherent() to eliminate the cache
invalidate operation.
But for some reason, the Ethernet driver could not successfully transmit
from the uncached buffer.
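
The single-owner rule for streaming mappings in option (2) can be illustrated with a small user-space model. This is not kernel code: struct model_buf and the model_*() functions are hypothetical names that only mimic the ownership hand-off performed by dma_map_single() and dma_unmap_single(). On a non-coherent ARM system those hand-off points are where the cache maintenance seen in the perf log runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of streaming DMA ownership.
 * None of these names are kernel APIs. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct model_buf {
    unsigned char data[64];
    enum owner owner;
};

static void model_buf_init(struct model_buf *b)
{
    assert(b != NULL);
    memset(b->data, 0, sizeof(b->data));
    b->owner = OWNER_CPU;   /* freshly allocated memory belongs to the CPU */
}

/* Models dma_map_single(): ownership passes from the CPU to the device. */
static int model_map(struct model_buf *b)
{
    if (b->owner != OWNER_CPU)
        return -1;          /* already mapped */
    b->owner = OWNER_DEVICE;
    return 0;
}

/* Models dma_unmap_single(): ownership passes back to the CPU. */
static int model_unmap(struct model_buf *b)
{
    if (b->owner != OWNER_DEVICE)
        return -1;          /* not currently mapped */
    b->owner = OWNER_CPU;
    return 0;
}

/* A CPU access is only allowed while the CPU owns the buffer. */
static int model_cpu_write(struct model_buf *b, size_t off, unsigned char v)
{
    if (b->owner != OWNER_CPU || off >= sizeof(b->data))
        return -1;
    b->data[off] = v;
    return 0;
}
```

In this model a write attempted between model_map() and model_unmap() is rejected. Real hardware gives no such error: the access simply risks corrupting the data, which is why the rule has to be enforced by the driver itself.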

Thanks.

>
> Since there's the requirement for ownership in (2), these are not really
> suitable to be mapped into userspace while DMA is happening - accesses to
> the buffer while DMA is in progress /can/ corrupt the data.
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-04-16 23:16 UTC | newest]

Thread overview: 14+ messages
2014-04-16 18:11 kmalloc and uncached memory Lin Ming
2014-04-16 18:11 ` Lin Ming
2014-04-16 18:33 ` Laura Abbott
2014-04-16 18:33   ` Laura Abbott
2014-04-16 18:50   ` Lin Ming
2014-04-16 18:50     ` Lin Ming
2014-04-16 19:03     ` Laura Abbott
2014-04-16 19:03       ` Laura Abbott
2014-04-16 21:28       ` Lin Ming
2014-04-16 21:28         ` Lin Ming
2014-04-16 22:43         ` Russell King - ARM Linux
2014-04-16 22:43           ` Russell King - ARM Linux
2014-04-16 23:16           ` Lin Ming
2014-04-16 23:16             ` Lin Ming
