* kmalloc and uncached memory
From: Lin Ming @ 2014-04-16 18:11 UTC
To: Peter Zijlstra; +Cc: linux-mm, linux-arm-kernel

Hi Peter,

I have a performance problem (on an ARM board): the CPU is very busy with
cache invalidation.
So I'm trying to allocate uncached memory to eliminate the cache invalidation.

But I also have a problem with dma_alloc_coherent().
If I don't use dma_alloc_coherent(), is it OK to use the code below to
allocate uncached memory?

        struct page *page;
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;
        void *cpu_addr;
        dma_addr_t dma_addr;
        unsigned int vaddr;

        cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
        dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
        vaddr = (unsigned int)uncached->cpu_addr;
        pgd = pgd_offset_k(vaddr);
        pud = pud_offset(pgd, vaddr);
        pmd = pmd_offset(pud, vaddr);
        pte = pte_offset_kernel(pmd, vaddr);
        page = virt_to_page(vaddr);
        set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);

        /* This kmalloc memory won't be freed */

Thanks,
Ming

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: kmalloc and uncached memory
From: Laura Abbott @ 2014-04-16 18:33 UTC
To: Lin Ming, Peter Zijlstra; +Cc: linux-mm, linux-arm-kernel

On 4/16/2014 11:11 AM, Lin Ming wrote:
> Hi Peter,
>
> I have a performance problem (on an ARM board): the CPU is very busy with
> cache invalidation.
> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>
> But I also have a problem with dma_alloc_coherent().
> If I don't use dma_alloc_coherent(), is it OK to use the code below to
> allocate uncached memory?
>
>         struct page *page;
>         pgd_t *pgd;
>         pud_t *pud;
>         pmd_t *pmd;
>         pte_t *pte;
>         void *cpu_addr;
>         dma_addr_t dma_addr;
>         unsigned int vaddr;
>
>         cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>         dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>         vaddr = (unsigned int)uncached->cpu_addr;
>         pgd = pgd_offset_k(vaddr);
>         pud = pud_offset(pgd, vaddr);
>         pmd = pmd_offset(pud, vaddr);
>         pte = pte_offset_kernel(pmd, vaddr);
>         page = virt_to_page(vaddr);
>         set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>
>         /* This kmalloc memory won't be freed */
>

No, that will not work. lowmem pages are mapped with 1MB sections underneath
which cannot be (easily) changed at runtime. You really want to be using
dma_alloc_coherent here.

Laura

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

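To illustrate the point about 1MB sections, here is a small user-space sketch (not kernel code) of how a 32-bit virtual address is split under the ARMv7 short-descriptor page-table format, which is assumed here. Each first-level entry covers 1 MiB. When that entry is a section descriptor it holds the physical base directly and there is no second-level table behind it, so a `pte_offset_kernel()` walk finds no 4 KiB PTE to rewrite. The helper names and the example address are hypothetical.

```c
#include <stdint.h>

/* ARMv7 short-descriptor format (assumed): the first-level table has
 * 4096 entries, each translating 1 MiB of virtual address space. */
#define SECTION_SHIFT      20u
#define SECTION_SIZE       (1u << SECTION_SHIFT)
#define SMALL_PAGE_SHIFT   12u
#define PAGES_PER_SECTION  (SECTION_SIZE >> SMALL_PAGE_SHIFT)   /* 256 */

/* Index of the first-level entry that translates vaddr. */
static inline uint32_t section_index(uint32_t vaddr)
{
    return vaddr >> SECTION_SHIFT;
}

/* Byte offset of vaddr inside its 1 MiB section. */
static inline uint32_t section_offset(uint32_t vaddr)
{
    return vaddr & (SECTION_SIZE - 1u);
}

/* Index a second-level (4 KiB page) table would use for vaddr.
 * Only meaningful when the first-level entry points to a page table;
 * for a section mapping no such table exists. */
static inline uint32_t pte_index(uint32_t vaddr)
{
    return (vaddr >> SMALL_PAGE_SHIFT) & (PAGES_PER_SECTION - 1u);
}
```

For the hypothetical lowmem address 0xC0123456, `section_index()` gives 0xC01 and `section_offset()` gives 0x23456. Changing the cache attributes of one 4 KiB page inside that section would first require splitting the section into 256 page entries, which is the part that is not easily done at runtime.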
* Re: kmalloc and uncached memory
From: Lin Ming @ 2014-04-16 18:50 UTC
To: Laura Abbott; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> which cannot be (easily) changed at runtime. You really want to be using
> dma_alloc_coherent here.

By "lowmem pages", do you mean the first 16M of physical memory?
What if I only use highmem pages (>16M)?

Thanks.

* Re: kmalloc and uncached memory
From: Laura Abbott @ 2014-04-16 19:03 UTC
To: Lin Ming; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On 4/16/2014 11:50 AM, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> which cannot be (easily) changed at runtime. You really want to be using
>> dma_alloc_coherent here.
>
> By "lowmem pages", do you mean the first 16M of physical memory?
> What if I only use highmem pages (>16M)?
>

By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
to pages which do not have a permanent mapping in the kernel address space. If
you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
region.

What's the reason you can't use dma_alloc_coherent?

Thanks,
Laura

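The lowmem/highmem distinction above is about whether a page has a permanent kernel mapping, not about a fixed 16M boundary. A minimal user-space sketch of the direct-map arithmetic follows. The three constants are hypothetical values picked for illustration (a 3G/1G split, RAM starting at 0x80000000, 768 MiB of it direct mapped); real values depend on the platform and kernel configuration, and the function names are made up for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical platform constants, for illustration only. */
#define PAGE_OFFSET  0xC0000000u  /* virtual start of the kernel direct map */
#define PHYS_OFFSET  0x80000000u  /* physical start of RAM */
#define LOWMEM_SIZE  0x30000000u  /* 768 MiB of RAM covered by the direct map */

/* True if the physical address lies in the direct-mapped (lowmem) range. */
static inline bool is_lowmem_phys(uint32_t phys)
{
    return phys >= PHYS_OFFSET && (phys - PHYS_OFFSET) < LOWMEM_SIZE;
}

/* Lowmem translation is a constant offset in both directions. */
static inline uint32_t lowmem_phys_to_virt(uint32_t phys)
{
    return phys - PHYS_OFFSET + PAGE_OFFSET;
}

static inline uint32_t lowmem_virt_to_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET + PHYS_OFFSET;
}
```

Under these assumed constants, physical 0x81000000 (16 MiB into RAM) is still lowmem and maps to virtual 0xC1000000, while physical 0xB0000000 falls outside the direct map and would be highmem, reachable only through a temporary mapping.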
* Re: kmalloc and uncached memory
From: Lin Ming @ 2014-04-16 21:28 UTC
To: Laura Abbott; +Cc: Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
> By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
> to pages which do not have a permanent mapping in the kernel address space. If
> you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> region.

Thanks for the explanation.

>
> What's the reason you can't use dma_alloc_coherent?

I'm actually testing WIFI RX performance on an ARM-based AP.
The traffic goes from WIFI to Ethernet: the WIFI driver receives the packets
and the Ethernet driver then transmits them.

I used dma_alloc_coherent() to allocate an uncached buffer in the WIFI driver
to receive packets.
But then the Ethernet driver can't send the packets successfully.

If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.

I know this is a very platform- and driver-specific problem, but any
suggestion would be appreciated.

Thanks.

* Re: kmalloc and uncached memory
From: Russell King - ARM Linux @ 2014-04-16 22:43 UTC
To: Lin Ming; +Cc: Laura Abbott, Peter Zijlstra, linux-mm, linux-arm-kernel

On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
> > What's the reason you can't use dma_alloc_coherent?
>
> I'm actually testing WIFI RX performance on an ARM-based AP.
> The traffic goes from WIFI to Ethernet: the WIFI driver receives the packets
> and the Ethernet driver then transmits them.
>
> I used dma_alloc_coherent() to allocate an uncached buffer in the WIFI driver
> to receive packets.
> But then the Ethernet driver can't send the packets successfully.
>
> If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.
>
> I know this is a very platform- and driver-specific problem, but any
> suggestion would be appreciated.

So why are you trying to map the memory into userspace?

Given your fragment above, what you're doing there will be no different
from using dma_alloc_coherent() - think about what type of mapping you
end up with.

You have two options on ARM:

1. Use dma_alloc_coherent() - recommended for data which both the CPU and
   DMA can update simultaneously - eg, descriptor ring buffers typically
   found on ethernet devices.

2. Use dma_map_page/dma_map_single() for what we call streaming support,
   which can use kmalloc memory.  *But* there is only exactly *one* owner
   of the buffer at any one time - either the CPU owns it *or* the DMA
   device owns it.  *Only* the current owner may access the buffer.
   Such mappings must be unmapped before they are freed.

Since there's the requirement for ownership in (2), these are not really
suitable to be mapped into userspace while DMA is happening - accesses to
the buffer while DMA is in progress /can/ corrupt the data.

--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

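The single-owner rule for streaming mappings in option 2 can be modelled in a few lines of plain user-space C. This is only a toy model of the ownership hand-off, not kernel code: `model_map()` and `model_unmap()` stand in for the points where a driver would call `dma_map_single()` and `dma_unmap_single()`, and every type and function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Who is currently allowed to touch the buffer. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct stream_buf {
    unsigned char data[64];
    enum owner owner;
};

/* A freshly allocated buffer belongs to the CPU. */
void buf_init(struct stream_buf *b)
{
    memset(b->data, 0, sizeof(b->data));
    b->owner = OWNER_CPU;
}

/* Stands in for dma_map_single(): ownership passes to the device. */
void model_map(struct stream_buf *b)
{
    assert(b->owner == OWNER_CPU);
    b->owner = OWNER_DEVICE;
}

/* Stands in for dma_unmap_single(): ownership returns to the CPU. */
void model_unmap(struct stream_buf *b)
{
    assert(b->owner == OWNER_DEVICE);
    b->owner = OWNER_CPU;
}

/* The device may write only while it owns the buffer. */
void device_write(struct stream_buf *b, const void *src, size_t len)
{
    assert(b->owner == OWNER_DEVICE);
    assert(len <= sizeof(b->data));
    memcpy(b->data, src, len);
}

/* The CPU may read only while it owns the buffer. */
unsigned char cpu_read(const struct stream_buf *b, size_t i)
{
    assert(b->owner == OWNER_CPU);
    assert(i < sizeof(b->data));
    return b->data[i];
}
```

A receive cycle in this model is `buf_init()`, `model_map()`, `device_write()`, `model_unmap()`, then `cpu_read()`. Calling `cpu_read()` between map and unmap trips the assertion, which mirrors the rule that only the current owner may access the buffer. In a real driver the map and unmap calls are also where the cache maintenance happens on non-coherent ARM systems.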
* Re: kmalloc and uncached memory 2014-04-16 22:43 ` Russell King - ARM Linux @ 2014-04-16 23:16 ` Lin Ming -1 siblings, 0 replies; 14+ messages in thread From: Lin Ming @ 2014-04-16 23:16 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Laura Abbott, Peter Zijlstra, linux-mm, linux-arm-kernel On Wed, Apr 16, 2014 at 3:43 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote: >> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott <lauraa@codeaurora.org> wrote: >> > On 4/16/2014 11:50 AM, Lin Ming wrote: >> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott <lauraa@codeaurora.org> wrote: >> >>> On 4/16/2014 11:11 AM, Lin Ming wrote: >> >>>> Hi Peter, >> >>>> >> >>>> I have a performance problem(on ARM board) that cpu is very bus at >> >>>> cache invalidation. >> >>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation. >> >>>> >> >>>> But I also have problem with dma_alloc_coherent(). >> >>>> If I don't use dma_alloc_coherent(), is it OK to use below code to >> >>>> alloc uncached memory? >> >>>> >> >>>> struct page *page; >> >>>> pgd_t *pgd; >> >>>> pud_t *pud; >> >>>> pmd_t *pmd; >> >>>> pte_t *pte; >> >>>> void *cpu_addr; >> >>>> dma_addr_t dma_addr; >> >>>> unsigned int vaddr; >> >>>> >> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); >> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); >> >>>> vaddr = (unsigned int)uncached->cpu_addr; >> >>>> pgd = pgd_offset_k(vaddr); >> >>>> pud = pud_offset(pgd, vaddr); >> >>>> pmd = pmd_offset(pud, vaddr); >> >>>> pte = pte_offset_kernel(pmd, vaddr); >> >>>> page = virt_to_page(vaddr); >> >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); >> >>>> >> >>>> /* This kmalloc memory won't be freed */ >> >>>> >> >>> >> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath >> >>> which cannot be (easily) changed at runtime. 
You really want to be using >> >>> dma_alloc_coherent here. >> >> >> >> For "lowmem pages", do you mean the first 16M physical memory? >> >> How about that if I only use highmem pages(>16M)? >> >> >> > >> > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers >> > to pages which do not have a permanent mapping in the kernel address space. If >> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem >> > region. >> >> Thanks for the explanation. >> >> > >> > What's the reason you can't use dma_alloc_coherent? >> >> I'm actually testing WIFI RX performance on a ARM based AP. >> WIFI to Ethernet traffic, that is WIFI driver RX packets and then >> Ethernet driver TX packets. >> >> I used dma_alloc_coherent() to allocate uncached buffer in WIFI driver >> to receive packets. >> But then Ethernet driver can't send packets successfully. >> >> If I used kmalloc() to allocate buffers in WIFI driver, then everything is OK. >> >> I know this is too platform/drivers specific problem, but any >> suggestion would be appreciated. > > So why are you trying to map the memory into userspace? I didn't map the memory into userspace. Or am I missing something obviously? > > Given your fragment above, what you're doing there will be no different > from using dma_alloc_coherent() - think about what type of mapping you > end up with. > > You have two options on ARM: > > 1. Use dma_alloc_coherent() - recommended for data which both the CPU and > DMA can update simultaneously - eg, descriptor ring buffers typically > found on ethernet devices. > > 2. Use dma_map_page/dma_map_single() for what we call streaming support, > which can use kmalloc memory. *But* there is only exactly *one* owner > of the buffer at any one time - either the CPU owns it *or* the DMA > device owns it. *Only* the current owner may access the buffer. > Such mappings must be unmapped before they are freed. My WIFI RX driver did 2). 
Here is a piece of the perf_event log. The bottleneck seems to be the CPU
cache invalidate operation.

    33.86%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
            |
            --- v7_dma_inv_range
               |
               |--51.46%-- ___dma_page_cpu_to_dev
               |          skb2rbd_attach
               |          vmac_rx_poll
               |          net_rx_action
               |          __do_softirq
               |          run_ksoftirqd
               |          kthread
               |          kernel_thread_exit
               |
                --48.54%-- ___dma_page_dev_to_cpu
                          vmac_rx_poll
                          net_rx_action
                          __do_softirq
                          run_ksoftirqd
                          kthread
                          kernel_thread_exit

So I tried 1): use dma_alloc_coherent() to eliminate the cache invalidate
operation. But for some reason, the Ethernet driver could not TX the
uncached buffer successfully.

Thanks.

>
> Since there's the requirement for ownership in (2), these are not really
> suitable to be mapped into userspace while DMA is happening - accesses to
> the buffer while DMA is in progress /can/ corrupt the data.
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.
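The single-owner rule of option 2 and the two invalidate call sites in the perf trace above can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: it is not kernel code and does not use the kernel DMA API. Every name in it (toy_buf, toy_map_from_device, toy_unmap_from_device and so on) is hypothetical, and the "cache" is just a second array. It assumes, as the trace suggests for this platform, that a receive buffer is invalidated once when it is handed to the device and once when it is handed back to the CPU.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_BUF_SIZE 32

/* Hypothetical ownership states; not a kernel type. */
enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

struct toy_buf {
	unsigned char ram[TOY_BUF_SIZE];   /* what the device reads and writes */
	unsigned char cache[TOY_BUF_SIZE]; /* what the CPU sees through its cache */
	enum toy_owner owner;              /* exactly one owner at any time */
	unsigned int invalidations;        /* simulated cache maintenance count */
};

static void toy_buf_init(struct toy_buf *b)
{
	memset(b, 0, sizeof(*b));
	b->owner = TOY_OWNER_CPU;
}

/* Model an invalidate: afterwards the CPU view matches RAM again. */
static void toy_invalidate(struct toy_buf *b)
{
	memcpy(b->cache, b->ram, TOY_BUF_SIZE);
	b->invalidations++;
}

/* CPU -> device handover for a receive buffer (assumed to invalidate). */
static void toy_map_from_device(struct toy_buf *b)
{
	assert(b->owner == TOY_OWNER_CPU);
	toy_invalidate(b);
	b->owner = TOY_OWNER_DEVICE;
}

/* The "device" writes straight to RAM, bypassing the CPU cache. */
static void toy_device_write(struct toy_buf *b, const char *data)
{
	size_t len = strlen(data);

	assert(b->owner == TOY_OWNER_DEVICE);
	if (len > TOY_BUF_SIZE - 1)
		len = TOY_BUF_SIZE - 1;
	memcpy(b->ram, data, len);
	b->ram[len] = '\0';
}

/* Device -> CPU handover (assumed to invalidate again). */
static void toy_unmap_from_device(struct toy_buf *b)
{
	assert(b->owner == TOY_OWNER_DEVICE);
	toy_invalidate(b);
	b->owner = TOY_OWNER_CPU;
}

/* Only the current owner may touch the buffer. */
static const char *toy_cpu_read(const struct toy_buf *b)
{
	assert(b->owner == TOY_OWNER_CPU);
	return (const char *)b->cache;
}

#ifdef TOY_DMA_DEMO
int main(void)
{
	struct toy_buf b;

	toy_buf_init(&b);
	toy_map_from_device(&b);
	toy_device_write(&b, "packet");
	toy_unmap_from_device(&b);
	printf("%s, %u invalidations\n", toy_cpu_read(&b), b.invalidations);
	return 0;
}
#endif
```

Building with -DTOY_DMA_DEMO enables the small demo main(). In this model each received packet costs two invalidations, one per ownership handover, which matches the two call paths into v7_dma_inv_range in the trace. A coherent (uncached) buffer would avoid both, at the price of slower CPU access to the data.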