From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Apr 2014 11:11:39 -0700
Subject: kmalloc and uncached memory
From: Lin Ming
To: Peter Zijlstra
Cc: linux-mm, "linux-arm-kernel@lists.infradead.org"

Hi Peter,

I have a performance problem (on an ARM board): the CPU is very busy
with cache invalidation.
So I'm trying to allocate uncached memory to eliminate the cache invalidation.

But I also have a problem with dma_alloc_coherent().
If I don't use dma_alloc_coherent(), is it OK to use the code below to
allocate uncached memory?

struct page *page;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
void *cpu_addr;
dma_addr_t dma_addr;
unsigned int vaddr;

cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
vaddr = (unsigned int)cpu_addr;
pgd = pgd_offset_k(vaddr);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
pte = pte_offset_kernel(pmd, vaddr);
page = virt_to_page(vaddr);
set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);

/* This kmalloc memory won't be freed */

Thanks,
Ming

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <534ECCEB.6090007@codeaurora.org>
Date: Wed, 16 Apr 2014 11:33:15 -0700
From: Laura Abbott
Subject: Re: kmalloc and uncached memory
To: Lin Ming, Peter Zijlstra
Cc: linux-mm, "linux-arm-kernel@lists.infradead.org"

On 4/16/2014 11:11 AM, Lin Ming wrote:
> Hi Peter,
>
> I have a performance problem (on an ARM board): the CPU is very busy
> with cache invalidation.
> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>
> But I also have a problem with dma_alloc_coherent().
> If I don't use dma_alloc_coherent(), is it OK to use the code below to
> allocate uncached memory?
>
> struct page *page;
> pgd_t *pgd;
> pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
> void *cpu_addr;
> dma_addr_t dma_addr;
> unsigned int vaddr;
>
> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
> vaddr = (unsigned int)cpu_addr;
> pgd = pgd_offset_k(vaddr);
> pud = pud_offset(pgd, vaddr);
> pmd = pmd_offset(pud, vaddr);
> pte = pte_offset_kernel(pmd, vaddr);
> page = virt_to_page(vaddr);
> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>
> /* This kmalloc memory won't be freed */
>

No, that will not work. lowmem pages are mapped with 1MB sections underneath
which cannot be (easily) changed at runtime. You really want to be using
dma_alloc_coherent here.

Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <534ECCEB.6090007@codeaurora.org>
References: <534ECCEB.6090007@codeaurora.org>
Date: Wed, 16 Apr 2014 11:50:38 -0700
Subject: Re: kmalloc and uncached memory
From: Lin Ming
To: Laura Abbott
Cc: Peter Zijlstra, linux-mm, "linux-arm-kernel@lists.infradead.org"

On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
> On 4/16/2014 11:11 AM, Lin Ming wrote:
>> Hi Peter,
>>
>> I have a performance problem (on an ARM board): the CPU is very busy
>> with cache invalidation.
>> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>>
>> But I also have a problem with dma_alloc_coherent().
>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
>> allocate uncached memory?
>>
>> struct page *page;
>> pgd_t *pgd;
>> pud_t *pud;
>> pmd_t *pmd;
>> pte_t *pte;
>> void *cpu_addr;
>> dma_addr_t dma_addr;
>> unsigned int vaddr;
>>
>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>> vaddr = (unsigned int)cpu_addr;
>> pgd = pgd_offset_k(vaddr);
>> pud = pud_offset(pgd, vaddr);
>> pmd = pmd_offset(pud, vaddr);
>> pte = pte_offset_kernel(pmd, vaddr);
>> page = virt_to_page(vaddr);
>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>>
>> /* This kmalloc memory won't be freed */
>>
>
> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> which cannot be (easily) changed at runtime. You really want to be using
> dma_alloc_coherent here.

By "lowmem pages", do you mean the first 16M of physical memory?
What if I only use highmem pages (>16M)?

Thanks.

>
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <534ED412.1040909@codeaurora.org>
Date: Wed, 16 Apr 2014 12:03:46 -0700
From: Laura Abbott
Subject: Re: kmalloc and uncached memory
References: <534ECCEB.6090007@codeaurora.org>
To: Lin Ming
Cc: Peter Zijlstra, linux-mm, "linux-arm-kernel@lists.infradead.org"

On 4/16/2014 11:50 AM, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>>> Hi Peter,
>>>
>>> I have a performance problem (on an ARM board): the CPU is very busy
>>> with cache invalidation.
>>> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>>>
>>> But I also have a problem with dma_alloc_coherent().
>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
>>> allocate uncached memory?
>>>
>>> struct page *page;
>>> pgd_t *pgd;
>>> pud_t *pud;
>>> pmd_t *pmd;
>>> pte_t *pte;
>>> void *cpu_addr;
>>> dma_addr_t dma_addr;
>>> unsigned int vaddr;
>>>
>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>>> vaddr = (unsigned int)cpu_addr;
>>> pgd = pgd_offset_k(vaddr);
>>> pud = pud_offset(pgd, vaddr);
>>> pmd = pmd_offset(pud, vaddr);
>>> pte = pte_offset_kernel(pmd, vaddr);
>>> page = virt_to_page(vaddr);
>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>>>
>>> /* This kmalloc memory won't be freed */
>>>
>>
>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> which cannot be (easily) changed at runtime. You really want to be using
>> dma_alloc_coherent here.
>
> By "lowmem pages", do you mean the first 16M of physical memory?
> What if I only use highmem pages (>16M)?
>

By lowmem pages I am referring to the direct-mapped kernel area. Highmem refers
to pages which do not have a permanent mapping in the kernel address space. If
you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
region.

What's the reason you can't use dma_alloc_coherent?

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <534ED412.1040909@codeaurora.org>
References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org>
Date: Wed, 16 Apr 2014 14:28:45 -0700
Subject: Re: kmalloc and uncached memory
From: Lin Ming
To: Laura Abbott
Cc: Peter Zijlstra, linux-mm, "linux-arm-kernel@lists.infradead.org"

On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote:
> On 4/16/2014 11:50 AM, Lin Ming wrote:
>> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
>>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>>>> Hi Peter,
>>>>
>>>> I have a performance problem (on an ARM board): the CPU is very busy
>>>> with cache invalidation.
>>>> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>>>>
>>>> But I also have a problem with dma_alloc_coherent().
>>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
>>>> allocate uncached memory?
>>>>
>>>> struct page *page;
>>>> pgd_t *pgd;
>>>> pud_t *pud;
>>>> pmd_t *pmd;
>>>> pte_t *pte;
>>>> void *cpu_addr;
>>>> dma_addr_t dma_addr;
>>>> unsigned int vaddr;
>>>>
>>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>>>> vaddr = (unsigned int)cpu_addr;
>>>> pgd = pgd_offset_k(vaddr);
>>>> pud = pud_offset(pgd, vaddr);
>>>> pmd = pmd_offset(pud, vaddr);
>>>> pte = pte_offset_kernel(pmd, vaddr);
>>>> page = virt_to_page(vaddr);
>>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>>>>
>>>> /* This kmalloc memory won't be freed */
>>>>
>>>
>>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>>> which cannot be (easily) changed at runtime. You really want to be using
>>> dma_alloc_coherent here.
>>
>> By "lowmem pages", do you mean the first 16M of physical memory?
>> What if I only use highmem pages (>16M)?
>>
>
> By lowmem pages I am referring to the direct-mapped kernel area. Highmem refers
> to pages which do not have a permanent mapping in the kernel address space. If
> you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> region.

Thanks for the explanation.

>
> What's the reason you can't use dma_alloc_coherent?

I'm actually testing WIFI RX performance on an ARM-based AP.
The traffic is WIFI to Ethernet: the WIFI driver receives (RX) packets and
then the Ethernet driver transmits (TX) them.

I used dma_alloc_coherent() to allocate uncached buffers in the WIFI driver
to receive packets.
But then the Ethernet driver can't send the packets successfully.

If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.

I know this is a very platform/driver-specific problem, but any
suggestion would be appreciated.

Thanks.

>
> Thanks,
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Apr 2014 23:43:24 +0100
From: Russell King - ARM Linux
Subject: Re: kmalloc and uncached memory
Message-ID: <20140416224324.GO24070@n2100.arm.linux.org.uk>
References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org>
To: Lin Ming
Cc: Laura Abbott, Peter Zijlstra, linux-mm, "linux-arm-kernel@lists.infradead.org"

On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote:
> > On 4/16/2014 11:50 AM, Lin Ming wrote:
> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
> >>>> Hi Peter,
> >>>>
> >>>> I have a performance problem (on an ARM board): the CPU is very busy
> >>>> with cache invalidation.
> >>>> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
> >>>>
> >>>> But I also have a problem with dma_alloc_coherent().
> >>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
> >>>> allocate uncached memory?
> >>>>
> >>>> struct page *page;
> >>>> pgd_t *pgd;
> >>>> pud_t *pud;
> >>>> pmd_t *pmd;
> >>>> pte_t *pte;
> >>>> void *cpu_addr;
> >>>> dma_addr_t dma_addr;
> >>>> unsigned int vaddr;
> >>>>
> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
> >>>> vaddr = (unsigned int)cpu_addr;
> >>>> pgd = pgd_offset_k(vaddr);
> >>>> pud = pud_offset(pgd, vaddr);
> >>>> pmd = pmd_offset(pud, vaddr);
> >>>> pte = pte_offset_kernel(pmd, vaddr);
> >>>> page = virt_to_page(vaddr);
> >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
> >>>>
> >>>> /* This kmalloc memory won't be freed */
> >>>>
> >>>
> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> >>> which cannot be (easily) changed at runtime. You really want to be using
> >>> dma_alloc_coherent here.
> >>
> >> By "lowmem pages", do you mean the first 16M of physical memory?
> >> What if I only use highmem pages (>16M)?
> >>
> >
> > By lowmem pages I am referring to the direct-mapped kernel area. Highmem refers
> > to pages which do not have a permanent mapping in the kernel address space. If
> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> > region.
>
> Thanks for the explanation.
>
> >
> > What's the reason you can't use dma_alloc_coherent?
>
> I'm actually testing WIFI RX performance on an ARM-based AP.
> The traffic is WIFI to Ethernet: the WIFI driver receives (RX) packets and
> then the Ethernet driver transmits (TX) them.
>
> I used dma_alloc_coherent() to allocate uncached buffers in the WIFI driver
> to receive packets.
> But then the Ethernet driver can't send the packets successfully.
>
> If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.
>
> I know this is a very platform/driver-specific problem, but any
> suggestion would be appreciated.

So why are you trying to map the memory into userspace?

Given your fragment above, what you're doing there will be no different
from using dma_alloc_coherent() - think about what type of mapping you
end up with.

You have two options on ARM:

1. Use dma_alloc_coherent() - recommended for data which both the CPU and
   DMA can update simultaneously - eg, descriptor ring buffers typically
   found on ethernet devices.

2. Use dma_map_page/dma_map_single() for what we call streaming support,
   which can use kmalloc memory.  *But* there is only exactly *one* owner
   of the buffer at any one time - either the CPU owns it *or* the DMA
   device owns it.  *Only* the current owner may access the buffer.
   Such mappings must be unmapped before they are freed.

Since there's the requirement for ownership in (2), these are not really
suitable to be mapped into userspace while DMA is happening - accesses to
the buffer while DMA is in progress /can/ corrupt the data.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20140416224324.GO24070@n2100.arm.linux.org.uk>
References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org> <20140416224324.GO24070@n2100.arm.linux.org.uk>
Date: Wed, 16 Apr 2014 16:16:16 -0700
Subject: Re: kmalloc and uncached memory
From: Lin Ming
To: Russell King - ARM Linux
Cc: Laura Abbott, Peter Zijlstra, linux-mm, "linux-arm-kernel@lists.infradead.org"

On Wed, Apr 16, 2014 at 3:43 PM, Russell King - ARM Linux wrote:
> On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
>> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote:
>> > On 4/16/2014 11:50 AM, Lin Ming wrote:
>> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
>> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>> >>>> Hi Peter,
>> >>>>
>> >>>> I have a performance problem (on an ARM board): the CPU is very busy
>> >>>> with cache invalidation.
>> >>>> So I'm trying to allocate uncached memory to eliminate the cache invalidation.
>> >>>>
>> >>>> But I also have a problem with dma_alloc_coherent().
>> >>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
>> >>>> allocate uncached memory?
>> >>>>
>> >>>> struct page *page;
>> >>>> pgd_t *pgd;
>> >>>> pud_t *pud;
>> >>>> pmd_t *pmd;
>> >>>> pte_t *pte;
>> >>>> void *cpu_addr;
>> >>>> dma_addr_t dma_addr;
>> >>>> unsigned int vaddr;
>> >>>>
>> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>> >>>> vaddr = (unsigned int)cpu_addr;
>> >>>> pgd = pgd_offset_k(vaddr);
>> >>>> pud = pud_offset(pgd, vaddr);
>> >>>> pmd = pmd_offset(pud, vaddr);
>> >>>> pte = pte_offset_kernel(pmd, vaddr);
>> >>>> page = virt_to_page(vaddr);
>> >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>> >>>>
>> >>>> /* This kmalloc memory won't be freed */
>> >>>>
>> >>>
>> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> >>> which cannot be (easily) changed at runtime. You really want to be using
>> >>> dma_alloc_coherent here.
>> >>
>> >> By "lowmem pages", do you mean the first 16M of physical memory?
>> >> What if I only use highmem pages (>16M)?
>> >>
>> >
>> > By lowmem pages I am referring to the direct-mapped kernel area. Highmem refers
>> > to pages which do not have a permanent mapping in the kernel address space. If
>> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
>> > region.
>>
>> Thanks for the explanation.
>>
>> >
>> > What's the reason you can't use dma_alloc_coherent?
>>
>> I'm actually testing WIFI RX performance on an ARM-based AP.
>> The traffic is WIFI to Ethernet: the WIFI driver receives (RX) packets and
>> then the Ethernet driver transmits (TX) them.
>>
>> I used dma_alloc_coherent() to allocate uncached buffers in the WIFI driver
>> to receive packets.
>> But then the Ethernet driver can't send the packets successfully.
>>
>> If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.
>>
>> I know this is a very platform/driver-specific problem, but any
>> suggestion would be appreciated.
>
> So why are you trying to map the memory into userspace?

I didn't map the memory into userspace.
Or am I missing something obvious?

>
> Given your fragment above, what you're doing there will be no different
> from using dma_alloc_coherent() - think about what type of mapping you
> end up with.
>
> You have two options on ARM:
>
> 1. Use dma_alloc_coherent() - recommended for data which both the CPU and
>    DMA can update simultaneously - eg, descriptor ring buffers typically
>    found on ethernet devices.
>
> 2. Use dma_map_page/dma_map_single() for what we call streaming support,
>    which can use kmalloc memory.  *But* there is only exactly *one* owner
>    of the buffer at any one time - either the CPU owns it *or* the DMA
>    device owns it.  *Only* the current owner may access the buffer.
>    Such mappings must be unmapped before they are freed.

My WIFI RX driver does 2).

Here is a piece of the perf_event log.
It seems the bottleneck is the CPU cache invalidate operation.

    33.86%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
            |
            --- v7_dma_inv_range
               |
               |--51.46%-- ___dma_page_cpu_to_dev
               |          skb2rbd_attach
               |          vmac_rx_poll
               |          net_rx_action
               |          __do_softirq
               |          run_ksoftirqd
               |          kthread
               |          kernel_thread_exit
               |
                --48.54%-- ___dma_page_dev_to_cpu
                          vmac_rx_poll
                          net_rx_action
                          __do_softirq
                          run_ksoftirqd
                          kthread
                          kernel_thread_exit

So I tried 1): using dma_alloc_coherent() to eliminate the cache invalidate operation.
But for some reason, the Ethernet driver doesn't successfully TX the uncached buffer.

Thanks.

>
> Since there's the requirement for ownership in (2), these are not really
> suitable to be mapped into userspace while DMA is happening - accesses to
> the buffer while DMA is in progress /can/ corrupt the data.
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: minggr@gmail.com (Lin Ming) Date: Wed, 16 Apr 2014 11:11:39 -0700 Subject: kmalloc and uncached memory Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Peter, I have a performance problem(on ARM board) that cpu is very bus at cache invalidation. So I'm trying to alloc an uncached memory to eliminate cache invalidation. But I also have problem with dma_alloc_coherent(). If I don't use dma_alloc_coherent(), is it OK to use below code to alloc uncached memory? struct page *page; pgd_t *pgd; pud_t *pud; pmd_t *pmd; pte_t *pte; void *cpu_addr; dma_addr_t dma_addr; unsigned int vaddr; cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); vaddr = (unsigned int)uncached->cpu_addr; pgd = pgd_offset_k(vaddr); pud = pud_offset(pgd, vaddr); pmd = pmd_offset(pud, vaddr); pte = pte_offset_kernel(pmd, vaddr); page = virt_to_page(vaddr); set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); /* This kmalloc memory won't be freed */ Thanks, Ming From mboxrd@z Thu Jan 1 00:00:00 1970 From: lauraa@codeaurora.org (Laura Abbott) Date: Wed, 16 Apr 2014 11:33:15 -0700 Subject: kmalloc and uncached memory In-Reply-To: References: Message-ID: <534ECCEB.6090007@codeaurora.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 4/16/2014 11:11 AM, Lin Ming wrote: > Hi Peter, > > I have a performance problem(on ARM board) that cpu is very bus at > cache invalidation. > So I'm trying to alloc an uncached memory to eliminate cache invalidation. > > But I also have problem with dma_alloc_coherent(). > If I don't use dma_alloc_coherent(), is it OK to use below code to > alloc uncached memory? 
> > struct page *page; > pgd_t *pgd; > pud_t *pud; > pmd_t *pmd; > pte_t *pte; > void *cpu_addr; > dma_addr_t dma_addr; > unsigned int vaddr; > > cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); > dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); > vaddr = (unsigned int)uncached->cpu_addr; > pgd = pgd_offset_k(vaddr); > pud = pud_offset(pgd, vaddr); > pmd = pmd_offset(pud, vaddr); > pte = pte_offset_kernel(pmd, vaddr); > page = virt_to_page(vaddr); > set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); > > /* This kmalloc memory won't be freed */ > No, that will not work. lowmem pages are mapped with 1MB sections underneath which cannot be (easily) changed at runtime. You really want to be using dma_alloc_coherent here. Laura -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation From mboxrd@z Thu Jan 1 00:00:00 1970 From: minggr@gmail.com (Lin Ming) Date: Wed, 16 Apr 2014 11:50:38 -0700 Subject: kmalloc and uncached memory In-Reply-To: <534ECCEB.6090007@codeaurora.org> References: <534ECCEB.6090007@codeaurora.org> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote: > On 4/16/2014 11:11 AM, Lin Ming wrote: >> Hi Peter, >> >> I have a performance problem(on ARM board) that cpu is very bus at >> cache invalidation. >> So I'm trying to alloc an uncached memory to eliminate cache invalidation. >> >> But I also have problem with dma_alloc_coherent(). >> If I don't use dma_alloc_coherent(), is it OK to use below code to >> alloc uncached memory? 
>> >> struct page *page; >> pgd_t *pgd; >> pud_t *pud; >> pmd_t *pmd; >> pte_t *pte; >> void *cpu_addr; >> dma_addr_t dma_addr; >> unsigned int vaddr; >> >> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); >> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); >> vaddr = (unsigned int)uncached->cpu_addr; >> pgd = pgd_offset_k(vaddr); >> pud = pud_offset(pgd, vaddr); >> pmd = pmd_offset(pud, vaddr); >> pte = pte_offset_kernel(pmd, vaddr); >> page = virt_to_page(vaddr); >> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); >> >> /* This kmalloc memory won't be freed */ >> > > No, that will not work. lowmem pages are mapped with 1MB sections underneath > which cannot be (easily) changed at runtime. You really want to be using > dma_alloc_coherent here. For "lowmem pages", do you mean the first 16M physical memory? How about that if I only use highmem pages(>16M)? Thanks. > > Laura > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation From mboxrd@z Thu Jan 1 00:00:00 1970 From: lauraa@codeaurora.org (Laura Abbott) Date: Wed, 16 Apr 2014 12:03:46 -0700 Subject: kmalloc and uncached memory In-Reply-To: References: <534ECCEB.6090007@codeaurora.org> Message-ID: <534ED412.1040909@codeaurora.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 4/16/2014 11:50 AM, Lin Ming wrote: > On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote: >> On 4/16/2014 11:11 AM, Lin Ming wrote: >>> Hi Peter, >>> >>> I have a performance problem(on ARM board) that cpu is very bus at >>> cache invalidation. >>> So I'm trying to alloc an uncached memory to eliminate cache invalidation. >>> >>> But I also have problem with dma_alloc_coherent(). >>> If I don't use dma_alloc_coherent(), is it OK to use below code to >>> alloc uncached memory? 
>>> >>> struct page *page; >>> pgd_t *pgd; >>> pud_t *pud; >>> pmd_t *pmd; >>> pte_t *pte; >>> void *cpu_addr; >>> dma_addr_t dma_addr; >>> unsigned int vaddr; >>> >>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); >>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); >>> vaddr = (unsigned int)uncached->cpu_addr; >>> pgd = pgd_offset_k(vaddr); >>> pud = pud_offset(pgd, vaddr); >>> pmd = pmd_offset(pud, vaddr); >>> pte = pte_offset_kernel(pmd, vaddr); >>> page = virt_to_page(vaddr); >>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); >>> >>> /* This kmalloc memory won't be freed */ >>> >> >> No, that will not work. lowmem pages are mapped with 1MB sections underneath >> which cannot be (easily) changed at runtime. You really want to be using >> dma_alloc_coherent here. > > For "lowmem pages", do you mean the first 16M physical memory? > How about that if I only use highmem pages(>16M)? > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers to pages which do not have a permanent mapping in the kernel address space. If you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem region. What's the reason you can't use dma_alloc_coherent? Thanks, Laura -- Qualcomm Innovation Center, Inc. 
is a member of Code Aurora Forum, hosted by The Linux Foundation From mboxrd@z Thu Jan 1 00:00:00 1970 From: minggr@gmail.com (Lin Ming) Date: Wed, 16 Apr 2014 14:28:45 -0700 Subject: kmalloc and uncached memory In-Reply-To: <534ED412.1040909@codeaurora.org> References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote: > On 4/16/2014 11:50 AM, Lin Ming wrote: >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote: >>> On 4/16/2014 11:11 AM, Lin Ming wrote: >>>> Hi Peter, >>>> >>>> I have a performance problem(on ARM board) that cpu is very bus at >>>> cache invalidation. >>>> So I'm trying to alloc an uncached memory to eliminate cache invalidation. >>>> >>>> But I also have problem with dma_alloc_coherent(). >>>> If I don't use dma_alloc_coherent(), is it OK to use below code to >>>> alloc uncached memory? >>>> >>>> struct page *page; >>>> pgd_t *pgd; >>>> pud_t *pud; >>>> pmd_t *pmd; >>>> pte_t *pte; >>>> void *cpu_addr; >>>> dma_addr_t dma_addr; >>>> unsigned int vaddr; >>>> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL); >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE); >>>> vaddr = (unsigned int)uncached->cpu_addr; >>>> pgd = pgd_offset_k(vaddr); >>>> pud = pud_offset(pgd, vaddr); >>>> pmd = pmd_offset(pud, vaddr); >>>> pte = pte_offset_kernel(pmd, vaddr); >>>> page = virt_to_page(vaddr); >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0); >>>> >>>> /* This kmalloc memory won't be freed */ >>>> >>> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath >>> which cannot be (easily) changed at runtime. You really want to be using >>> dma_alloc_coherent here. >> >> For "lowmem pages", do you mean the first 16M physical memory? >> How about that if I only use highmem pages(>16M)? 
>>
>
> By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
> to pages which do not have a permanent mapping in the kernel address space. If
> you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> region.

Thanks for the explanation.

>
> What's the reason you can't use dma_alloc_coherent?

I'm actually testing WIFI RX performance on an ARM-based AP, with
WIFI-to-Ethernet traffic: the WIFI driver receives packets and then the
Ethernet driver transmits them.

I used dma_alloc_coherent() to allocate an uncached buffer in the WIFI driver
to receive packets.
But then the Ethernet driver can't send the packets successfully.

If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.

I know this is a very platform/driver-specific problem, but any
suggestions would be appreciated.

Thanks.

>
> Thanks,
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation

From mboxrd@z Thu Jan 1 00:00:00 1970
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
Date: Wed, 16 Apr 2014 23:43:24 +0100
Subject: kmalloc and uncached memory
In-Reply-To:
References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org>
Message-ID: <20140416224324.GO24070@n2100.arm.linux.org.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote:
> > On 4/16/2014 11:50 AM, Lin Ming wrote:
> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
> >>>> Hi Peter,
> >>>>
> >>>> I have a performance problem (on an ARM board): the CPU is very busy doing
> >>>> cache invalidation.
> >>>> So I'm trying to allocate uncached memory to eliminate cache invalidation.
> >>>>
> >>>> But I also have a problem with dma_alloc_coherent().
> >>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
> >>>> allocate uncached memory?
> >>>>
> >>>> struct page *page;
> >>>> pgd_t *pgd;
> >>>> pud_t *pud;
> >>>> pmd_t *pmd;
> >>>> pte_t *pte;
> >>>> void *cpu_addr;
> >>>> dma_addr_t dma_addr;
> >>>> unsigned int vaddr;
> >>>>
> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
> >>>> vaddr = (unsigned int)cpu_addr;
> >>>> pgd = pgd_offset_k(vaddr);
> >>>> pud = pud_offset(pgd, vaddr);
> >>>> pmd = pmd_offset(pud, vaddr);
> >>>> pte = pte_offset_kernel(pmd, vaddr);
> >>>> page = virt_to_page(vaddr);
> >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
> >>>>
> >>>> /* This kmalloc memory won't be freed */
> >>>>
> >>>
> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
> >>> which cannot be (easily) changed at runtime. You really want to be using
> >>> dma_alloc_coherent here.
> >>
> >> For "lowmem pages", do you mean the first 16M of physical memory?
> >> What if I only use highmem pages (>16M)?
> >>
> >
> > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
> > to pages which do not have a permanent mapping in the kernel address space. If
> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
> > region.
>
> Thanks for the explanation.
>
> >
> > What's the reason you can't use dma_alloc_coherent?
>
> I'm actually testing WIFI RX performance on an ARM-based AP, with
> WIFI-to-Ethernet traffic: the WIFI driver receives packets and then the
> Ethernet driver transmits them.
>
> I used dma_alloc_coherent() to allocate an uncached buffer in the WIFI driver
> to receive packets.
> But then the Ethernet driver can't send the packets successfully.
>
> If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.
>
> I know this is a very platform/driver-specific problem, but any
> suggestions would be appreciated.

So why are you trying to map the memory into userspace?

Given your fragment above, what you're doing there will be no different
from using dma_alloc_coherent() - think about what type of mapping you
end up with.

You have two options on ARM:

1. Use dma_alloc_coherent() - recommended for data which both the CPU and
   DMA can update simultaneously - e.g., descriptor ring buffers typically
   found on ethernet devices.

2. Use dma_map_page()/dma_map_single() for what we call streaming support,
   which can use kmalloc memory. *But* there is only exactly *one* owner
   of the buffer at any one time - either the CPU owns it *or* the DMA
   device owns it. *Only* the current owner may access the buffer.
   Such mappings must be unmapped before they are freed.

Since there's the requirement for ownership in (2), these are not really
suitable to be mapped into userspace while DMA is happening - accesses to
the buffer while DMA is in progress /can/ corrupt the data.

--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
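
The two options above, and the single-owner rule that comes with option 2, can be modelled as a small userspace state machine. This is a hedged illustration, not kernel code: struct toy_dma_buf and every toy_* function are hypothetical names that only track which side may touch the buffer, and each transition merely stands in for the kernel call named in its comment (dma_alloc_coherent(), dma_map_single(), dma_sync_single_for_cpu(), dma_sync_single_for_device(), dma_unmap_single()). No cache maintenance or real DMA happens here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy userspace model (not kernel code): tracks who may touch a DMA buffer.
 * All names are hypothetical; comments name the kernel call each
 * transition stands in for.
 */
enum toy_kind  { TOY_COHERENT, TOY_STREAMING };
enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

struct toy_dma_buf {
	enum toy_kind kind;
	enum toy_owner owner;	/* only meaningful for TOY_STREAMING */
	bool mapped;		/* true while a streaming mapping is live */
};

/* Option 1, like dma_alloc_coherent(): shared by CPU and device. */
static struct toy_dma_buf toy_alloc_coherent(void)
{
	struct toy_dma_buf b = { .kind = TOY_COHERENT, .owner = TOY_OWNER_CPU, .mapped = true };
	return b;
}

/* Option 2, like a kmalloc() buffer: the CPU owns it until it is mapped. */
static struct toy_dma_buf toy_kmalloc(void)
{
	struct toy_dma_buf b = { .kind = TOY_STREAMING, .owner = TOY_OWNER_CPU, .mapped = false };
	return b;
}

/* Like dma_map_single(): ownership passes to the device. */
static void toy_map(struct toy_dma_buf *b)
{
	assert(b->kind == TOY_STREAMING && !b->mapped);
	b->mapped = true;
	b->owner = TOY_OWNER_DEVICE;
}

/* Like dma_sync_single_for_cpu(): the CPU takes a still-mapped buffer back. */
static void toy_sync_for_cpu(struct toy_dma_buf *b)
{
	assert(b->kind == TOY_STREAMING && b->mapped && b->owner == TOY_OWNER_DEVICE);
	b->owner = TOY_OWNER_CPU;
}

/* Like dma_sync_single_for_device(): hand the buffer back to the device. */
static void toy_sync_for_device(struct toy_dma_buf *b)
{
	assert(b->kind == TOY_STREAMING && b->mapped && b->owner == TOY_OWNER_CPU);
	b->owner = TOY_OWNER_DEVICE;
}

/* Like dma_unmap_single(): the mapping ends and the CPU owns the buffer. */
static void toy_unmap(struct toy_dma_buf *b)
{
	assert(b->kind == TOY_STREAMING && b->mapped);
	b->mapped = false;
	b->owner = TOY_OWNER_CPU;
}

static bool toy_cpu_may_access(const struct toy_dma_buf *b)
{
	return b->kind == TOY_COHERENT || b->owner == TOY_OWNER_CPU;
}

static bool toy_device_may_access(const struct toy_dma_buf *b)
{
	return b->kind == TOY_COHERENT ||
	       (b->mapped && b->owner == TOY_OWNER_DEVICE);
}

/* kfree() is only safe for an unmapped streaming buffer; a coherent
 * buffer would be released with dma_free_coherent() instead. */
static bool toy_may_kfree(const struct toy_dma_buf *b)
{
	return b->kind == TOY_STREAMING && !b->mapped;
}
```

Stepping a kmalloc()-style buffer through toy_map(), toy_sync_for_cpu(), toy_sync_for_device() and toy_unmap() shows the rule from option 2: while the device owns the buffer the CPU may not touch it, and it may only be freed once unmapped. A coherent buffer never changes hands, which is why it suits descriptor rings that the CPU and the device update concurrently.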
From mboxrd@z Thu Jan 1 00:00:00 1970
From: minggr@gmail.com (Lin Ming)
Date: Wed, 16 Apr 2014 16:16:16 -0700
Subject: kmalloc and uncached memory
In-Reply-To: <20140416224324.GO24070@n2100.arm.linux.org.uk>
References: <534ECCEB.6090007@codeaurora.org> <534ED412.1040909@codeaurora.org> <20140416224324.GO24070@n2100.arm.linux.org.uk>
Message-ID:
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wed, Apr 16, 2014 at 3:43 PM, Russell King - ARM Linux wrote:
> On Wed, Apr 16, 2014 at 02:28:45PM -0700, Lin Ming wrote:
>> On Wed, Apr 16, 2014 at 12:03 PM, Laura Abbott wrote:
>> > On 4/16/2014 11:50 AM, Lin Ming wrote:
>> >> On Wed, Apr 16, 2014 at 11:33 AM, Laura Abbott wrote:
>> >>> On 4/16/2014 11:11 AM, Lin Ming wrote:
>> >>>> Hi Peter,
>> >>>>
>> >>>> I have a performance problem (on an ARM board): the CPU is very busy doing
>> >>>> cache invalidation.
>> >>>> So I'm trying to allocate uncached memory to eliminate cache invalidation.
>> >>>>
>> >>>> But I also have a problem with dma_alloc_coherent().
>> >>>> If I don't use dma_alloc_coherent(), is it OK to use the code below to
>> >>>> allocate uncached memory?
>> >>>>
>> >>>> struct page *page;
>> >>>> pgd_t *pgd;
>> >>>> pud_t *pud;
>> >>>> pmd_t *pmd;
>> >>>> pte_t *pte;
>> >>>> void *cpu_addr;
>> >>>> dma_addr_t dma_addr;
>> >>>> unsigned int vaddr;
>> >>>>
>> >>>> cpu_addr = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> >>>> dma_addr = pci_map_single(NULL, cpu_addr, PAGE_SIZE, (int)DMA_FROM_DEVICE);
>> >>>> vaddr = (unsigned int)cpu_addr;
>> >>>> pgd = pgd_offset_k(vaddr);
>> >>>> pud = pud_offset(pgd, vaddr);
>> >>>> pmd = pmd_offset(pud, vaddr);
>> >>>> pte = pte_offset_kernel(pmd, vaddr);
>> >>>> page = virt_to_page(vaddr);
>> >>>> set_pte_ext(pte, mk_pte(page, pgprot_dmacoherent(pgprot_kernel)), 0);
>> >>>>
>> >>>> /* This kmalloc memory won't be freed */
>> >>>>
>> >>>
>> >>> No, that will not work. lowmem pages are mapped with 1MB sections underneath
>> >>> which cannot be (easily) changed at runtime. You really want to be using
>> >>> dma_alloc_coherent here.
>> >>
>> >> For "lowmem pages", do you mean the first 16M of physical memory?
>> >> What if I only use highmem pages (>16M)?
>> >>
>> >
>> > By lowmem pages I am referring to the direct mapped kernel area. Highmem refers
>> > to pages which do not have a permanent mapping in the kernel address space. If
>> > you are calling kmalloc with GFP_KERNEL you will be getting a page from the lowmem
>> > region.
>>
>> Thanks for the explanation.
>>
>> >
>> > What's the reason you can't use dma_alloc_coherent?
>>
>> I'm actually testing WIFI RX performance on an ARM-based AP, with
>> WIFI-to-Ethernet traffic: the WIFI driver receives packets and then the
>> Ethernet driver transmits them.
>>
>> I used dma_alloc_coherent() to allocate an uncached buffer in the WIFI driver
>> to receive packets.
>> But then the Ethernet driver can't send the packets successfully.
>>
>> If I use kmalloc() to allocate the buffers in the WIFI driver, everything is OK.
>>
>> I know this is a very platform/driver-specific problem, but any
>> suggestions would be appreciated.
>
> So why are you trying to map the memory into userspace?

I didn't map the memory into userspace.
Or am I missing something obvious?

>
> Given your fragment above, what you're doing there will be no different
> from using dma_alloc_coherent() - think about what type of mapping you
> end up with.
>
> You have two options on ARM:
>
> 1. Use dma_alloc_coherent() - recommended for data which both the CPU and
>    DMA can update simultaneously - e.g., descriptor ring buffers typically
>    found on ethernet devices.
>
> 2. Use dma_map_page()/dma_map_single() for what we call streaming support,
>    which can use kmalloc memory. *But* there is only exactly *one* owner
>    of the buffer at any one time - either the CPU owns it *or* the DMA
>    device owns it. *Only* the current owner may access the buffer.
>    Such mappings must be unmapped before they are freed.

My WIFI RX driver does 2).
Here is a piece of the perf_event log. It seems the bottleneck is the CPU
cache invalidate operation.

    33.86%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
            |
            --- v7_dma_inv_range
               |
               |--51.46%-- ___dma_page_cpu_to_dev
               |          skb2rbd_attach
               |          vmac_rx_poll
               |          net_rx_action
               |          __do_softirq
               |          run_ksoftirqd
               |          kthread
               |          kernel_thread_exit
               |
                --48.54%-- ___dma_page_dev_to_cpu
                          vmac_rx_poll
                          net_rx_action
                          __do_softirq
                          run_ksoftirqd
                          kthread
                          kernel_thread_exit

So I tried to do 1): use dma_alloc_coherent() to eliminate the cache
invalidate operation.
But for some reason, the Ethernet driver didn't transmit the uncached
buffer successfully.

Thanks.

>
> Since there's the requirement for ownership in (2), these are not really
> suitable to be mapped into userspace while DMA is happening - accesses to
> the buffer while DMA is in progress /can/ corrupt the data.
>
> --
> FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
> improving, and getting towards what was expected from it.
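
The roughly even split between the two call paths in the profile above fits the way ARMv7 kernels of this era are generally understood to handle a DMA_FROM_DEVICE streaming buffer (an assumption, not something established in this thread): its cache lines are invalidated once when the buffer is mapped (the ___dma_page_cpu_to_dev path) and again when it is unmapped (the ___dma_page_dev_to_cpu path), and v7_dma_inv_range steps through the range one cache line at a time. The toy functions below only count that work. lines_touched(), lines_per_rx_buffer() and the buffer and line sizes used with them are hypothetical, not values taken from this driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy arithmetic, not kernel code: count the cache lines covered by
 * [start, start + len), i.e. roughly how many iterations a line-by-line
 * invalidate loop needs. line_size must be a power of two.
 */
static size_t lines_touched(uintptr_t start, size_t len, size_t line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	if (len == 0)
		return 0;

	uintptr_t mask = ~(uintptr_t)(line_size - 1);
	uintptr_t first = start & mask;                        /* round start down */
	uintptr_t end = (start + len + line_size - 1) & mask;  /* round end up */
	return (size_t)((end - first) / line_size);
}

/*
 * Hypothetical helper: lines walked per RX buffer, assuming the buffer is
 * invalidated once at map time and once at unmap time (DMA_FROM_DEVICE).
 */
static size_t lines_per_rx_buffer(uintptr_t start, size_t len, size_t line_size)
{
	return 2 * lines_touched(start, len, line_size);
}
```

For a line-aligned 2048-byte buffer and 32-byte cache lines, that is 64 lines per pass and 128 per buffer, so under these assumptions the invalidation cost grows with both the buffer size and the packet rate.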