From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <8ef2eacf-97af-3449-c3ef-c64516ee05ea@arm.com>
References: <892fabb55ebe7bb35aa3f0be03ee3f93132b7acc.1527745258.git.baolin.wang@linaro.org> <8ef2eacf-97af-3449-c3ef-c64516ee05ea@arm.com>
From: Baolin Wang
Date: Sun, 3 Jun 2018 10:11:55 +0800
Subject: Re: [PATCH 2/2] dma-coherent: Change the bitmap APIs
To: Robin Murphy
Cc: hch@lst.de, Marek Szyprowski, Greg KH, iommu@lists.linux-foundation.org, LKML, Arnd Bergmann, Mark Brown
Content-Type: text/plain; charset="UTF-8"

On 1 June 2018 at 01:38, Robin Murphy wrote:
> On 31/05/18 06:55, Baolin Wang wrote:
>>
>> The device coherent memory code uses bitmap helper functions that take
>> an order of PAGE_SIZE, which means the size of an allocated region is
>> always a power of 2 pages. For example, allocating 33 MB from a 33 MB
>> dma_mem region requires 64 MB of free memory in that region.
>>
>> Thus we can switch to bitmap_find_next_zero_area()/bitmap_set()/
>> bitmap_clear() to manage the coherent memory allocations, reducing the
>> allocation granularity to one PAGE_SIZE.
>>
>> Moreover, from Arnd's description:
>> "I believe that bitmap_allocate_region() was chosen here because it is
>> more efficient than bitmap_find_next_zero_area(), at least that is the
>> explanation given in
>> https://en.wikipedia.org/wiki/Buddy_memory_allocation.
>>
>> It's quite possible that we don't actually care about the efficiency of
>> dma_alloc_*() since a) it's supposed to be called very rarely, and
>> b) the overhead of accessing uncached memory is likely higher than the
>> search through a relatively small bitmap."
>>
>> Thus I think we can change the allocation granularity to one PAGE_SIZE
>> by converting to the new bitmap APIs, which will not cause any
>> efficiency issue.
>
> To quote the DMA API docs:
>
> "The CPU virtual address and the DMA address are both
> guaranteed to be aligned to the smallest PAGE_SIZE order which
> is greater than or equal to the requested size. This invariant
> exists (for example) to guarantee that if you allocate a chunk
> which is smaller than or equal to 64 kilobytes, the extent of the
> buffer you receive will not cross a 64K boundary."
>
> Now obviously there's a point above which that stops being practical, but

Agree.

> it's probably safe to assume that a device asking for a multi-megabyte
> buffer doesn't actually have stringent boundary requirements either. For
> smaller sizes, though, it definitely can matter. At the very least up to
> 64KB, and probably up as far as MAX_ORDER-size requests, we need to
> preserve the existing behaviour or something, somewhere, will break.

Some devices really don't care about the boundary issue, while, as you
said, some devices may care about it for smaller sizes. So this depends
on the device's requirements; could we then add a boundary parameter to
the allocation that each device can set according to its requirements?
Otherwise, as I said, we will waste a lot of reserved memory if the
allocation granularity is so large.

Thanks for your comments.
--
Baolin.wang
Best Regards