* [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory @ 2018-05-31 5:55 Baolin Wang 2018-05-31 5:55 ` [PATCH 2/2] dma-coherent: Change the bitmap APIs Baolin Wang 2018-05-31 17:25 ` [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Robin Murphy 0 siblings, 2 replies; 9+ messages in thread From: Baolin Wang @ 2018-05-31 5:55 UTC (permalink / raw) To: hch, m.szyprowski Cc: robin.murphy, gregkh, iommu, linux-kernel, arnd, broonie, baolin.wang It is incorrect to use mem->size to validate whether there is enough coherent memory to allocate in __dma_alloc_from_coherent(), since some devices may mark some coherent memory as occupied via dma_mark_declared_memory_occupied(). So we can introduce an 'avail' field to track the available device coherent memory, and use it to validate whether there is enough coherent memory for the device to allocate. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- drivers/base/dma-coherent.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c index 597d408..ce19832 100644 --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -18,6 +18,7 @@ struct dma_coherent_mem { unsigned long *bitmap; spinlock_t spinlock; bool use_dev_dma_pfn_offset; + int avail; }; static struct dma_coherent_mem *dma_coherent_default_memory __ro_after_init; @@ -73,6 +74,7 @@ static int dma_init_coherent_memory( dma_mem->device_base = device_addr; dma_mem->pfn_base = PFN_DOWN(phys_addr); dma_mem->size = pages; + dma_mem->avail = pages; dma_mem->flags = flags; spin_lock_init(&dma_mem->spinlock); @@ -141,6 +143,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev, dma_addr_t device_addr, size_t size) { struct dma_coherent_mem *mem = dev->dma_mem; + int order = get_order(size); unsigned long flags; int pos, err; @@ -151,7 +154,9 @@ void *dma_mark_declared_memory_occupied(struct device *dev, spin_lock_irqsave(&mem->spinlock, flags); 
pos = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); - err = bitmap_allocate_region(mem->bitmap, pos, get_order(size)); + err = bitmap_allocate_region(mem->bitmap, pos, order); + if (!err) + mem->avail -= 1 << order; spin_unlock_irqrestore(&mem->spinlock, flags); if (err != 0) @@ -170,7 +175,7 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, spin_lock_irqsave(&mem->spinlock, flags); - if (unlikely(size > (mem->size << PAGE_SHIFT))) + if (unlikely(size > (mem->avail << PAGE_SHIFT))) goto err; pageno = bitmap_find_free_region(mem->bitmap, mem->size, order); @@ -182,6 +187,7 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, */ *dma_handle = mem->device_base + (pageno << PAGE_SHIFT); ret = mem->virt_base + (pageno << PAGE_SHIFT); + mem->avail -= 1 << order; spin_unlock_irqrestore(&mem->spinlock, flags); memset(ret, 0, size); return ret; @@ -244,6 +250,7 @@ static int __dma_release_from_coherent(struct dma_coherent_mem *mem, spin_lock_irqsave(&mem->spinlock, flags); bitmap_release_region(mem->bitmap, page, order); + mem->avail += 1 << order; spin_unlock_irqrestore(&mem->spinlock, flags); return 1; } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] dma-coherent: Change the bitmap APIs 2018-05-31 5:55 [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Baolin Wang @ 2018-05-31 5:55 ` Baolin Wang 2018-05-31 17:38 ` Robin Murphy 2018-05-31 17:25 ` [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Robin Murphy 1 sibling, 1 reply; 9+ messages in thread From: Baolin Wang @ 2018-05-31 5:55 UTC (permalink / raw) To: hch, m.szyprowski Cc: robin.murphy, gregkh, iommu, linux-kernel, arnd, broonie, baolin.wang The device coherent memory code uses the bitmap helper functions, which take an allocation order of pages, meaning the allocation size is always rounded up to a power-of-2 number of pages. For example, allocating a 33 MB buffer requires 64 MB of free memory in the dma_mem region. Thus we can switch to bitmap_find_next_zero_area()/bitmap_set()/ bitmap_clear() to manage the coherent memory allocations, and reduce the allocation granularity to one PAGE_SIZE. Moreover, from Arnd's description: "I believe that bitmap_allocate_region() was chosen here because it is more efficient than bitmap_find_next_zero_area(), at least that is the explanation given in https://en.wikipedia.org/wiki/Buddy_memory_allocation. It's quite possible that we don't actually care about efficiency of dma_alloc_*() since a) it's supposed to be called very rarely, and b) the overhead of accessing uncached memory is likely higher than the search through a relatively small bitmap". Thus I think we can reduce the allocation granularity to one PAGE_SIZE by switching to the new bitmap APIs, which will not cause efficiency problems. 
Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- drivers/base/dma-coherent.c | 54 +++++++++++++++++++++++-------------------- include/linux/dma-mapping.h | 6 ++--- 2 files changed, 32 insertions(+), 28 deletions(-) diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c index ce19832..4131540 100644 --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -143,34 +143,31 @@ void *dma_mark_declared_memory_occupied(struct device *dev, dma_addr_t device_addr, size_t size) { struct dma_coherent_mem *mem = dev->dma_mem; - int order = get_order(size); unsigned long flags; - int pos, err; - - size += device_addr & ~PAGE_MASK; + int start_bit, nbits; if (!mem) return ERR_PTR(-EINVAL); + size += device_addr & ~PAGE_MASK; + nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; + spin_lock_irqsave(&mem->spinlock, flags); - pos = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); - err = bitmap_allocate_region(mem->bitmap, pos, order); - if (!err) - mem->avail -= 1 << order; + start_bit = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); + bitmap_set(mem->bitmap, start_bit, nbits); + mem->avail -= nbits; spin_unlock_irqrestore(&mem->spinlock, flags); - if (err != 0) - return ERR_PTR(err); - return mem->virt_base + (pos << PAGE_SHIFT); + return mem->virt_base + (start_bit << PAGE_SHIFT); } EXPORT_SYMBOL(dma_mark_declared_memory_occupied); static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, ssize_t size, dma_addr_t *dma_handle) { - int order = get_order(size); + int nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; unsigned long flags; - int pageno; + int start_bit, end_bit; void *ret; spin_lock_irqsave(&mem->spinlock, flags); @@ -178,16 +175,22 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, if (unlikely(size > (mem->avail << PAGE_SHIFT))) goto err; - pageno = bitmap_find_free_region(mem->bitmap, mem->size, order); - if (unlikely(pageno < 0)) + start_bit = 0; + end_bit = mem->size; + + 
start_bit = bitmap_find_next_zero_area(mem->bitmap, end_bit, start_bit, + nbits, 0); + if (start_bit >= end_bit) goto err; + bitmap_set(mem->bitmap, start_bit, nbits); + /* * Memory was found in the coherent area. */ - *dma_handle = mem->device_base + (pageno << PAGE_SHIFT); - ret = mem->virt_base + (pageno << PAGE_SHIFT); - mem->avail -= 1 << order; + *dma_handle = mem->device_base + (start_bit << PAGE_SHIFT); + ret = mem->virt_base + (start_bit << PAGE_SHIFT); + mem->avail -= nbits; spin_unlock_irqrestore(&mem->spinlock, flags); memset(ret, 0, size); return ret; @@ -241,16 +244,17 @@ void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle) } static int __dma_release_from_coherent(struct dma_coherent_mem *mem, - int order, void *vaddr) + int size, void *vaddr) { if (mem && vaddr >= mem->virt_base && vaddr < (mem->virt_base + (mem->size << PAGE_SHIFT))) { - int page = (vaddr - mem->virt_base) >> PAGE_SHIFT; + int nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; + int start_bit = (vaddr - mem->virt_base) >> PAGE_SHIFT; unsigned long flags; spin_lock_irqsave(&mem->spinlock, flags); - bitmap_release_region(mem->bitmap, page, order); - mem->avail += 1 << order; + bitmap_clear(mem->bitmap, start_bit, nbits); + mem->avail += nbits; spin_unlock_irqrestore(&mem->spinlock, flags); return 1; } @@ -260,7 +264,7 @@ static int __dma_release_from_coherent(struct dma_coherent_mem *mem, /** * dma_release_from_dev_coherent() - free memory to device coherent memory pool * @dev: device from which the memory was allocated - * @order: the order of pages allocated + * @size: size of release memory area * @vaddr: virtual address of allocated pages * * This checks whether the memory was allocated from the per-device @@ -269,11 +273,11 @@ static int __dma_release_from_coherent(struct dma_coherent_mem *mem, * Returns 1 if we correctly released the memory, or 0 if the caller should * proceed with releasing memory from generic pools. 
*/ -int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr) +int dma_release_from_dev_coherent(struct device *dev, int size, void *vaddr) { struct dma_coherent_mem *mem = dev_get_coherent_memory(dev); - return __dma_release_from_coherent(mem, order, vaddr); + return __dma_release_from_coherent(mem, size, vaddr); } EXPORT_SYMBOL(dma_release_from_dev_coherent); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index f8ab1c0..29ae92e 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -162,7 +162,7 @@ static inline int is_device_dma_capable(struct device *dev) */ int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size, dma_addr_t *dma_handle, void **ret); -int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr); +int dma_release_from_dev_coherent(struct device *dev, int size, void *vaddr); int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, size_t size, int *ret); @@ -174,7 +174,7 @@ int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr, #else #define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0) -#define dma_release_from_dev_coherent(dev, order, vaddr) (0) +#define dma_release_from_dev_coherent(dev, size, vaddr) (0) #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0) static inline void *dma_alloc_from_global_coherent(ssize_t size, @@ -540,7 +540,7 @@ static inline void dma_free_attrs(struct device *dev, size_t size, BUG_ON(!ops); WARN_ON(irqs_disabled()); - if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr)) + if (dma_release_from_dev_coherent(dev, size, cpu_addr)) return; if (!ops->free || !cpu_addr) -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] dma-coherent: Change the bitmap APIs @ 2018-05-31 17:38 ` Robin Murphy 0 siblings, 0 replies; 9+ messages in thread From: Robin Murphy @ 2018-05-31 17:38 UTC (permalink / raw) To: Baolin Wang, hch, m.szyprowski; +Cc: gregkh, iommu, linux-kernel, arnd, broonie On 31/05/18 06:55, Baolin Wang wrote: > The device coherent memory uses the bitmap helper functions, which take an > order of PAGE_SIZE, that means the pages size is always a power of 2 as the > allocation region. For Example, allocating 33 MB from a 33 MB dma_mem region > requires 64MB free memory in that region. > > Thus we can change to use bitmap_find_next_zero_area()/bitmap_set()/ > bitmap_clear() to present the allocation coherent memory, and reduce the > allocation granularity to be one PAGE_SIZE. > > Moreover from Arnd's description: > "I believe that bitmap_allocate_region() was chosen here because it is > more efficient than bitmap_find_next_zero_area(), at least that is the > explanation given in https://en.wikipedia.org/wiki/Buddy_memory_allocation. > > It's quite possible that we don't actually care about efficiency of > dma_alloc_*() since a) it's supposed to be called very rarely, and > b) the overhead of accessing uncached memory is likely higher than the > search through a relatively small bitmap". > > Thus I think we can convert to change the allocation granularity to be > one PAGE_SIZE replacing with new bitmap APIs, which will not cause > efficiency issue. To quote the DMA API docs: "The CPU virtual address and the DMA address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size. This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary." 
Now obviously there's a point above which that stops being practical, but it's probably safe to assume that a device asking for a multi-megabyte buffer doesn't actually have stringent boundary requirements either. For smaller sizes, though, it definitely can matter. At the very least up to 64KB, and probably up as far as MAX_ORDER-size requests, we need to preserve the existing behaviour or something, somewhere, will break. Robin. > Signed-off-by: Baolin Wang <baolin.wang@linaro.org> > --- > drivers/base/dma-coherent.c | 54 +++++++++++++++++++++++-------------------- > include/linux/dma-mapping.h | 6 ++--- > 2 files changed, 32 insertions(+), 28 deletions(-) > > diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c > index ce19832..4131540 100644 > --- a/drivers/base/dma-coherent.c > +++ b/drivers/base/dma-coherent.c > @@ -143,34 +143,31 @@ void *dma_mark_declared_memory_occupied(struct device *dev, > dma_addr_t device_addr, size_t size) > { > struct dma_coherent_mem *mem = dev->dma_mem; > - int order = get_order(size); > unsigned long flags; > - int pos, err; > - > - size += device_addr & ~PAGE_MASK; > + int start_bit, nbits; > > if (!mem) > return ERR_PTR(-EINVAL); > > + size += device_addr & ~PAGE_MASK; > + nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; > + > spin_lock_irqsave(&mem->spinlock, flags); > - pos = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); > - err = bitmap_allocate_region(mem->bitmap, pos, order); > - if (!err) > - mem->avail -= 1 << order; > + start_bit = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); > + bitmap_set(mem->bitmap, start_bit, nbits); > + mem->avail -= nbits; > spin_unlock_irqrestore(&mem->spinlock, flags); > > - if (err != 0) > - return ERR_PTR(err); > - return mem->virt_base + (pos << PAGE_SHIFT); > + return mem->virt_base + (start_bit << PAGE_SHIFT); > } > EXPORT_SYMBOL(dma_mark_declared_memory_occupied); > > static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, > ssize_t 
size, dma_addr_t *dma_handle) > { > - int order = get_order(size); > + int nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; > unsigned long flags; > - int pageno; > + int start_bit, end_bit; > void *ret; > > spin_lock_irqsave(&mem->spinlock, flags); > @@ -178,16 +175,22 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, > if (unlikely(size > (mem->avail << PAGE_SHIFT))) > goto err; > > - pageno = bitmap_find_free_region(mem->bitmap, mem->size, order); > - if (unlikely(pageno < 0)) > + start_bit = 0; > + end_bit = mem->size; > + > + start_bit = bitmap_find_next_zero_area(mem->bitmap, end_bit, start_bit, > + nbits, 0); > + if (start_bit >= end_bit) > goto err; > > + bitmap_set(mem->bitmap, start_bit, nbits); > + > /* > * Memory was found in the coherent area. > */ > - *dma_handle = mem->device_base + (pageno << PAGE_SHIFT); > - ret = mem->virt_base + (pageno << PAGE_SHIFT); > - mem->avail -= 1 << order; > + *dma_handle = mem->device_base + (start_bit << PAGE_SHIFT); > + ret = mem->virt_base + (start_bit << PAGE_SHIFT); > + mem->avail -= nbits; > spin_unlock_irqrestore(&mem->spinlock, flags); > memset(ret, 0, size); > return ret; > @@ -241,16 +244,17 @@ void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle) > } > > static int __dma_release_from_coherent(struct dma_coherent_mem *mem, > - int order, void *vaddr) > + int size, void *vaddr) > { > if (mem && vaddr >= mem->virt_base && vaddr < > (mem->virt_base + (mem->size << PAGE_SHIFT))) { > - int page = (vaddr - mem->virt_base) >> PAGE_SHIFT; > + int nbits = (size + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT; > + int start_bit = (vaddr - mem->virt_base) >> PAGE_SHIFT; > unsigned long flags; > > spin_lock_irqsave(&mem->spinlock, flags); > - bitmap_release_region(mem->bitmap, page, order); > - mem->avail += 1 << order; > + bitmap_clear(mem->bitmap, start_bit, nbits); > + mem->avail += nbits; > spin_unlock_irqrestore(&mem->spinlock, flags); > return 1; > } > @@ -260,7 +264,7 @@ 
static int __dma_release_from_coherent(struct dma_coherent_mem *mem, > /** > * dma_release_from_dev_coherent() - free memory to device coherent memory pool > * @dev: device from which the memory was allocated > - * @order: the order of pages allocated > + * @size: size of release memory area > * @vaddr: virtual address of allocated pages > * > * This checks whether the memory was allocated from the per-device > @@ -269,11 +273,11 @@ static int __dma_release_from_coherent(struct dma_coherent_mem *mem, > * Returns 1 if we correctly released the memory, or 0 if the caller should > * proceed with releasing memory from generic pools. > */ > -int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr) > +int dma_release_from_dev_coherent(struct device *dev, int size, void *vaddr) > { > struct dma_coherent_mem *mem = dev_get_coherent_memory(dev); > > - return __dma_release_from_coherent(mem, order, vaddr); > + return __dma_release_from_coherent(mem, size, vaddr); > } > EXPORT_SYMBOL(dma_release_from_dev_coherent); > > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h > index f8ab1c0..29ae92e 100644 > --- a/include/linux/dma-mapping.h > +++ b/include/linux/dma-mapping.h > @@ -162,7 +162,7 @@ static inline int is_device_dma_capable(struct device *dev) > */ > int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size, > dma_addr_t *dma_handle, void **ret); > -int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr); > +int dma_release_from_dev_coherent(struct device *dev, int size, void *vaddr); > > int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma, > void *cpu_addr, size_t size, int *ret); > @@ -174,7 +174,7 @@ int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr, > > #else > #define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0) > -#define dma_release_from_dev_coherent(dev, order, vaddr) (0) > +#define dma_release_from_dev_coherent(dev, size, 
vaddr) (0) > #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0) > > static inline void *dma_alloc_from_global_coherent(ssize_t size, > @@ -540,7 +540,7 @@ static inline void dma_free_attrs(struct device *dev, size_t size, > BUG_ON(!ops); > WARN_ON(irqs_disabled()); > > - if (dma_release_from_dev_coherent(dev, get_order(size), cpu_addr)) > + if (dma_release_from_dev_coherent(dev, size, cpu_addr)) > return; > > if (!ops->free || !cpu_addr) > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] dma-coherent: Change the bitmap APIs @ 2018-06-03 2:11 ` Baolin Wang 0 siblings, 0 replies; 9+ messages in thread From: Baolin Wang @ 2018-06-03 2:11 UTC (permalink / raw) To: Robin Murphy Cc: hch, Marek Szyprowski, Greg KH, iommu, LKML, Arnd Bergmann, Mark Brown On 1 June 2018 at 01:38, Robin Murphy <robin.murphy@arm.com> wrote: > On 31/05/18 06:55, Baolin Wang wrote: >> >> The device coherent memory uses the bitmap helper functions, which take an >> order of PAGE_SIZE, that means the pages size is always a power of 2 as >> the >> allocation region. For Example, allocating 33 MB from a 33 MB dma_mem >> region >> requires 64MB free memory in that region. >> >> Thus we can change to use bitmap_find_next_zero_area()/bitmap_set()/ >> bitmap_clear() to present the allocation coherent memory, and reduce the >> allocation granularity to be one PAGE_SIZE. >> >> Moreover from Arnd's description: >> "I believe that bitmap_allocate_region() was chosen here because it is >> more efficient than bitmap_find_next_zero_area(), at least that is the >> explanation given in >> https://en.wikipedia.org/wiki/Buddy_memory_allocation. >> >> It's quite possible that we don't actually care about efficiency of >> dma_alloc_*() since a) it's supposed to be called very rarely, and >> b) the overhead of accessing uncached memory is likely higher than the >> search through a relatively small bitmap". >> >> Thus I think we can convert to change the allocation granularity to be >> one PAGE_SIZE replacing with new bitmap APIs, which will not cause >> efficiency issue. > > > To quote the DMA API docs: > > "The CPU virtual address and the DMA address are both > guaranteed to be aligned to the smallest PAGE_SIZE order which > is greater than or equal to the requested size. This invariant > exists (for example) to guarantee that if you allocate a chunk > which is smaller than or equal to 64 kilobytes, the extent of the > buffer you receive will not cross a 64K boundary." 
> > Now obviously there's a point above which that stops being practical, but Agree. > it's probably safe to assume that a device asking for a multi-megabyte > buffer doesn't actually have stringent boundary requirements either. For > smaller sizes, though, it definitely can matter. At the very least up to > 64KB, and probably up as far as MAX_ORDER-size requests, we need to preserve > the existing behaviour or something, somewhere, will break. Some devices really don't care about the boundary issue, and as you said some devices may care for smaller sizes. So I think this depends on the device's requirements; can we then add a boundary parameter for the allocation according to the device's requirements? Otherwise, as I said, we will waste a lot of reserved memory if the granularity is so large. Thanks for your comments. -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] dma-coherent: Change the bitmap APIs @ 2018-06-03 2:11 ` Baolin Wang 0 siblings, 0 replies; 9+ messages in thread From: Baolin Wang @ 2018-06-03 2:11 UTC (permalink / raw) To: Robin Murphy Cc: Arnd Bergmann, Greg KH, LKML, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Mark Brown, hch-jcswGhMUV9g On 1 June 2018 at 01:38, Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote: > On 31/05/18 06:55, Baolin Wang wrote: >> >> The device coherent memory uses the bitmap helper functions, which take an >> order of PAGE_SIZE, that means the pages size is always a power of 2 as >> the >> allocation region. For Example, allocating 33 MB from a 33 MB dma_mem >> region >> requires 64MB free memory in that region. >> >> Thus we can change to use bitmap_find_next_zero_area()/bitmap_set()/ >> bitmap_clear() to present the allocation coherent memory, and reduce the >> allocation granularity to be one PAGE_SIZE. >> >> Moreover from Arnd's description: >> "I believe that bitmap_allocate_region() was chosen here because it is >> more efficient than bitmap_find_next_zero_area(), at least that is the >> explanation given in >> https://en.wikipedia.org/wiki/Buddy_memory_allocation. >> >> It's quite possible that we don't actually care about efficiency of >> dma_alloc_*() since a) it's supposed to be called very rarely, and >> b) the overhead of accessing uncached memory is likely higher than the >> search through a relatively small bitmap". >> >> Thus I think we can convert to change the allocation granularity to be >> one PAGE_SIZE replacing with new bitmap APIs, which will not cause >> efficiency issue. > > > To quote the DMA API docs: > > "The CPU virtual address and the DMA address are both > guaranteed to be aligned to the smallest PAGE_SIZE order which > is greater than or equal to the requested size. 
This invariant > exists (for example) to guarantee that if you allocate a chunk > which is smaller than or equal to 64 kilobytes, the extent of the > buffer you receive will not cross a 64K boundary." > > Now obviously there's a point above which that stops being practical, but Agree. > it's probably safe to assume that a device asking for a multi-megabyte > buffer doesn't actually have stringent boundary requirements either. For > smaller sizes, though, it definitely can matter. At the very least up to > 64KB, and probably up as far as MAX_ORDER-size requests, we need to preserve > the existing behaviour or something, somewhere, will break. Some devices really don't care about the boundary issue, and as you said some devices may care for smaller sizes. So I think this depends on the device's requirements; could we add one boundary parameter for the allocation according to the device's requirements? Otherwise, like I said, we will waste lots of reserved memory if the granularity is so large. Thanks for your comments. -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory 2018-05-31 5:55 [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Baolin Wang 2018-05-31 5:55 ` [PATCH 2/2] dma-coherent: Change the bitmap APIs Baolin Wang @ 2018-05-31 17:25 ` Robin Murphy 2018-06-03 2:12 ` Baolin Wang 1 sibling, 1 reply; 9+ messages in thread From: Robin Murphy @ 2018-05-31 17:25 UTC (permalink / raw) To: Baolin Wang, hch, m.szyprowski; +Cc: gregkh, iommu, linux-kernel, arnd, broonie On 31/05/18 06:55, Baolin Wang wrote: > It is incorrect to use mem->size to valid if there are enough coherent > memory to allocate in __dma_alloc_from_coherent(), since some devices > may mark some occupied coherent memory by dma_mark_declared_memory_occupied(). > > So we can introduce one 'avail' parameter to save the available device > coherent memory, to valid if we have enough coherent memory for the device > to allocate. We already have dma_mem->size stored for other reasons, so there's little harm in using it for a rough sanity check to short-circuit the odd allocation which cannot possibly succeed, but adding machinery purely for the sake of an ever-so-slightly more accurate, but still far from exhaustive, check doesn't really seem worth it. Even if size <= dma_mem->avail there's still no guarantee that the allocation will actually succeed, so what benefit does the explicit accounting provide? Robin. 
> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> > --- > drivers/base/dma-coherent.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c > index 597d408..ce19832 100644 > --- a/drivers/base/dma-coherent.c > +++ b/drivers/base/dma-coherent.c > @@ -18,6 +18,7 @@ struct dma_coherent_mem { > unsigned long *bitmap; > spinlock_t spinlock; > bool use_dev_dma_pfn_offset; > + int avail; > }; > > static struct dma_coherent_mem *dma_coherent_default_memory __ro_after_init; > @@ -73,6 +74,7 @@ static int dma_init_coherent_memory( > dma_mem->device_base = device_addr; > dma_mem->pfn_base = PFN_DOWN(phys_addr); > dma_mem->size = pages; > + dma_mem->avail = pages; > dma_mem->flags = flags; > spin_lock_init(&dma_mem->spinlock); > > @@ -141,6 +143,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev, > dma_addr_t device_addr, size_t size) > { > struct dma_coherent_mem *mem = dev->dma_mem; > + int order = get_order(size); > unsigned long flags; > int pos, err; > > @@ -151,7 +154,9 @@ void *dma_mark_declared_memory_occupied(struct device *dev, > > spin_lock_irqsave(&mem->spinlock, flags); > pos = PFN_DOWN(device_addr - dma_get_device_base(dev, mem)); > - err = bitmap_allocate_region(mem->bitmap, pos, get_order(size)); > + err = bitmap_allocate_region(mem->bitmap, pos, order); > + if (!err) > + mem->avail -= 1 << order; > spin_unlock_irqrestore(&mem->spinlock, flags); > > if (err != 0) > @@ -170,7 +175,7 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, > > spin_lock_irqsave(&mem->spinlock, flags); > > - if (unlikely(size > (mem->size << PAGE_SHIFT))) > + if (unlikely(size > (mem->avail << PAGE_SHIFT))) > goto err; > > pageno = bitmap_find_free_region(mem->bitmap, mem->size, order); > @@ -182,6 +187,7 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, > */ > *dma_handle = mem->device_base + (pageno << PAGE_SHIFT); > ret = mem->virt_base + 
(pageno << PAGE_SHIFT); > + mem->avail -= 1 << order; > spin_unlock_irqrestore(&mem->spinlock, flags); > memset(ret, 0, size); > return ret; > @@ -244,6 +250,7 @@ static int __dma_release_from_coherent(struct dma_coherent_mem *mem, > > spin_lock_irqsave(&mem->spinlock, flags); > bitmap_release_region(mem->bitmap, page, order); > + mem->avail += 1 << order; > spin_unlock_irqrestore(&mem->spinlock, flags); > return 1; > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory @ 2018-06-03 2:12 ` Baolin Wang 0 siblings, 0 replies; 9+ messages in thread From: Baolin Wang @ 2018-06-03 2:12 UTC (permalink / raw) To: Robin Murphy Cc: hch, Marek Szyprowski, Greg KH, iommu, LKML, Arnd Bergmann, Mark Brown On 1 June 2018 at 01:25, Robin Murphy <robin.murphy@arm.com> wrote: > On 31/05/18 06:55, Baolin Wang wrote: >> >> It is incorrect to use mem->size to valid if there are enough coherent >> memory to allocate in __dma_alloc_from_coherent(), since some devices >> may mark some occupied coherent memory by >> dma_mark_declared_memory_occupied(). >> >> So we can introduce one 'avail' parameter to save the available device >> coherent memory, to valid if we have enough coherent memory for the device >> to allocate. > > > We already have dma_mem->size stored for other reasons, so there's little > harm in using it for a rough sanity check to short-circuit the odd > allocation which cannot possibly succeed, but adding machinery purely for > the sake of an ever-so-slightly more accurate, but still far from exhaustive, > check doesn't really seem worth it. > > Even if size <= dma_mem->avail there's still no guarantee that the > allocation will actually succeed, so what benefit does the explicit > accounting provide? > Yes, it cannot guarantee a successful allocation, but by checking the actual available coherent memory it can avoid some redundant bitmap operations and reduce the preempt-disable time when there is not enough memory. By the way, I think we should also add checks in dma_mark_declared_memory_occupied() and __dma_mmap_from_coherent(). -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-06-03 2:12 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-31 5:55 [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Baolin Wang 2018-05-31 5:55 ` [PATCH 2/2] dma-coherent: Change the bitmap APIs Baolin Wang 2018-05-31 17:38 ` Robin Murphy 2018-05-31 17:38 ` Robin Murphy 2018-06-03 2:11 ` Baolin Wang 2018-06-03 2:11 ` Baolin Wang 2018-05-31 17:25 ` [PATCH 1/2] dma-coherent: Add one parameter to save available coherent memory Robin Murphy 2018-06-03 2:12 ` Baolin Wang 2018-06-03 2:12 ` Baolin Wang