* [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc @ 2020-09-24 5:16 Chris Goldsworthy 2020-09-24 5:16 ` Chris Goldsworthy 0 siblings, 1 reply; 6+ messages in thread From: Chris Goldsworthy @ 2020-09-24 5:16 UTC (permalink / raw) To: akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim, david Cc: Chris Goldsworthy V1: Introduces a retry loop that attempts a CMA allocation a finite number of times before giving up: https://lkml.org/lkml/2020/8/5/1097 https://lkml.org/lkml/2020/8/11/893 V2: Introduces an indefinite retry for CMA allocations. David Hildenbrand raised a page pinning example which precludes doing this infite-retrying for all CMA users: https://lkml.org/lkml/2020/9/17/984 V3: Re-introduce a GFP mask argument for cma_alloc(), that can take in __GFP_NOFAIL as an argument to indicate that a CMA allocation should be retried indefinitely. This lets callers of cma_alloc() decide if they want to perform indefinite retires. Also introduces a config option for controlling the duration of the sleep between retries. Chris Goldsworthy (1): mm: cma: indefinitely retry allocations in cma_alloc arch/powerpc/kvm/book3s_hv_builtin.c | 2 +- drivers/dma-buf/heaps/cma_heap.c | 2 +- drivers/s390/char/vmcp.c | 2 +- drivers/staging/android/ion/ion_cma_heap.c | 2 +- include/linux/cma.h | 2 +- kernel/dma/contiguous.c | 4 ++-- mm/Kconfig | 11 ++++++++++ mm/cma.c | 35 +++++++++++++++++++++++++----- mm/cma_debug.c | 2 +- mm/hugetlb.c | 4 ++-- 10 files changed, 50 insertions(+), 16 deletions(-) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc 2020-09-24 5:16 [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc Chris Goldsworthy @ 2020-09-24 5:16 ` Chris Goldsworthy 2020-09-25 12:18 ` David Hildenbrand 2020-09-27 19:23 ` Minchan Kim 0 siblings, 2 replies; 6+ messages in thread From: Chris Goldsworthy @ 2020-09-24 5:16 UTC (permalink / raw) To: akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim, david Cc: Chris Goldsworthy, Vinayak Menon CMA allocations will fail if 'pinned' pages are in a CMA area, since we cannot migrate pinned pages. The _refcount of a struct page being greater than _mapcount for that page can cause pinning for anonymous pages. This is because try_to_unmap(), which (1) is called in the CMA allocation path, and (2) decrements both _refcount and _mapcount for a page, will stop unmapping a page from VMAs once the _mapcount for a page reaches 0. This implies that after try_to_unmap() has finished successfully for a page where _recount > _mapcount, that _refcount will be greater than 0. Later in the CMA allocation path in migrate_page_move_mapping(), we will have one more reference count than intended for anonymous pages, meaning the allocation will fail for that page. If a process ends up causing _refcount > _mapcount for a page (by either incrementing _recount or decrementing _mapcount), such that the process is context switched out after modifying one refcount but before modifying the other, the page will be temporarily pinned. One example of where _refcount can be greater than _mapcount is inside of zap_pte_range(), which is called for all the entries of a PMD when a process is exiting, to unmap the process's memory. Inside of zap_pte_range(), after unammping a page with page_remove_rmap(), we have that _recount > _mapcount. _refcount can only be decremented after a TLB flush is performed for the page - this doesn't occur until enough pages have been batched together for flushing. The flush can either occur inside of zap_pte_range() (during the same invocation or a later one), or if there aren't enough pages collected by the time we unmap all of the pages in a process, the flush will occur in tlb_finish_mmu() in exit_mmap(). After the flush has occurred, tlb_batch_pages_flush() will decrement the references on the flushed pages. Another such example like the above is inside of copy_one_pte(), which is called during a fork. For PTEs for which pte_present(pte) == true, copy_one_pte() will increment the _refcount field followed by the _mapcount field of a page. So, inside of cma_alloc(), add the option of letting users pass in __GFP_NOFAIL to indicate that we should retry CMA allocations indefinitely, in the event that alloc_contig_range() returns -EBUSY after having scanned a whole CMA-region bitmap. Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> --- arch/powerpc/kvm/book3s_hv_builtin.c | 2 +- drivers/dma-buf/heaps/cma_heap.c | 2 +- drivers/s390/char/vmcp.c | 2 +- drivers/staging/android/ion/ion_cma_heap.c | 2 +- include/linux/cma.h | 2 +- kernel/dma/contiguous.c | 4 ++-- mm/Kconfig | 11 ++++++++++ mm/cma.c | 35 +++++++++++++++++++++++++----- mm/cma_debug.c | 2 +- mm/hugetlb.c | 4 ++-- 10 files changed, 50 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c index 073617c..21c3f6a 100644 --- a/arch/powerpc/kvm/book3s_hv_builtin.c +++ b/arch/powerpc/kvm/book3s_hv_builtin.c @@ -74,7 +74,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages) VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES), - false); + 0); } EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma); diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c index 626cf7f..7657359 100644 --- a/drivers/dma-buf/heaps/cma_heap.c +++ b/drivers/dma-buf/heaps/cma_heap.c @@ -66,7 +66,7 @@ static int cma_heap_allocate(struct dma_heap *heap, helper_buffer->heap = heap; helper_buffer->size = len; - cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false); + cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); if (!cma_pages) goto free_buf; diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c index 9e06628..11c4e3b 100644 --- a/drivers/s390/char/vmcp.c +++ b/drivers/s390/char/vmcp.c @@ -70,7 +70,7 @@ static void vmcp_response_alloc(struct vmcp_session *session) * anymore the system won't work anyway. */ if (order > 2) - page = cma_alloc(vmcp_cma, nr_pages, 0, false); + page = cma_alloc(vmcp_cma, nr_pages, 0, 0); if (page) { session->response = (char *)page_to_phys(page); session->cma_alloc = 1; diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c index bf65e67..128d3a5 100644 --- a/drivers/staging/android/ion/ion_cma_heap.c +++ b/drivers/staging/android/ion/ion_cma_heap.c @@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, if (align > CONFIG_CMA_ALIGNMENT) align = CONFIG_CMA_ALIGNMENT; - pages = cma_alloc(cma_heap->cma, nr_pages, align, false); + pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); if (!pages) return -ENOMEM; diff --git a/include/linux/cma.h b/include/linux/cma.h index 6ff79fe..2bd8544 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -43,7 +43,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, const char *name, struct cma **res_cma); extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, - bool no_warn); + gfp_t gfp_mask); extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c index cff7e60..55c62b2 100644 --- a/kernel/dma/contiguous.c +++ b/kernel/dma/contiguous.c @@ -196,7 +196,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, if (align > CONFIG_CMA_ALIGNMENT) align = CONFIG_CMA_ALIGNMENT; - return cma_alloc(dev_get_cma_area(dev), count, align, no_warn); + return cma_alloc(dev_get_cma_area(dev), count, align, no_warn ? __GFP_NOWARN : 0); } /** @@ -219,7 +219,7 @@ static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp) { unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT); - return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN); + return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp); } /** diff --git a/mm/Kconfig b/mm/Kconfig index 6c97488..83a5135 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -524,6 +524,17 @@ config CMA_AREAS If unsure, leave the default value "7". +config CMA_RETRY_SLEEP_DURATION + int "Sleep duration between retries" + depends on CMA + default 100 + help + Time to sleep for in milliseconds between the indefinite + retries of a CMA allocation that are induced by passing + __GFP_NOFAIL to cma_alloc(). + + If unsure, leave the value as "100". + config MEM_SOFT_DIRTY bool "Track memory changes" depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS diff --git a/mm/cma.c b/mm/cma.c index 7f415d7..4fbad2b 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -32,6 +32,7 @@ #include <linux/highmem.h> #include <linux/io.h> #include <linux/kmemleak.h> +#include <linux/delay.h> #include <trace/events/cma.h> #include "cma.h" @@ -403,13 +404,15 @@ static inline void cma_debug_show_areas(struct cma *cma) { } * @cma: Contiguous memory region for which the allocation is performed. * @count: Requested number of pages. * @align: Requested alignment of pages (in PAGE_SIZE order). - * @no_warn: Avoid printing message about failed allocation + * @gfp_mask: If __GFP_NOWARN is passed, suppress messages about failed + * allocations. If __GFP_NOFAIL is passed, try doing the CMA + * allocation indefinitely until the allocation succeeds. * * This function allocates part of contiguous memory on specific * contiguous memory area. */ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, - bool no_warn) + gfp_t gfp_mask) { unsigned long mask, offset; unsigned long pfn = -1; @@ -442,8 +445,28 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, bitmap_maxno, start, bitmap_count, mask, offset); if (bitmap_no >= bitmap_maxno) { - mutex_unlock(&cma->lock); - break; + if (ret == -EBUSY && gfp_mask & __GFP_NOFAIL) { + mutex_unlock(&cma->lock); + + /* + * Page may be momentarily pinned by some other + * process which has been scheduled out, e.g. + * in exit path, during unmap call, or process + * fork and so cannot be freed there. Sleep + * for 100ms and retry the allocation. + */ + start = 0; + ret = -ENOMEM; + msleep(CONFIG_CMA_RETRY_SLEEP_DURATION); + continue; + } else { + /* + * ret == -ENOMEM - all bits in cma->bitmap are + * set, so we break accordingly. + */ + mutex_unlock(&cma->lock); + break; + } } bitmap_set(cma->bitmap, bitmap_no, bitmap_count); /* @@ -456,7 +479,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit); mutex_lock(&cma_mutex); ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, - GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0)); + GFP_KERNEL | (gfp_mask & __GFP_NOWARN)); mutex_unlock(&cma_mutex); if (ret == 0) { page = pfn_to_page(pfn); @@ -485,7 +508,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, page_kasan_tag_reset(page + i); } - if (ret && !no_warn) { + if (ret && !(gfp_mask & __GFP_NOWARN)) { pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n", __func__, count, ret); cma_debug_show_areas(cma); diff --git a/mm/cma_debug.c b/mm/cma_debug.c index d5bf8aa..76aea84 100644 --- a/mm/cma_debug.c +++ b/mm/cma_debug.c @@ -137,7 +137,7 @@ static int cma_alloc_mem(struct cma *cma, int count) if (!mem) return -ENOMEM; - p = cma_alloc(cma, count, 0, false); + p = cma_alloc(cma, count, 0, 0); if (!p) { kfree(mem); return -ENOMEM; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 67fc6383..97bdba9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1260,7 +1260,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, if (hugetlb_cma[nid]) { page = cma_alloc(hugetlb_cma[nid], nr_pages, - huge_page_order(h), true); + huge_page_order(h), __GFP_NOWARN); if (page) return page; } @@ -1271,7 +1271,7 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, continue; page = cma_alloc(hugetlb_cma[node], nr_pages, - huge_page_order(h), true); + huge_page_order(h), __GFP_NOWARN); if (page) return page; } -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc 2020-09-24 5:16 ` Chris Goldsworthy @ 2020-09-25 12:18 ` David Hildenbrand 2020-09-25 23:13 ` Chris Goldsworthy 2020-09-27 19:23 ` Minchan Kim 1 sibling, 1 reply; 6+ messages in thread From: David Hildenbrand @ 2020-09-25 12:18 UTC (permalink / raw) To: Chris Goldsworthy, akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim Cc: Vinayak Menon On 24.09.20 07:16, Chris Goldsworthy wrote: > CMA allocations will fail if 'pinned' pages are in a CMA area, since we > cannot migrate pinned pages. The _refcount of a struct page being greater > than _mapcount for that page can cause pinning for anonymous pages. This > is because try_to_unmap(), which (1) is called in the CMA allocation path, > and (2) decrements both _refcount and _mapcount for a page, will stop > unmapping a page from VMAs once the _mapcount for a page reaches 0. This > implies that after try_to_unmap() has finished successfully for a page > where _recount > _mapcount, that _refcount will be greater than 0. Later > in the CMA allocation path in migrate_page_move_mapping(), we will have one > more reference count than intended for anonymous pages, meaning the > allocation will fail for that page. > > If a process ends up causing _refcount > _mapcount for a page (by either > incrementing _recount or decrementing _mapcount), such that the process is > context switched out after modifying one refcount but before modifying the > other, the page will be temporarily pinned. > > One example of where _refcount can be greater than _mapcount is inside of > zap_pte_range(), which is called for all the entries of a PMD when a > process is exiting, to unmap the process's memory. Inside of > zap_pte_range(), after unammping a page with page_remove_rmap(), we have > that _recount > _mapcount. _refcount can only be decremented after a TLB > flush is performed for the page - this doesn't occur until enough pages > have been batched together for flushing. The flush can either occur inside > of zap_pte_range() (during the same invocation or a later one), or if there > aren't enough pages collected by the time we unmap all of the pages in a > process, the flush will occur in tlb_finish_mmu() in exit_mmap(). After > the flush has occurred, tlb_batch_pages_flush() will decrement the > references on the flushed pages. > > Another such example like the above is inside of copy_one_pte(), which is > called during a fork. For PTEs for which pte_present(pte) == true, > copy_one_pte() will increment the _refcount field followed by the > _mapcount field of a page. > > So, inside of cma_alloc(), add the option of letting users pass in > __GFP_NOFAIL to indicate that we should retry CMA allocations indefinitely, > in the event that alloc_contig_range() returns -EBUSY after having scanned > a whole CMA-region bitmap. > > Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> > Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org> > Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> > --- > arch/powerpc/kvm/book3s_hv_builtin.c | 2 +- > drivers/dma-buf/heaps/cma_heap.c | 2 +- > drivers/s390/char/vmcp.c | 2 +- > drivers/staging/android/ion/ion_cma_heap.c | 2 +- > include/linux/cma.h | 2 +- > kernel/dma/contiguous.c | 4 ++-- > mm/Kconfig | 11 ++++++++++ > mm/cma.c | 35 +++++++++++++++++++++++++----- > mm/cma_debug.c | 2 +- > mm/hugetlb.c | 4 ++-- > 10 files changed, 50 insertions(+), 16 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c > index 073617c..21c3f6a 100644 > --- a/arch/powerpc/kvm/book3s_hv_builtin.c > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c > @@ -74,7 +74,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages) > VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); > > return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES), > - false); > + 0); > } > EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma); > > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c > index 626cf7f..7657359 100644 > --- a/drivers/dma-buf/heaps/cma_heap.c > +++ b/drivers/dma-buf/heaps/cma_heap.c > @@ -66,7 +66,7 @@ static int cma_heap_allocate(struct dma_heap *heap, > helper_buffer->heap = heap; > helper_buffer->size = len; > > - cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false); > + cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); > if (!cma_pages) > goto free_buf; > > diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c > index 9e06628..11c4e3b 100644 > --- a/drivers/s390/char/vmcp.c > +++ b/drivers/s390/char/vmcp.c > @@ -70,7 +70,7 @@ static void vmcp_response_alloc(struct vmcp_session *session) > * anymore the system won't work anyway. > */ > if (order > 2) > - page = cma_alloc(vmcp_cma, nr_pages, 0, false); > + page = cma_alloc(vmcp_cma, nr_pages, 0, 0); > if (page) { > session->response = (char *)page_to_phys(page); > session->cma_alloc = 1; > diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c > index bf65e67..128d3a5 100644 > --- a/drivers/staging/android/ion/ion_cma_heap.c > +++ b/drivers/staging/android/ion/ion_cma_heap.c > @@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, > if (align > CONFIG_CMA_ALIGNMENT) > align = CONFIG_CMA_ALIGNMENT; > > - pages = cma_alloc(cma_heap->cma, nr_pages, align, false); > + pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); > if (!pages) > return -ENOMEM; > > diff --git a/include/linux/cma.h b/include/linux/cma.h > index 6ff79fe..2bd8544 100644 > --- a/include/linux/cma.h > +++ b/include/linux/cma.h > @@ -43,7 +43,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, > const char *name, > struct cma **res_cma); > extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > - bool no_warn); > + gfp_t gfp_mask); > extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); > > extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c > index cff7e60..55c62b2 100644 > --- a/kernel/dma/contiguous.c > +++ b/kernel/dma/contiguous.c > @@ -196,7 +196,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, > if (align > CONFIG_CMA_ALIGNMENT) > align = CONFIG_CMA_ALIGNMENT; > > - return cma_alloc(dev_get_cma_area(dev), count, align, no_warn); > + return cma_alloc(dev_get_cma_area(dev), count, align, no_warn ? __GFP_NOWARN : 0); > } > > /** > @@ -219,7 +219,7 @@ static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp) > { > unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT); > > - return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN); > + return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp); > } > > /** > diff --git a/mm/Kconfig b/mm/Kconfig > index 6c97488..83a5135 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -524,6 +524,17 @@ config CMA_AREAS > > If unsure, leave the default value "7". > > +config CMA_RETRY_SLEEP_DURATION > + int "Sleep duration between retries" > + depends on CMA > + default 100 > + help > + Time to sleep for in milliseconds between the indefinite > + retries of a CMA allocation that are induced by passing > + __GFP_NOFAIL to cma_alloc(). > + > + If unsure, leave the value as "100". > + > config MEM_SOFT_DIRTY > bool "Track memory changes" > depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS > diff --git a/mm/cma.c b/mm/cma.c > index 7f415d7..4fbad2b 100644 > --- a/mm/cma.c > +++ b/mm/cma.c > @@ -32,6 +32,7 @@ > #include <linux/highmem.h> > #include <linux/io.h> > #include <linux/kmemleak.h> > +#include <linux/delay.h> > #include <trace/events/cma.h> > > #include "cma.h" > @@ -403,13 +404,15 @@ static inline void cma_debug_show_areas(struct cma *cma) { } > * @cma: Contiguous memory region for which the allocation is performed. > * @count: Requested number of pages. > * @align: Requested alignment of pages (in PAGE_SIZE order). > - * @no_warn: Avoid printing message about failed allocation > + * @gfp_mask: If __GFP_NOWARN is passed, suppress messages about failed > + * allocations. If __GFP_NOFAIL is passed, try doing the CMA > + * allocation indefinitely until the allocation succeeds. > * > * This function allocates part of contiguous memory on specific > * contiguous memory area. > */ > struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > - bool no_warn) > + gfp_t gfp_mask) > { > unsigned long mask, offset; > unsigned long pfn = -1; > @@ -442,8 +445,28 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > bitmap_maxno, start, bitmap_count, mask, > offset); > if (bitmap_no >= bitmap_maxno) { > - mutex_unlock(&cma->lock); > - break; > + if (ret == -EBUSY && gfp_mask & __GFP_NOFAIL) { > + mutex_unlock(&cma->lock); > + > + /* > + * Page may be momentarily pinned by some other > + * process which has been scheduled out, e.g. > + * in exit path, during unmap call, or process > + * fork and so cannot be freed there. Sleep > + * for 100ms and retry the allocation. > + */ > + start = 0; > + ret = -ENOMEM; > + msleep(CONFIG_CMA_RETRY_SLEEP_DURATION); > + continue; > + } else { > + /* > + * ret == -ENOMEM - all bits in cma->bitmap are > + * set, so we break accordingly. > + */ > + mutex_unlock(&cma->lock); > + break; > + } > } > bitmap_set(cma->bitmap, bitmap_no, bitmap_count); > /* > @@ -456,7 +479,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit); > mutex_lock(&cma_mutex); > ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, > - GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0)); > + GFP_KERNEL | (gfp_mask & __GFP_NOWARN)); Right, we definetly don't want to pass the flag further down. Alternative would be cma_alloc_nofail(). That helps avoid people passing stuff like GFP_USER and wondering why it doesn't have an effect. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc 2020-09-25 12:18 ` David Hildenbrand @ 2020-09-25 23:13 ` Chris Goldsworthy 0 siblings, 0 replies; 6+ messages in thread From: Chris Goldsworthy @ 2020-09-25 23:13 UTC (permalink / raw) To: David Hildenbrand Cc: akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim, Vinayak Menon On 2020-09-25 05:18, David Hildenbrand wrote: > On 24.09.20 07:16, Chris Goldsworthy wrote: >> - GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0)); >> + GFP_KERNEL | (gfp_mask & __GFP_NOWARN)); > > Right, we definetly don't want to pass the flag further down. > > Alternative would be cma_alloc_nofail(). That helps avoid people > passing > stuff like GFP_USER and wondering why it doesn't have an effect. But since we're doing a logical AND with __GFP_NOWARN, we're not passing any other values down - this makes it equivalent to the previous version, in that only __GFP_NOWARN can be passed to alloc_contig_range(). -- The Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc 2020-09-24 5:16 ` Chris Goldsworthy 2020-09-25 12:18 ` David Hildenbrand @ 2020-09-27 19:23 ` Minchan Kim 2020-09-28 20:10 ` Chris Goldsworthy 1 sibling, 1 reply; 6+ messages in thread From: Minchan Kim @ 2020-09-27 19:23 UTC (permalink / raw) To: Chris Goldsworthy Cc: akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim, david, Vinayak Menon On Wed, Sep 23, 2020 at 10:16:25PM -0700, Chris Goldsworthy wrote: > CMA allocations will fail if 'pinned' pages are in a CMA area, since we > cannot migrate pinned pages. The _refcount of a struct page being greater > than _mapcount for that page can cause pinning for anonymous pages. This > is because try_to_unmap(), which (1) is called in the CMA allocation path, > and (2) decrements both _refcount and _mapcount for a page, will stop > unmapping a page from VMAs once the _mapcount for a page reaches 0. This > implies that after try_to_unmap() has finished successfully for a page > where _recount > _mapcount, that _refcount will be greater than 0. Later > in the CMA allocation path in migrate_page_move_mapping(), we will have one > more reference count than intended for anonymous pages, meaning the > allocation will fail for that page. > > If a process ends up causing _refcount > _mapcount for a page (by either > incrementing _recount or decrementing _mapcount), such that the process is > context switched out after modifying one refcount but before modifying the > other, the page will be temporarily pinned. > > One example of where _refcount can be greater than _mapcount is inside of > zap_pte_range(), which is called for all the entries of a PMD when a > process is exiting, to unmap the process's memory. Inside of > zap_pte_range(), after unammping a page with page_remove_rmap(), we have > that _recount > _mapcount. _refcount can only be decremented after a TLB > flush is performed for the page - this doesn't occur until enough pages > have been batched together for flushing. The flush can either occur inside > of zap_pte_range() (during the same invocation or a later one), or if there > aren't enough pages collected by the time we unmap all of the pages in a > process, the flush will occur in tlb_finish_mmu() in exit_mmap(). After > the flush has occurred, tlb_batch_pages_flush() will decrement the > references on the flushed pages. > > Another such example like the above is inside of copy_one_pte(), which is > called during a fork. For PTEs for which pte_present(pte) == true, > copy_one_pte() will increment the _refcount field followed by the > _mapcount field of a page. > > So, inside of cma_alloc(), add the option of letting users pass in > __GFP_NOFAIL to indicate that we should retry CMA allocations indefinitely, > in the event that alloc_contig_range() returns -EBUSY after having scanned > a whole CMA-region bitmap. > > Signed-off-by: Chris Goldsworthy <cgoldswo@codeaurora.org> > Co-developed-by: Vinayak Menon <vinmenon@codeaurora.org> > Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> > --- > arch/powerpc/kvm/book3s_hv_builtin.c | 2 +- > drivers/dma-buf/heaps/cma_heap.c | 2 +- > drivers/s390/char/vmcp.c | 2 +- > drivers/staging/android/ion/ion_cma_heap.c | 2 +- > include/linux/cma.h | 2 +- > kernel/dma/contiguous.c | 4 ++-- > mm/Kconfig | 11 ++++++++++ > mm/cma.c | 35 +++++++++++++++++++++++++----- > mm/cma_debug.c | 2 +- > mm/hugetlb.c | 4 ++-- > 10 files changed, 50 insertions(+), 16 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c > index 073617c..21c3f6a 100644 > --- a/arch/powerpc/kvm/book3s_hv_builtin.c > +++ b/arch/powerpc/kvm/book3s_hv_builtin.c > @@ -74,7 +74,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages) > VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT); > > return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES), > - false); > + 0); > } > EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma); > > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c > index 626cf7f..7657359 100644 > --- a/drivers/dma-buf/heaps/cma_heap.c > +++ b/drivers/dma-buf/heaps/cma_heap.c > @@ -66,7 +66,7 @@ static int cma_heap_allocate(struct dma_heap *heap, > helper_buffer->heap = heap; > helper_buffer->size = len; > > - cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false); > + cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); > if (!cma_pages) > goto free_buf; > > diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c > index 9e06628..11c4e3b 100644 > --- a/drivers/s390/char/vmcp.c > +++ b/drivers/s390/char/vmcp.c > @@ -70,7 +70,7 @@ static void vmcp_response_alloc(struct vmcp_session *session) > * anymore the system won't work anyway. > */ > if (order > 2) > - page = cma_alloc(vmcp_cma, nr_pages, 0, false); > + page = cma_alloc(vmcp_cma, nr_pages, 0, 0); > if (page) { > session->response = (char *)page_to_phys(page); > session->cma_alloc = 1; > diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c > index bf65e67..128d3a5 100644 > --- a/drivers/staging/android/ion/ion_cma_heap.c > +++ b/drivers/staging/android/ion/ion_cma_heap.c > @@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, > if (align > CONFIG_CMA_ALIGNMENT) > align = CONFIG_CMA_ALIGNMENT; > > - pages = cma_alloc(cma_heap->cma, nr_pages, align, false); > + pages = cma_alloc(cma_heap->cma, nr_pages, align, 0); > if (!pages) > return -ENOMEM; > > diff --git a/include/linux/cma.h b/include/linux/cma.h > index 6ff79fe..2bd8544 100644 > --- a/include/linux/cma.h > +++ b/include/linux/cma.h > @@ -43,7 +43,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, > const char *name, > struct cma **res_cma); > extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > - bool no_warn); > + gfp_t gfp_mask); > extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); > > extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c > index cff7e60..55c62b2 100644 > --- a/kernel/dma/contiguous.c > +++ b/kernel/dma/contiguous.c > @@ -196,7 +196,7 @@ struct page *dma_alloc_from_contiguous(struct device *dev, size_t count, > if (align > CONFIG_CMA_ALIGNMENT) > align = CONFIG_CMA_ALIGNMENT; > > - return cma_alloc(dev_get_cma_area(dev), count, align, no_warn); > + return cma_alloc(dev_get_cma_area(dev), count, align, no_warn ? __GFP_NOWARN : 0); > } > > /** > @@ -219,7 +219,7 @@ static struct page *cma_alloc_aligned(struct cma *cma, size_t size, gfp_t gfp) > { > unsigned int align = min(get_order(size), CONFIG_CMA_ALIGNMENT); > > - return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp & __GFP_NOWARN); > + return cma_alloc(cma, size >> PAGE_SHIFT, align, gfp); > } > > /** > diff --git a/mm/Kconfig b/mm/Kconfig > index 6c97488..83a5135 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -524,6 +524,17 @@ config CMA_AREAS > > If unsure, leave the default value "7". > > +config CMA_RETRY_SLEEP_DURATION > + int "Sleep duration between retries" > + depends on CMA > + default 100 > + help > + Time to sleep for in milliseconds between the indefinite > + retries of a CMA allocation that are induced by passing > + __GFP_NOFAIL to cma_alloc(). > + > + If unsure, leave the value as "100". What's the point of this new config? If we need it, How could admin set their value? How does it make bad if we just use simple default vaule? IOW, I'd like to avoid introducing new config if we don't see good justifcation. > + > config MEM_SOFT_DIRTY > bool "Track memory changes" > depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS > diff --git a/mm/cma.c b/mm/cma.c > index 7f415d7..4fbad2b 100644 > --- a/mm/cma.c > +++ b/mm/cma.c > @@ -32,6 +32,7 @@ > #include <linux/highmem.h> > #include <linux/io.h> > #include <linux/kmemleak.h> > +#include <linux/delay.h> > #include <trace/events/cma.h> > > #include "cma.h" > @@ -403,13 +404,15 @@ static inline void cma_debug_show_areas(struct cma *cma) { } > * @cma: Contiguous memory region for which the allocation is performed. > * @count: Requested number of pages. > * @align: Requested alignment of pages (in PAGE_SIZE order). > - * @no_warn: Avoid printing message about failed allocation > + * @gfp_mask: If __GFP_NOWARN is passed, suppress messages about failed > + * allocations. If __GFP_NOFAIL is passed, try doing the CMA > + * allocation indefinitely until the allocation succeeds. > * > * This function allocates part of contiguous memory on specific > * contiguous memory area. > */ > struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > - bool no_warn) > + gfp_t gfp_mask) > { > unsigned long mask, offset; > unsigned long pfn = -1; > @@ -442,8 +445,28 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, > bitmap_maxno, start, bitmap_count, mask, > offset); > if (bitmap_no >= bitmap_maxno) { > - mutex_unlock(&cma->lock); > - break; > + if (ret == -EBUSY && gfp_mask & __GFP_NOFAIL) { > + mutex_unlock(&cma->lock); > + > + /* > + * Page may be momentarily pinned by some other > + * process which has been scheduled out, e.g. > + * in exit path, during unmap call, or process > + * fork and so cannot be freed there. Sleep > + * for 100ms and retry the allocation. > + */ > + start = 0; > + ret = -ENOMEM; > + msleep(CONFIG_CMA_RETRY_SLEEP_DURATION); Should it be uninterruptible, really? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc 2020-09-27 19:23 ` Minchan Kim @ 2020-09-28 20:10 ` Chris Goldsworthy 0 siblings, 0 replies; 6+ messages in thread From: Chris Goldsworthy @ 2020-09-28 20:10 UTC (permalink / raw) To: Minchan Kim Cc: akpm, linux-mm, linux-arm-msm, linux-kernel, pratikp, pdaly, sudaraja, iamjoonsoo.kim, david, Vinayak Menon, Minchan Kim On 2020-09-27 12:23, Minchan Kim wrote: > On Wed, Sep 23, 2020 at 10:16:25PM -0700, Chris Goldsworthy wrote: >> CMA allocations will fail if 'pinned' pages are in a CMA area, since >> we >> >> +config CMA_RETRY_SLEEP_DURATION >> + int "Sleep duration between retries" >> + depends on CMA >> + default 100 >> + help >> + Time to sleep for in milliseconds between the indefinite >> + retries of a CMA allocation that are induced by passing >> + __GFP_NOFAIL to cma_alloc(). >> + >> + If unsure, leave the value as "100". > > What's the point of this new config? If we need it, How could admin > set their value? > How does it make bad if we just use simple default vaule? > IOW, I'd like to avoid introducing new config if we don't see good > justifcation. I thought that it would be useful for developers, but I guess it would be much better for this to be runtime configurable. But, I don't think there's a strong reason to be able to configure the value - 100 ms has worked for us. I'll update scrap this option in a follow-up patch, and will use 100 ms as the sleeping time. >> + >> config MEM_SOFT_DIRTY >> bool "Track memory changes" >> depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS >> diff --git a/mm/cma.c b/mm/cma.c >> index 7f415d7..4fbad2b 100644 >> --- a/mm/cma.c >> +++ b/mm/cma.c >> @@ -32,6 +32,7 @@ >> #include <linux/highmem.h> >> #include <linux/io.h> >> #include <linux/kmemleak.h> >> +#include <linux/delay.h> >> #include <trace/events/cma.h> >> >> #include "cma.h" >> @@ -403,13 +404,15 @@ static inline void cma_debug_show_areas(struct >> cma *cma) { } >> * @cma: Contiguous memory region for which the allocation is >> performed. >> * @count: Requested number of pages. >> * @align: Requested alignment of pages (in PAGE_SIZE order). >> - * @no_warn: Avoid printing message about failed allocation >> + * @gfp_mask: If __GFP_NOWARN is passed, suppress messages about >> failed >> + * allocations. If __GFP_NOFAIL is passed, try doing the CMA >> + * allocation indefinitely until the allocation succeeds. >> * >> * This function allocates part of contiguous memory on specific >> * contiguous memory area. >> */ >> struct page *cma_alloc(struct cma *cma, size_t count, unsigned int >> align, >> - bool no_warn) >> + gfp_t gfp_mask) >> { >> unsigned long mask, offset; >> unsigned long pfn = -1; >> @@ -442,8 +445,28 @@ struct page *cma_alloc(struct cma *cma, size_t >> count, unsigned int align, >> bitmap_maxno, start, bitmap_count, mask, >> offset); >> if (bitmap_no >= bitmap_maxno) { >> - mutex_unlock(&cma->lock); >> - break; >> + if (ret == -EBUSY && gfp_mask & __GFP_NOFAIL) { >> + mutex_unlock(&cma->lock); >> + >> + /* >> + * Page may be momentarily pinned by some other >> + * process which has been scheduled out, e.g. >> + * in exit path, during unmap call, or process >> + * fork and so cannot be freed there. Sleep >> + * for 100ms and retry the allocation. >> + */ >> + start = 0; >> + ret = -ENOMEM; >> + msleep(CONFIG_CMA_RETRY_SLEEP_DURATION); > > Should it be uninterruptible, really? Good point - I'll replace the msleep() this with schedule_timeout_killable() in the follow-up patch. -- The Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-09-28 20:11 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-24 5:16 [PATCH v3] mm: cma: indefinitely retry allocations in cma_alloc Chris Goldsworthy 2020-09-24 5:16 ` Chris Goldsworthy 2020-09-25 12:18 ` David Hildenbrand 2020-09-25 23:13 ` Chris Goldsworthy 2020-09-27 19:23 ` Minchan Kim 2020-09-28 20:10 ` Chris Goldsworthy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).