linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 0/2] mm: fix cma allocation fail sometimes
@ 2022-03-15 14:45 Dong Aisheng
  2022-03-15 14:45 ` [PATCH v3 1/2] mm: cma: fix allocation may " Dong Aisheng
  2022-03-15 14:45 ` [PATCH v3 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry Dong Aisheng
  0 siblings, 2 replies; 14+ messages in thread
From: Dong Aisheng @ 2022-03-15 14:45 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, dongas86, shawnguo, linux-imx,
	akpm, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin, Dong Aisheng

We observed an issue with the NXP 5.15 LTS kernel where
dma_alloc_coherent() may sometimes fail when there are multiple
processes trying to allocate CMA memory in parallel.

This issue can be reproduced very easily on an MX6Q SDB board with the
latest linux-next kernel by writing a test module that creates 16 or 32
threads allocating randomly sized CMA memory in parallel in the
background. Alternatively, simply enable CONFIG_CMA_DEBUG and you can
see endless CMA alloc retries during boot:
[    1.452124] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
....
(thousands of retries)
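
The test module itself is not part of this series. Roughly, it does
something like the following sketch (simplified and untested; the size
table, the dummy platform device used to call dma_alloc_coherent() and
the parameter name are illustrative only, not the exact cma_alloc.ko
used for the numbers below):

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>
#include <linux/random.h>
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/smp.h>

static int pnum = 16;
module_param(pnum, int, 0644);

/* page counts modeled on the "index N pages M" lines in the logs below */
static const unsigned int sizes[] = { 7, 38, 44, 128, 144, 148, 512, 757 };

static struct platform_device *pdev;

static int cma_alloc_thread(void *data)
{
	unsigned int idx, pages;
	dma_addr_t dma;
	void *buf;

	while (!kthread_should_stop()) {
		/* pick a random size and allocate/free it in a loop */
		idx = get_random_u32() % ARRAY_SIZE(sizes);
		pages = sizes[idx];
		buf = dma_alloc_coherent(&pdev->dev, pages * PAGE_SIZE,
					 &dma, GFP_KERNEL);
		pr_info("cpu: %d, pid: %d, index %u pages %u%s\n",
			raw_smp_processor_id(), current->pid, idx, pages,
			buf ? "" : " failed");
		if (buf)
			dma_free_coherent(&pdev->dev, pages * PAGE_SIZE,
					  buf, dma);
		msleep(10);
	}
	return 0;
}

static int __init cma_alloc_test_init(void)
{
	int i;

	/* dummy device so dma_alloc_coherent() has a struct device */
	pdev = platform_device_register_simple("cma-alloc-test", -1, NULL, 0);
	if (IS_ERR(pdev))
		return PTR_ERR(pdev);
	dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

	pr_info("CMA alloc test enter: thread number: %d\n", pnum);
	for (i = 0; i < pnum; i++)
		kthread_run(cma_alloc_thread, NULL, "cma_alloc/%d", i);

	/* threads and the dummy device are leaked on unload; fine for a test */
	return 0;
}
module_init(cma_alloc_test_init);
MODULE_LICENSE("GPL");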

The root cause of this issue is that since commit a4efc174b382
("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
memory allocation. It's possible that the memory range process A is
trying to allocate has already been isolated by the allocation of
process B during memory migration.

The problem here is that the memory range isolated during one
allocation by start_isolate_page_range() can be much bigger than the
real size we want to allocate, because the range is aligned to
MAX_ORDER_NR_PAGES.

Taking an ARMv7 platform with 1G memory as an example, when
MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and CMA memory
is relatively small (e.g. 128M), there are only 4 MAX_ORDER slots, so
it's very likely that all CMA memory has already been isolated by other
processes when one of them tries to allocate memory using
dma_alloc_coherent(). Since the current CMA code scans the whole
available CMA memory only once, dma_alloc_coherent() can easily fail
due to contention with other processes.
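
For example, with 4K pages and max_order 14:

  MAX_ORDER_NR_PAGES = 1 << (14 - 1) = 8192 pages = 32M
  128M of CMA / 32M per isolation-aligned range = 4 ranges

so as few as four concurrent allocations can have the entire CMA area
isolated at the same time.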

This patchset introduces a retry mechanism to rescan the CMA bitmap on
-EBUSY errors, in case the target pageblock has only been temporarily
isolated by others and is released later.

It also improves CMA allocation performance by trying the next
MAX_ORDER_NR_PAGES range during retries rather than looping within the
same isolated range in small steps, which wastes CPU cycles.

The following test is based on linux-next: next-20211213.

Without the fix, it easily fails.
# insmod cma_alloc.ko pnum=16
[  274.322369] CMA alloc test enter: thread number: 16
[  274.329948] cpu: 0, pid: 692, index 4 pages 144
[  274.330143] cpu: 1, pid: 694, index 2 pages 44
[  274.330359] cpu: 2, pid: 695, index 7 pages 757
[  274.330760] cpu: 2, pid: 696, index 4 pages 144
[  274.330974] cpu: 2, pid: 697, index 6 pages 512
[  274.331223] cpu: 2, pid: 698, index 6 pages 512
[  274.331499] cpu: 2, pid: 699, index 2 pages 44
[  274.332228] cpu: 2, pid: 700, index 0 pages 7
[  274.337421] cpu: 0, pid: 701, index 1 pages 38
[  274.337618] cpu: 2, pid: 702, index 0 pages 7
[  274.344669] cpu: 1, pid: 703, index 0 pages 7
[  274.344807] cpu: 3, pid: 704, index 6 pages 512
[  274.348269] cpu: 2, pid: 705, index 5 pages 148
[  274.349490] cma: cma_alloc: reserved: alloc failed, req-size: 38 pages, ret: -16
[  274.366292] cpu: 1, pid: 706, index 4 pages 144
[  274.366562] cpu: 0, pid: 707, index 3 pages 128
[  274.367356] cma: cma_alloc: reserved: alloc failed, req-size: 128 pages, ret: -16
[  274.367370] cpu: 0, pid: 707, index 3 pages 128 failed
[  274.371148] cma: cma_alloc: reserved: alloc failed, req-size: 148 pages, ret: -16
[  274.375348] cma: cma_alloc: reserved: alloc failed, req-size: 144 pages, ret: -16
[  274.384256] cpu: 2, pid: 708, index 0 pages 7
....

With the fix, 32 threads allocating in parallel can pass an overnight
stress test.

root@imx6qpdlsolox:~# insmod cma_alloc.ko pnum=32
[  112.976809] cma_alloc: loading out-of-tree module taints kernel.
[  112.984128] CMA alloc test enter: thread number: 32
[  112.989748] cpu: 2, pid: 707, index 6 pages 512
[  112.994342] cpu: 1, pid: 708, index 6 pages 512
[  112.995162] cpu: 0, pid: 709, index 3 pages 128
[  112.995867] cpu: 2, pid: 710, index 0 pages 7
[  112.995910] cpu: 3, pid: 711, index 2 pages 44
[  112.996005] cpu: 3, pid: 712, index 7 pages 757
[  112.996098] cpu: 3, pid: 713, index 7 pages 757
...
[41877.368163] cpu: 1, pid: 737, index 2 pages 44
[41877.369388] cpu: 1, pid: 736, index 3 pages 128
[41878.486516] cpu: 0, pid: 737, index 2 pages 44
[41878.486515] cpu: 2, pid: 739, index 4 pages 144
[41878.486622] cpu: 1, pid: 736, index 3 pages 128
[41878.486948] cpu: 2, pid: 735, index 7 pages 757
[41878.487279] cpu: 2, pid: 738, index 4 pages 144
[41879.526603] cpu: 1, pid: 739, index 3 pages 128
[41879.606491] cpu: 2, pid: 737, index 3 pages 128
[41879.606550] cpu: 0, pid: 736, index 0 pages 7
[41879.612271] cpu: 2, pid: 738, index 4 pages 144
...
v1:
https://patchwork.kernel.org/project/linux-mm/cover/20211215080242.3034856-1-aisheng.dong@nxp.com/

v2:
https://patchwork.kernel.org/project/linux-mm/cover/20220112131552.3329380-1-aisheng.dong@nxp.com/

Dong Aisheng (2):
  mm: cma: fix allocation may fail sometimes
  mm: cma: try next MAX_ORDER_NR_PAGES during retry

 mm/cma.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

-- 
2.25.1



* [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-15 14:45 [PATCH v3 0/2] mm: fix cma allocation fail sometimes Dong Aisheng
@ 2022-03-15 14:45 ` Dong Aisheng
  2022-03-15 22:58   ` Andrew Morton
  2022-03-17 10:55   ` David Hildenbrand
  2022-03-15 14:45 ` [PATCH v3 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry Dong Aisheng
  1 sibling, 2 replies; 14+ messages in thread
From: Dong Aisheng @ 2022-03-15 14:45 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, dongas86, shawnguo, linux-imx,
	akpm, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin, Dong Aisheng

When there're multiple process allocing dma memory in parallel
by calling dma_alloc_coherent(), it may fail sometimes as follows:

Error log:
cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
cma: number of available pages:
3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages

When issue happened, we saw there were still 33161 pages (129M) free CMA
memory and a lot available free slots for 148 pages in CMA bitmap that we
want to allocate.

If dumping memory info, we found that there was also ~342M normal memory,
but only 1352K CMA memory left in buddy system while a lot of pageblocks
were isolated.

Memory info log:
Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
	    active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
	    unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
	    bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
	36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
	8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB

The root cause of this issue is that since commit a4efc174b382
("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
memory allocation. It's possible that the memory range process A trying
to alloc has already been isolated by the allocation of process B during
memory migration.

The problem here is that the memory range isolated during one allocation
by start_isolate_page_range() could be much bigger than the real size we
want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.

Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
is big (e.g. 32M with max_order 14) and CMA memory is relatively small
(e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
all CMA memory may have already been isolated by other processes when
one trying to allocate memory using dma_alloc_coherent().
Since current CMA code will only scan one time of whole available CMA
memory, then dma_alloc_coherent() may easy fail due to contention with
other processes.

This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
error in case the target memory range may has been temporarily isolated
by others and released later.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
CC: stable@vger.kernel.org # 5.11+
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
---
ChangeLog:
 * v2->v3: Improve commit messages
 * v1->v2: no changes
---
 mm/cma.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/cma.c b/mm/cma.c
index eaa4b5c920a2..46a9fd9f92c4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -430,6 +430,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 	unsigned long i;
 	struct page *page = NULL;
 	int ret = -ENOMEM;
+	int loop = 0;
 
 	if (!cma || !cma->count || !cma->bitmap)
 		goto out;
@@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 				offset);
 		if (bitmap_no >= bitmap_maxno) {
 			spin_unlock_irq(&cma->lock);
+			pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
+			/*
+			 * rescan as others may finish the memory migration
+			 * and quit if no available CMA memory found finally
+			 */
+			if (start) {
+				schedule();
+				start = 0;
+				continue;
+			}
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
-- 
2.25.1



* [PATCH v3 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry
  2022-03-15 14:45 [PATCH v3 0/2] mm: fix cma allocation fail sometimes Dong Aisheng
  2022-03-15 14:45 ` [PATCH v3 1/2] mm: cma: fix allocation may " Dong Aisheng
@ 2022-03-15 14:45 ` Dong Aisheng
  1 sibling, 0 replies; 14+ messages in thread
From: Dong Aisheng @ 2022-03-15 14:45 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-arm-kernel, dongas86, shawnguo, linux-imx,
	akpm, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin, Dong Aisheng

On an ARMv7 platform with 1G memory, when MAX_ORDER_NR_PAGES is big
(e.g. 32M with max_order 14) and CMA memory is relatively small
(e.g. 128M), we observed a huge number of repeated CMA allocation
retries (1k+) during boot when allocating one page for each of 3 mmc
instance probes.

This is caused by CMA supporting concurrent allocation since commit
a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
The memory range we are trying to allocate from, which is aligned to
MAX_ORDER_NR_PAGES, may have already been acquired and isolated by
others (see: alloc_contig_range()).

The current cma_alloc() retries the next area by a small step of
bitmap_no + mask + 1, which is very likely within the same isolated
range, so it fails again. When MAX_ORDER_NR_PAGES is big (e.g. 32M),
retrying in such small steps becomes meaningless, because the retries
are known to fail a huge number of times while that memory range
remains isolated by others, especially when allocating only one or two
pages.

Instead of looping within the same isolated range and wasting a lot of
CPU cycles, especially on big MAX_ORDER systems (e.g. 16M or 32M), try
the next MAX_ORDER_NR_PAGES range directly.

Doing so greatly mitigates the situation.
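
For example, assuming order_per_bit = 0 and 4K pages, one bitmap bit
covers one page and one MAX_ORDER range covers 8192 bits. If a
single-page allocation (mask = 0) hits an isolated range at
bitmap_no = 100, the old code retried from bit 101, which is still deep
inside the same isolated 32M range, while ALIGN(101, 8192) = 8192
jumps straight to the next range.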

Below is the original error log during booting:
[    2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
.... (+1K retries)

After the fix, the 1200+ retries are reduced to 0.
Another test with 8 threads running dma_alloc_coherent() in parallel
shows 1500+ retries dropping to ~145.

IOW, this patch can improve CMA allocation speed a lot when there is
enough CMA memory, by reducing retries significantly.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
CC: stable@vger.kernel.org # 5.11+
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
---
v2->v3:
 * Improve commit messages
v1->v2:
 * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
---
 mm/cma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 46a9fd9f92c4..46bc12fe28b3 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -496,8 +496,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
 					   count, align);
-		/* try again with a bit different memory target */
-		start = bitmap_no + mask + 1;
+		/*
+		 * Try again with a bit different memory target.
+		 * Since memory isolated in alloc_contig_range() is aligned
+		 * with MAX_ORDER_NR_PAGES, instead of retrying in a small
+		 * step within the same isolated range, we try the next
+		 * available memory range directly.
+		 */
+		start = ALIGN(bitmap_no + mask + 1,
+			      MAX_ORDER_NR_PAGES >> cma->order_per_bit);
+
 	}
 
 	trace_cma_alloc_finish(cma->name, pfn, page, count, align);
-- 
2.25.1



* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-15 14:45 ` [PATCH v3 1/2] mm: cma: fix allocation may " Dong Aisheng
@ 2022-03-15 22:58   ` Andrew Morton
  2022-03-16  3:41     ` Dong Aisheng
  2022-03-17 10:55   ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2022-03-15 22:58 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: linux-mm, linux-kernel, linux-arm-kernel, dongas86, shawnguo,
	linux-imx, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin

On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@nxp.com> wrote:

> --- a/mm/cma.c
> +++ b/mm/cma.c
>
> ...
>
> @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>  				offset);
>  		if (bitmap_no >= bitmap_maxno) {
>  			spin_unlock_irq(&cma->lock);
> +			pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> +			/*
> +			 * rescan as others may finish the memory migration
> +			 * and quit if no available CMA memory found finally
> +			 */
> +			if (start) {
> +				schedule();
> +				start = 0;
> +				continue;
> +			}
>  			break;

The schedule() is problematic.  For a start, we'd normally use
cond_resched() here, so we avoid calling the more expensive schedule()
if we know it won't perform any action.

But cond_resched() is problematic if this thread has realtime
scheduling policy and the process we're waiting on does not.  One way
to address that is to use an unconditional msleep(1), but that's still
just a hack.

A much cleaner solution is to use appropriate locking so that various
threads run this code in order, without messing each other up.

And it looks like the way to do that is to simply revert the commit
which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant
cma_mutex lock")?


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-15 22:58   ` Andrew Morton
@ 2022-03-16  3:41     ` Dong Aisheng
  2022-03-16 21:09       ` Andrew Morton
  0 siblings, 1 reply; 14+ messages in thread
From: Dong Aisheng @ 2022-03-16  3:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dong Aisheng, linux-mm, linux-kernel, linux-arm-kernel, shawnguo,
	linux-imx, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin

On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@nxp.com> wrote:
>
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> >
> > ...
> >
> > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >                               offset);
> >               if (bitmap_no >= bitmap_maxno) {
> >                       spin_unlock_irq(&cma->lock);
> > +                     pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> > +                     /*
> > +                      * rescan as others may finish the memory migration
> > +                      * and quit if no available CMA memory found finally
> > +                      */
> > +                     if (start) {
> > +                             schedule();
> > +                             start = 0;
> > +                             continue;
> > +                     }
> >                       break;
>
> The schedule() is problematic. For a start, we'd normally use
> cond_resched() here, so we avoid calling the more expensive schedule()
> if we know it won't perform any action.
>
> But cond_resched() is problematic if this thread has realtime
> scheduling policy and the process we're waiting on does not.  One way
> to address that is to use an unconditional msleep(1), but that's still
> just a hack.
>

I think we can simply drop the schedule() here during the second round
of retry, as the estimated delay may not really be needed.
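
I.e. keep the rescan but without the explicit yield, something like
this (sketch):

		if (bitmap_no >= bitmap_maxno) {
			spin_unlock_irq(&cma->lock);
			/* rescan the whole bitmap once, then give up */
			if (start) {
				start = 0;
				continue;
			}
			break;
		}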

Do you think that's ok?

> A much cleaner solution is to use appropriate locking so that various
> threads run this code in order, without messing each other up.
>
> And it looks like the way to do that is to simply revert the commit
> which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant
> cma_mutex lock")?

Yes, I agree it could be a backup solution if there are no better ideas.

Regards
Aisheng


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-16  3:41     ` Dong Aisheng
@ 2022-03-16 21:09       ` Andrew Morton
  2022-03-17  3:49         ` Dong Aisheng
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2022-03-16 21:09 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: Dong Aisheng, linux-mm, linux-kernel, linux-arm-kernel, shawnguo,
	linux-imx, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin

On Wed, 16 Mar 2022 11:41:37 +0800 Dong Aisheng <dongas86@gmail.com> wrote:

> On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@nxp.com> wrote:
> >
> > > --- a/mm/cma.c
> > > +++ b/mm/cma.c
> > >
> > > ...
> > >
> > > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > >                               offset);
> > >               if (bitmap_no >= bitmap_maxno) {
> > >                       spin_unlock_irq(&cma->lock);
> > > +                     pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> > > +                     /*
> > > +                      * rescan as others may finish the memory migration
> > > +                      * and quit if no available CMA memory found finally
> > > +                      */
> > > +                     if (start) {
> > > +                             schedule();
> > > +                             start = 0;
> > > +                             continue;
> > > +                     }
> > >                       break;
> >
> > The schedule() is problematic. For a start, we'd normally use
> > cond_resched() here, so we avoid calling the more expensive schedule()
> > if we know it won't perform any action.
> >
> > But cond_resched() is problematic if this thread has realtime
> > scheduling policy and the process we're waiting on does not.  One way
> > to address that is to use an unconditional msleep(1), but that's still
> > just a hack.
> >
> 
> I think we can simply drop the schedule() here during the second round
> of retry, as the estimated delay may not really be needed.

That will simply cause a tight loop, so I'm obviously not understanding
the proposal.



* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-16 21:09       ` Andrew Morton
@ 2022-03-17  3:49         ` Dong Aisheng
  0 siblings, 0 replies; 14+ messages in thread
From: Dong Aisheng @ 2022-03-17  3:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dong Aisheng, linux-mm, linux-kernel, linux-arm-kernel, shawnguo,
	linux-imx, m.szyprowski, lecopzer.chen, david, vbabka, stable,
	shijie.qin

On Thu, Mar 17, 2022 at 5:09 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 16 Mar 2022 11:41:37 +0800 Dong Aisheng <dongas86@gmail.com> wrote:
>
> > On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng <aisheng.dong@nxp.com> wrote:
> > >
> > > > --- a/mm/cma.c
> > > > +++ b/mm/cma.c
> > > >
> > > > ...
> > > >
> > > > @@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > > >                               offset);
> > > >               if (bitmap_no >= bitmap_maxno) {
> > > >                       spin_unlock_irq(&cma->lock);
> > > > +                     pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
> > > > +                     /*
> > > > +                      * rescan as others may finish the memory migration
> > > > +                      * and quit if no available CMA memory found finally
> > > > +                      */
> > > > +                     if (start) {
> > > > +                             schedule();
> > > > +                             start = 0;
> > > > +                             continue;
> > > > +                     }
> > > >                       break;
> > >
> > > The schedule() is problematic. For a start, we'd normally use
> > > cond_resched() here, so we avoid calling the more expensive schedule()
> > > if we know it won't perform any action.
> > >
> > > But cond_resched() is problematic if this thread has realtime
> > > scheduling policy and the process we're waiting on does not.  One way
> > > to address that is to use an unconditional msleep(1), but that's still
> > > just a hack.
> > >
> >
> > I think we can simply drop the schedule() here during the second round
> > of retry, as the estimated delay may not really be needed.
>
> That will simply cause a tight loop, so I'm obviously not understanding
> the proposal.
>

IIUC the original code is already a tight loop, isn't it?
You can also see my observation, thousands of retries, in patch 2.
The logic in this patch just retries the original loop in case there's
a false positive error return.

Or do you mean an infinite loop? The loop will break out when it meets
a non-EBUSY error in alloc_contig_range().

BTW, the tight loop situation could be improved a lot by my patch 2.

And after Zi Yan's patchset [1] gets merged, the situation could be
further improved by retrying in pageblock steps.
1. [v7,0/5] Use pageblock_order for cma and alloc_contig_range
alignment. - Patchwork (kernel.org)

So generally this still seems better to me than a simple revert.
Please correct me if I missed something.

Regards
Aisheng


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-15 14:45 ` [PATCH v3 1/2] mm: cma: fix allocation may " Dong Aisheng
  2022-03-15 22:58   ` Andrew Morton
@ 2022-03-17 10:55   ` David Hildenbrand
  2022-03-17 14:26     ` Dong Aisheng
  1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2022-03-17 10:55 UTC (permalink / raw)
  To: Dong Aisheng, linux-mm
  Cc: linux-kernel, linux-arm-kernel, dongas86, shawnguo, linux-imx,
	akpm, m.szyprowski, lecopzer.chen, vbabka, stable, shijie.qin

On 15.03.22 15:45, Dong Aisheng wrote:
> When there're multiple process allocing dma memory in parallel

s/allocing/allocating/

> by calling dma_alloc_coherent(), it may fail sometimes as follows:
> 
> Error log:
> cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> cma: number of available pages:
> 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> 
> When issue happened, we saw there were still 33161 pages (129M) free CMA
> memory and a lot available free slots for 148 pages in CMA bitmap that we
> want to allocate.
> 
> If dumping memory info, we found that there was also ~342M normal memory,
> but only 1352K CMA memory left in buddy system while a lot of pageblocks
> were isolated.

s/If/When/

> 
> Memory info log:
> Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> 	    active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> 	    unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> 	    bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> 	36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> 	8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> 
> The root cause of this issue is that since commit a4efc174b382
> ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> memory allocation. It's possible that the memory range process A trying
> to alloc has already been isolated by the allocation of process B during
> memory migration.
> 
> The problem here is that the memory range isolated during one allocation
> by start_isolate_page_range() could be much bigger than the real size we
> want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> 
> Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> all CMA memory may have already been isolated by other processes when
> one trying to allocate memory using dma_alloc_coherent().
> Since current CMA code will only scan one time of whole available CMA
> memory, then dma_alloc_coherent() may easy fail due to contention with
> other processes.
> 
> This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> error in case the target memory range may has been temporarily isolated
> by others and released later.

But your patch doesn't check for -EBUSY and instead might retry
forever on any allocation error, no?

I'd really suggest letting alloc_contig_range() return -EAGAIN in case
the isolation failed and handling -EAGAIN only in a special way instead.

In addition, we might want to stop once we loop too often, I assume.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-17 10:55   ` David Hildenbrand
@ 2022-03-17 14:26     ` Dong Aisheng
  2022-03-17 17:12       ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Dong Aisheng @ 2022-03-17 14:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dong Aisheng, linux-mm, linux-kernel, linux-arm-kernel, shawnguo,
	linux-imx, akpm, m.szyprowski, lecopzer.chen, vbabka, stable,
	shijie.qin

On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.03.22 15:45, Dong Aisheng wrote:
> > When there're multiple process allocing dma memory in parallel
>
> s/allocing/allocating/
>
> > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> >
> > Error log:
> > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > cma: number of available pages:
> > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> >
> > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > want to allocate.
> >
> > If dumping memory info, we found that there was also ~342M normal memory,
> > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > were isolated.
>
> s/If/When/
>

Will fix them all, thanks.

> >
> > Memory info log:
> > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> >
> > The root cause of this issue is that since commit a4efc174b382
> > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > memory allocation. It's possible that the memory range process A trying
> > to alloc has already been isolated by the allocation of process B during
> > memory migration.
> >
> > The problem here is that the memory range isolated during one allocation
> > by start_isolate_page_range() could be much bigger than the real size we
> > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> >
> > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > all CMA memory may have already been isolated by other processes when
> > one trying to allocate memory using dma_alloc_coherent().
> > Since current CMA code will only scan one time of whole available CMA
> > memory, then dma_alloc_coherent() may easy fail due to contention with
> > other processes.
> >
> > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > error in case the target memory range may has been temporarily isolated
> > by others and released later.
>
> But your patch doesn't check for -EBUSY and instead might retry
> forever on any allocation error, no?
>

My patch doesn't seem to need to check it, because there's no chance to
retry the loop in case a non-EBUSY error happened earlier.

for (;;) {
        if (bitmap_no >= bitmap_maxno) {
                retry_the_whole_loop;
        }

        pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
        ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                             GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));

        if (ret != -EBUSY)
                break;
}

> I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> the isolation failed and handling -EAGAIN only in a special way instead.
>

Yes, I guess that's another improvement and it is applicable.

> In addition, we might want to stop once we loop too often, I assume.
>

If we really retried unreasonably often, I wonder whether we should
figure out what's going on inside alloc_contig_range() and fix it
rather than return an -EBUSY error to users while there's still a lot
of available memory. So currently I didn't add a maximum retry count
outside.

Additionally, for a small CMA system (128M with 32M max_order pages),
retries would be frequent when multiple processes allocate memory; it
also depends on the system's running state, so it's hard to define a
reasonable and stable maximum retry count.

Regards
Aisheng

> --
> Thanks,
>
> David / dhildenb
>


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-17 14:26     ` Dong Aisheng
@ 2022-03-17 17:12       ` Minchan Kim
  2022-03-18  3:43         ` Dong Aisheng
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2022-03-17 17:12 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: David Hildenbrand, Dong Aisheng, linux-mm, linux-kernel,
	linux-arm-kernel, shawnguo, linux-imx, akpm, m.szyprowski,
	lecopzer.chen, vbabka, stable, shijie.qin

On Thu, Mar 17, 2022 at 10:26:42PM +0800, Dong Aisheng wrote:
> On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 15.03.22 15:45, Dong Aisheng wrote:
> > > When there're multiple process allocing dma memory in parallel
> >
> > s/allocing/allocating/
> >
> > > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> > >
> > > Error log:
> > > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > > cma: number of available pages:
> > > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> > >
> > > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > > want to allocate.

Yes, I have also met this problem, especially when multiple threads
compete for CMA allocation. Thanks for bringing up the issue.

> > >
> > > If dumping memory info, we found that there was also ~342M normal memory,
> > > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > > were isolated.
> >
> > s/If/When/
> >
> 
> Will fix them all, thanks.
> 
> > >
> > > Memory info log:
> > > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> > >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> > >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> > >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> > >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> > >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> > >
> > > The root cause of this issue is that since commit a4efc174b382
> > > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > > memory allocation. It's possible that the memory range process A trying
> > > to alloc has already been isolated by the allocation of process B during
> > > memory migration.
> > >
> > > The problem here is that the memory range isolated during one allocation
> > > by start_isolate_page_range() could be much bigger than the real size we
> > > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> > >
> > > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > > all CMA memory may have already been isolated by other processes when
> > > one trying to allocate memory using dma_alloc_coherent().
> > > Since current CMA code will only scan one time of whole available CMA
> > > memory, then dma_alloc_coherent() may easy fail due to contention with
> > > other processes.
> > >
> > > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > > error in case the target memory range may has been temporarily isolated
> > > by others and released later.
> >
> > But your patch doesn't check for -EBUSY and instead might retry
> > forever on any allocation error, no?
> >
> 
> My patch doesn't seem to need to check it, because there's no chance to
> retry the loop in case a non-EBUSY error happened earlier.
> 
> for (;;) {
>         if (bitmap_no >= bitmap_maxno) {
>                 retry_the_whole_loop;
>         }
> 
>         pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
>                              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> 
>         if (ret != -EBUSY)
>                 break;
> }
> 
> > I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> > the isolation failed and handling -EAGAIN only in a special way instead.
> >
> 
> Yes, I guess that's another improvement and it is applicable.
> 
> > In addition, we might want to stop once we loop too often, I assume.
> >
> 
> If we really retried unreasonably often, I wonder whether we should
> figure out what's going on inside alloc_contig_range() and fix it
> rather than return an -EBUSY error to users while there's still a lot
> of available memory. So currently I didn't add a maximum retry count
> outside.
> 
> Additionally, for a small CMA system (128M with 32M max_order pages),
> retries would be frequent when multiple processes allocate memory; it
> also depends on the system's running state, so it's hard to define a
> reasonable and stable maximum retry count.

IMO, when the CMA code sees -EAGAIN, it should put the task on
cma->wait_queue to be woken up by another thread once it finishes its
work on the cma. So it's similar to cma_mutex, but we don't need to
synchronize for !EAGAIN cases, and it makes the cma allocation fair.


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-17 17:12       ` Minchan Kim
@ 2022-03-18  3:43         ` Dong Aisheng
  2022-03-18 16:20           ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Dong Aisheng @ 2022-03-18  3:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Hildenbrand, Dong Aisheng, linux-mm, linux-kernel,
	linux-arm-kernel, shawnguo, linux-imx, akpm, m.szyprowski,
	lecopzer.chen, vbabka, stable, shijie.qin

On Fri, Mar 18, 2022 at 1:12 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Thu, Mar 17, 2022 at 10:26:42PM +0800, Dong Aisheng wrote:
> > On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 15.03.22 15:45, Dong Aisheng wrote:
> > > > When there're multiple process allocing dma memory in parallel
> > >
> > > s/allocing/allocating/
> > >
> > > > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> > > >
> > > > Error log:
> > > > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > > > cma: number of available pages:
> > > > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > > > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > > > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> > > >
> > > > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > > > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > > > want to allocate.
>
> Yes, I have also met this problem, especially when multiple threads
> compete for CMA allocation. Thanks for bringing up the issue.
>
> > > >
> > > > If dumping memory info, we found that there was also ~342M normal memory,
> > > > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > > > were isolated.
> > >
> > > s/If/When/
> > >
> >
> > Will fix them all, thanks.
> >
> > > >
> > > > Memory info log:
> > > > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> > > >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> > > >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> > > >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > > > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> > > >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> > > >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> > > >
> > > > The root cause of this issue is that since commit a4efc174b382
> > > > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > > > memory allocation. It's possible that the memory range process A trying
> > > > to alloc has already been isolated by the allocation of process B during
> > > > memory migration.
> > > >
> > > > The problem here is that the memory range isolated during one allocation
> > > > by start_isolate_page_range() could be much bigger than the real size we
> > > > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> > > >
> > > > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > > > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > > > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > > > all CMA memory may have already been isolated by other processes when
> > > > one trying to allocate memory using dma_alloc_coherent().
> > > > Since current CMA code will only scan one time of whole available CMA
> > > > memory, then dma_alloc_coherent() may easy fail due to contention with
> > > > other processes.
> > > >
> > > > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > > > error in case the target memory range may has been temporarily isolated
> > > > by others and released later.
> > >
> > > But your patch doesn't check for -EBUSY and instead might retry
> > > forever on any allocation error, no?
> > >
> >
> > My patch doesn't seem to need to check it, because there's no chance to
> > retry the loop in case a non-EBUSY error happened earlier.
> >
> > for (;;) {
> >         if (bitmap_no >= bitmap_maxno) {
> >                 retry_the_whole_loop;
> >         }
> >
> >         pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> >         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> >                              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> >
> >         if (ret != -EBUSY)
> >                 break;
> > }
> >
> > > I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> > > the isolation failed and handling -EAGAIN only in a special way instead.
> > >
> >
> > Yes, I guess that's another improvement and it is applicable.
> >
> > > In addition, we might want to stop once we loop too often, I assume.
> > >
> >
> > If we really retried unreasonably often, I wonder whether we should
> > figure out what's going on inside alloc_contig_range() and fix it
> > rather than return an -EBUSY error to users while there's still a lot
> > of available memory. So currently I didn't add a maximum retry count
> > outside.
> >
> > Additionally, for a small CMA system (128M with 32M max_order pages),
> > retries would be frequent when multiple processes allocate memory; it
> > also depends on the system's running state, so it's hard to define a
> > reasonable and stable maximum retry count.
>
> IMO, when the CMA code sees -EAGAIN, it should put the task on
> cma->wait_queue to be woken up by another thread once it finishes its
> work on the cma. So it's similar to cma_mutex, but we don't need to
> synchronize for !EAGAIN cases, and it makes the cma allocation fair.

Okay, that's another approach, which is completely different from the
existing one. Instead of blocking on the CMA memory range we want to
allocate, the existing code tries the next available memory ranges.
The question is whether we need to change this behavior?
It looks to me like both ways have pros and cons.

And for the sleeping on -EAGAIN case, do we need an accurate wakeup?
IOW, only wake the sleeper when the exact memory range is released,
which means we'd need to create some more complicated logic to track
different CMA memory range usage. Otherwise, there will be possible
false positive wakeups and the requester may quickly sleep again.

I'm not sure if it's worth it. Might need to think a bit more.

Regards
Aisheng


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-18  3:43         ` Dong Aisheng
@ 2022-03-18 16:20           ` Minchan Kim
  2022-05-04 15:52             ` Dong Aisheng
  0 siblings, 1 reply; 14+ messages in thread
From: Minchan Kim @ 2022-03-18 16:20 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: David Hildenbrand, Dong Aisheng, linux-mm, linux-kernel,
	linux-arm-kernel, shawnguo, linux-imx, akpm, m.szyprowski,
	lecopzer.chen, vbabka, stable, shijie.qin

On Fri, Mar 18, 2022 at 11:43:41AM +0800, Dong Aisheng wrote:
> On Fri, Mar 18, 2022 at 1:12 AM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Thu, Mar 17, 2022 at 10:26:42PM +0800, Dong Aisheng wrote:
> > > On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > On 15.03.22 15:45, Dong Aisheng wrote:
> > > > > When there're multiple process allocing dma memory in parallel
> > > >
> > > > s/allocing/allocating/
> > > >
> > > > > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> > > > >
> > > > > Error log:
> > > > > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > > > > cma: number of available pages:
> > > > > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > > > > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > > > > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> > > > >
> > > > > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > > > > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > > > > want to allocate.
> >
> > Yes, I have also met this problem, especially when multiple threads
> > compete for CMA allocation. Thanks for bringing up the issue.
> >
> > > > >
> > > > > If dumping memory info, we found that there was also ~342M normal memory,
> > > > > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > > > > were isolated.
> > > >
> > > > s/If/When/
> > > >
> > >
> > > Will fix them all, thanks.
> > >
> > > > >
> > > > > Memory info log:
> > > > > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> > > > >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> > > > >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> > > > >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > > > > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> > > > >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> > > > >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> > > > >
> > > > > The root cause of this issue is that since commit a4efc174b382
> > > > > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > > > > memory allocation. It's possible that the memory range process A trying
> > > > > to alloc has already been isolated by the allocation of process B during
> > > > > memory migration.
> > > > >
> > > > > The problem here is that the memory range isolated during one allocation
> > > > > by start_isolate_page_range() could be much bigger than the real size we
> > > > > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> > > > >
> > > > > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > > > > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > > > > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > > > > all CMA memory may have already been isolated by other processes when
> > > > > one trying to allocate memory using dma_alloc_coherent().
> > > > > Since current CMA code will only scan one time of whole available CMA
> > > > > memory, then dma_alloc_coherent() may easy fail due to contention with
> > > > > other processes.
> > > > >
> > > > > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > > > > error in case the target memory range may has been temporarily isolated
> > > > > by others and released later.
> > > >
> > > > But your patch doesn't check for -EBUSY and instead might retry
> > > > forever on any allocation error, no?
> > > >
> > >
> > > My patch doesn't seem to need to check it, because there's no chance to
> > > retry the loop in case a non-EBUSY error happened earlier.
> > >
> > > for (;;) {
> > >         if (bitmap_no >= bitmap_maxno) {
> > >                 retry_the_whole_loop;
> > >         }
> > >
> > >         pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> > >         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> > >                              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > >
> > >         if (ret != -EBUSY)
> > >                 break;
> > > }
> > >
> > > > I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> > > > the isolation failed and handling -EAGAIN only in a special way instead.
> > > >
> > >
> > > Yes, I guess that's another improvement and it is applicable.
> > >
> > > > In addition, we might want to stop once we loop too often, I assume.
> > > >
> > >
> > > If we really retried unreasonably often, I wonder whether we should
> > > figure out what's going on inside alloc_contig_range() and fix it
> > > rather than return an -EBUSY error to users while there's still a lot
> > > of available memory. So currently I didn't add a maximum retry count
> > > outside.
> > >
> > > Additionally, for a small CMA system (128M with 32M max_order pages),
> > > retries would be frequent when multiple processes allocate memory; it
> > > also depends on the system's running state, so it's hard to define a
> > > reasonable and stable maximum retry count.
> >
> > IMO, when the CMA code sees -EAGAIN, it should put the task on
> > cma->wait_queue to be woken up by another thread once it finishes its
> > work on the cma. So it's similar to cma_mutex, but we don't need to
> > synchronize for !EAGAIN cases, and it makes the cma allocation fair.
> 
> Okay, that's another approach, which is completely different from the
> existing one. Instead of blocking on the CMA memory range we want to
> allocate, the existing code tries the next available memory ranges.
> The question is whether we need to change this behavior?

It could wait only if it has scanned the whole range but every block
was taken, and then make one more trial after being woken up. If
cma_alloc() returns -EAGAIN in the end, the user can decide the policy.

> It looks to me like both ways have pros and cons.
> 
> And for the sleeping on -EAGAIN case, do we need an accurate wakeup?
> IOW, only wake the sleeper when the exact memory range is released,
> which means we'd need to create some more complicated logic to track
> different CMA memory range usage. Otherwise, there will be possible
> false positive wakeups and the requester may quickly sleep again.

I didn't even consider such a complicated model, since we would have
the race anyway. (Never tested, but wanted to show the intention.)

diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..cccf684da587 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -449,6 +449,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 	offset = cma_bitmap_aligned_offset(cma, align);
 	bitmap_maxno = cma_bitmap_maxno(cma);
 	bitmap_count = cma_bitmap_pages_to_bits(cma, count);
+	bool retried = false;
 
 	if (bitmap_count > bitmap_maxno)
 		goto out;
@@ -460,6 +461,12 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 				offset);
 		if (bitmap_no >= bitmap_maxno) {
 			spin_unlock_irq(&cma->lock);
+			if (ret == -EAGAIN && !retried) {
+				wait_event_killable(cma->wq);
+				retried = true;
+				start = 0;
+				continue;
+			}
 			break;
 		}
 		bitmap_set(cma->bitmap, bitmap_no, bitmap_count);
@@ -471,6 +478,10 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 		spin_unlock_irq(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
+		if (is_migrate_isolate_page(pfn_to_page(pfn))) {
+			ret = -EAGAIN;
+			goto next;
+		}
 		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
 				     GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
 
@@ -478,9 +489,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 			page = pfn_to_page(pfn);
 			break;
 		}
-
+next:
 		cma_clear_bitmap(cma, pfn, count);
-		if (ret != -EBUSY)
+		if (ret != -EBUSY && ret != -EAGAIN)
 			break;
 
 		pr_debug("%s(): memory range at %p is busy, retrying\n",
@@ -489,7 +500,10 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
 					   count, align);
 		/* try again with a bit different memory target */
-		start = bitmap_no + mask + 1;
+		if (ret == -EAGAIN)
+			start = (pfn_max_align_up(pfn + 1) - cma->base_pfn) >> cma->order_per_bit;
+		else
+			start = bitmap_no + mask + 1;
 	}
 
 	trace_cma_alloc_finish(cma->name, pfn, page, count, align);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f67c4c70f17f..f5a16a82552a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -182,7 +182,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * might be used to flush and disable pcplist before isolation and enable after
  * unisolation.
  *
- * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
+ * Return: 0 on success and -EAGAIN if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, int flags)
@@ -199,7 +199,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page && set_migratetype_isolate(page, migratetype, flags)) {
 			undo_isolate_page_range(start_pfn, pfn, migratetype);
-			return -EBUSY;
+			return -EAGAIN;
 		}
 	}
 	return 0;
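
As a side note on "the user could decide the policy" above: cma_alloc()
only returns a page pointer or NULL today, so a caller cannot actually
see -EAGAIN. Purely as an illustration, assuming a hypothetical variant
cma_alloc_err() that returned ERR_PTR(-EAGAIN) on isolation contention,
a caller-side retry policy might look like this (untested sketch, not
part of any posted patch):

#include <linux/cma.h>
#include <linux/delay.h>
#include <linux/err.h>

/*
 * Hypothetical: cma_alloc_err() is assumed to behave like cma_alloc()
 * but return ERR_PTR(-EAGAIN) when the target range was only
 * temporarily isolated by a concurrent allocation.
 */
static struct page *cma_alloc_retry(struct cma *cma, unsigned long count,
				    unsigned int align, int max_tries)
{
	struct page *page;
	int i;

	for (i = 0; i < max_tries; i++) {
		page = cma_alloc_err(cma, count, align, false);
		if (!IS_ERR(page))
			return page;		/* got the memory */
		if (PTR_ERR(page) != -EAGAIN)
			return NULL;		/* hard failure, give up */
		msleep(20);			/* arbitrary backoff before rescan */
	}
	return NULL;
}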


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-03-18 16:20           ` Minchan Kim
@ 2022-05-04 15:52             ` Dong Aisheng
  2022-05-04 23:25               ` Minchan Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Dong Aisheng @ 2022-05-04 15:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Hildenbrand, Dong Aisheng, linux-mm, linux-kernel,
	linux-arm-kernel, shawnguo, linux-imx, akpm, m.szyprowski,
	lecopzer.chen, vbabka, stable, shijie.qin

Hi Minchan & David,

On Sat, Mar 19, 2022 at 12:20 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Mar 18, 2022 at 11:43:41AM +0800, Dong Aisheng wrote:
> > On Fri, Mar 18, 2022 at 1:12 AM Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > On Thu, Mar 17, 2022 at 10:26:42PM +0800, Dong Aisheng wrote:
> > > > On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
> > > > >
> > > > > On 15.03.22 15:45, Dong Aisheng wrote:
> > > > > > When there're multiple process allocing dma memory in parallel
> > > > >
> > > > > s/allocing/allocating/
> > > > >
> > > > > > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> > > > > >
> > > > > > Error log:
> > > > > > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > > > > > cma: number of available pages:
> > > > > > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > > > > > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > > > > > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> > > > > >
> > > > > > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > > > > > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > > > > > want to allocate.
> > >
> > > Yes, I have also met the problem, especially when multiple threads
> > > compete for CMA allocation. Thanks for bringing up the issue.
> > >
> > > > > >
> > > > > > If dumping memory info, we found that there was also ~342M normal memory,
> > > > > > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > > > > > were isolated.
> > > > >
> > > > > s/If/When/
> > > > >
> > > >
> > > > Will fix them all, thanks.
> > > >
> > > > > >
> > > > > > Memory info log:
> > > > > > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> > > > > >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> > > > > >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> > > > > >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > > > > > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> > > > > >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> > > > > >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> > > > > >
> > > > > > The root cause of this issue is that since commit a4efc174b382
> > > > > > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > > > > > memory allocation. It's possible that the memory range process A trying
> > > > > > to alloc has already been isolated by the allocation of process B during
> > > > > > memory migration.
> > > > > >
> > > > > > The problem here is that the memory range isolated during one allocation
> > > > > > by start_isolate_page_range() could be much bigger than the real size we
> > > > > > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> > > > > >
> > > > > > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > > > > > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > > > > > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > > > > > all CMA memory may have already been isolated by other processes when
> > > > > > one trying to allocate memory using dma_alloc_coherent().
> > > > > > Since current CMA code will only scan one time of whole available CMA
> > > > > > memory, then dma_alloc_coherent() may easy fail due to contention with
> > > > > > other processes.
> > > > > >
> > > > > > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > > > > > error in case the target memory range may has been temporarily isolated
> > > > > > by others and released later.
> > > > >
> > > > > But your patch doesn't check for -EBUSY and instead might retry
> > > > > forever, on any allocation error, no?
> > > > >
> > > >
> > > > My patch seems not to need to check it because there's no chance to
> > > > retry the loop in case a non-EBUSY error happened earlier.
> > > >
> > > > for (;;) {
> > > >         if (bitmap_no >= bitmap_maxno) {
> > > >                 retry_the_whole_loop;
> > > >         }
> > > >
> > > >         pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> > > >         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> > > >                              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > > >
> > > >         if (ret != -EBUSY)
> > > >                 break;
> > > > }
> > > >
> > > > > I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> > > > > the isolation failed and handling -EAGAIN only in a special way instead.
> > > > >
> > > >
> > > > Yes, I guess that's another improvement and is applicable.
> > > >
> > > > > In addition, we might want to stop once we loop too often, I assume.
> > > > >
> > > >
> > > > I wonder, if we really retry unreasonably often, whether we should
> > > > figure out what's going on inside alloc_contig_range() and fix it
> > > > rather than return an EBUSY error to users while there's still a
> > > > lot of available memory.
> > > > So currently I didn't add a maximum retry limit outside.
> > > >
> > > > Additionally, for a small CMA system (128M with 32M max_order
> > > > pages), the retry would be frequent when multiple processes are
> > > > allocating memory; it also depends on the system's running state,
> > > > so it's hard to define a reasonable and stable maximum retry count.
> > >
> > > IMO, when CMA sees -EAGAIN, it should put the task into
> > > cma->wait_queue and then be woken up by another thread which finishes
> > > its work on the cma. So it's similar to cma_mutex, but we don't need
> > > to synchronize for the !EAGAIN cases, and it makes the cma allocation fair.
> >
> > Okay, that's another approach, completely different from the existing
> > one. Instead of blocking on the CMA memory range which we want to
> > allocate, the existing code will try the next available memory ranges.
> > The question is whether we need to change this behavior.
>
> It could wait only if it has scanned the whole range but every block was
> taken, and then make one more trial after being woken up. If cma_alloc
> returns -EAGAIN in the end, the user could decide the policy.
>
> > It looks to me both ways have pros and cons.
> >
> > And for sleeping on the -EAGAIN case, do we need an accurate wakeup?
> > IOW, only wake the sleeper when the exact memory range is released,
> > which means we would need more complicated code logic to track the
> > usage of the different CMA memory ranges.
> > Otherwise, there will be possible false-positive wakeups and the
> > requester may quickly go back to sleep.
>
> I didn't even consider such a complicated model since we would have
> the race anyway. (Never tested, but it shows the intention.)
>
> diff --git a/mm/cma.c b/mm/cma.c
> index bc9ca8f3c487..cccf684da587 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -449,6 +449,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>         offset = cma_bitmap_aligned_offset(cma, align);
>         bitmap_maxno = cma_bitmap_maxno(cma);
>         bitmap_count = cma_bitmap_pages_to_bits(cma, count);
> +       bool retried = false;
>
>         if (bitmap_count > bitmap_maxno)
>                 goto out;
> @@ -460,6 +461,12 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>                                 offset);
>                 if (bitmap_no >= bitmap_maxno) {
>                         spin_unlock_irq(&cma->lock);
> +                       if (ret == -EAGAIN && !retried) {
> +                               wait_event_killable(cma->wq);

I spent a few days reading the code carefully and implementing your
suggestions. But the remaining problem is how to properly wake up the
multiple callers sleeping on the isolation contention error. What I
observed is a synchronization problem between the sleeper and the
waker in the current code logic:
1) we only need to wake up a caller when we detect users sleeping on
the isolation contention, so a flag has to be introduced and checked
dynamically.
2) the wakeup code may finish before the flag is set, which means the
last sleeper may miss its wakeup, so a timeout mechanism has to be
introduced.
3) the code can't prevent wait_for_completion() and complete() from
running in parallel without extra complexity for proper
synchronization (during my tests, cma->pending could drop below 0
sometimes).

I wonder if it's worth introducing such complexity just to support
sleeping on isolation contention. Maybe the simple way is, as Andrew
said, to revert that broken patch first.

Do you have any suggestions?

Below is the demo code I ran against next-20220420.

diff --git a/mm/cma.c b/mm/cma.c
index eaa4b5c920a2..682fb8a90572 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -119,6 +119,7 @@ static void __init cma_activate_area(struct cma *cma)
                init_cma_reserved_pageblock(pfn_to_page(pfn));

        spin_lock_init(&cma->lock);
+       init_completion(&cma->completion);

 #ifdef CONFIG_CMA_DEBUGFS
        INIT_HLIST_HEAD(&cma->mem_head);
@@ -429,7 +430,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
        unsigned long bitmap_maxno, bitmap_no, bitmap_count;
        unsigned long i;
        struct page *page = NULL;
+       bool rescan = false;
        int ret = -ENOMEM;
+       int loop = 0;

        if (!cma || !cma->count || !cma->bitmap)
                goto out;
@@ -456,6 +459,23 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
                                bitmap_maxno, start, bitmap_count, mask,
                                offset);
                if (bitmap_no >= bitmap_maxno) {
+                       pr_debug("%s(): alloc fail, rescan %d\n", __func__,
+                                loop++);
+                       /*
+                        * rescan CMA as the previous isolated memory range we
+                        * wanted to use may have been released by others now.
+                        */
+                       if (rescan) {
+                               rescan = false;
+                               start = 0;
+                               atomic_add(1, &cma->pending);
+                               spin_unlock_irq(&cma->lock);
+                               ret = wait_for_completion_timeout(&cma->completion, HZ / 2);
+                               if (!ret) {
+                                       atomic_sub(1, &cma->pending);
+                               }
+                               continue;
+                       }
                        spin_unlock_irq(&cma->lock);
                        break;
                }
@@ -468,16 +488,29 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
                spin_unlock_irq(&cma->lock);

                pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
+
+               /*
+                * The memory range may be temporarily isolated by others.
+                * Record it when met and give the whole CMA range another
+                * rescan in case we fail to find suitable memory during the
+                * initial scan, assuming the isolated range may be released
+                * later.
+                */
                ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
                                     GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));

                if (ret == 0) {
                        page = pfn_to_page(pfn);
+                       if (atomic_read(&cma->pending)) {
+                               complete(&cma->completion);
+                               atomic_sub(1, &cma->pending);
+                       }
                        break;
+               } else if (ret == -EAGAIN) {
+                       rescan = true;
                }

                cma_clear_bitmap(cma, pfn, count);
-               if (ret != -EBUSY)
+               if (ret != -EBUSY && ret != -EAGAIN)
                        break;

                pr_debug("%s(): memory range at %p is busy, retrying\n",
@@ -486,7 +519,8 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
                trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
                                           count, align);
                /* try again with a bit different memory target */
-               start = bitmap_no + mask + 1;
+               start = ALIGN(bitmap_no + mask + 1,
+                             MAX_ORDER_NR_PAGES >> cma->order_per_bit);
        }

        trace_cma_alloc_finish(cma->name, pfn, page, count, align);
diff --git a/mm/cma.h b/mm/cma.h
index 88a0595670b7..35b4d33d12e4 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -31,6 +31,8 @@ struct cma {
        struct cma_kobject *cma_kobj;
 #endif
        bool reserve_pages_on_error;
+       struct completion completion;
+       atomic_t pending;
 };

 extern struct cma cma_areas[MAX_CMA_AREAS];
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index ff0ea6308299..a2a46eebb0bc 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -30,7 +30,7 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
         */
        if (is_migrate_isolate_page(page)) {
                spin_unlock_irqrestore(&zone->lock, flags);
-               return -EBUSY;
+               return -EAGAIN;
        }

        /*
@@ -179,13 +179,15 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * might be used to flush and disable pcplist before isolation and enable after
  * unisolation.
  *
- * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
+ * Return: 0 on success and -EAGAIN or -EBUSY if any part of range cannot be
+ * isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
                             unsigned migratetype, int flags)
 {
        unsigned long pfn;
        struct page *page;
+       int ret;

        BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
        BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
@@ -194,9 +196,13 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
             pfn < end_pfn;
             pfn += pageblock_nr_pages) {
                page = __first_valid_page(pfn, pageblock_nr_pages);
-               if (page && set_migratetype_isolate(page, migratetype, flags)) {
+               if (!page)
+                       continue;
+
+               ret = set_migratetype_isolate(page, migratetype, flags);
+               if (ret) {
                        undo_isolate_page_range(start_pfn, pfn, migratetype);
-                       return -EBUSY;
+                       return ret;
                }
        }
        return 0;
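
Regarding 3) above: one way I can think of to at least keep
cma->pending from going negative is to decrement it only while it is
known to be positive, e.g. with atomic_dec_if_positive() on both the
waker side and the timed-out sleeper side. Untested sketch on top of
the demo; note it only clamps the counter, the lost-wakeup window from
2) still needs the timeout:

		/* Waker side: consume one pending sleeper, if any. */
		if (ret == 0) {
			page = pfn_to_page(pfn);
			if (atomic_dec_if_positive(&cma->pending) >= 0)
				complete(&cma->completion);
			break;
		}

		/*
		 * Sleeper side, on timeout: take back our own slot only
		 * if a waker has not already consumed it on our behalf.
		 */
		if (!wait_for_completion_timeout(&cma->completion, HZ / 2))
			atomic_dec_if_positive(&cma->pending);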

Regards
Aisheng


* Re: [PATCH v3 1/2] mm: cma: fix allocation may fail sometimes
  2022-05-04 15:52             ` Dong Aisheng
@ 2022-05-04 23:25               ` Minchan Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2022-05-04 23:25 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: David Hildenbrand, Dong Aisheng, linux-mm, linux-kernel,
	linux-arm-kernel, shawnguo, linux-imx, akpm, m.szyprowski,
	lecopzer.chen, vbabka, stable, shijie.qin

On Wed, May 04, 2022 at 11:52:30PM +0800, Dong Aisheng wrote:
> Hi Minchan & David,
> 
> On Sat, Mar 19, 2022 at 12:20 AM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Fri, Mar 18, 2022 at 11:43:41AM +0800, Dong Aisheng wrote:
> > > On Fri, Mar 18, 2022 at 1:12 AM Minchan Kim <minchan@kernel.org> wrote:
> > > >
> > > > On Thu, Mar 17, 2022 at 10:26:42PM +0800, Dong Aisheng wrote:
> > > > > On Thu, Mar 17, 2022 at 6:55 PM David Hildenbrand <david@redhat.com> wrote:
> > > > > >
> > > > > > On 15.03.22 15:45, Dong Aisheng wrote:
> > > > > > > When there're multiple process allocing dma memory in parallel
> > > > > >
> > > > > > s/allocing/allocating/
> > > > > >
> > > > > > > by calling dma_alloc_coherent(), it may fail sometimes as follows:
> > > > > > >
> > > > > > > Error log:
> > > > > > > cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
> > > > > > > cma: number of available pages:
> > > > > > > 3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
> > > > > > > 108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
> > > > > > > 7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
> > > > > > >
> > > > > > > When issue happened, we saw there were still 33161 pages (129M) free CMA
> > > > > > > memory and a lot available free slots for 148 pages in CMA bitmap that we
> > > > > > > want to allocate.
> > > >
> > > > Yes, I have also met the problem, especially when multiple threads
> > > > compete for CMA allocation. Thanks for bringing up the issue.
> > > >
> > > > > > >
> > > > > > > If dumping memory info, we found that there was also ~342M normal memory,
> > > > > > > but only 1352K CMA memory left in buddy system while a lot of pageblocks
> > > > > > > were isolated.
> > > > > >
> > > > > > s/If/When/
> > > > > >
> > > > >
> > > > > Will fix them all, thanks.
> > > > >
> > > > > > >
> > > > > > > Memory info log:
> > > > > > > Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
> > > > > > >           active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
> > > > > > >           unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
> > > > > > >           bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
> > > > > > > Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
> > > > > > >       36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
> > > > > > >       8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
> > > > > > >
> > > > > > > The root cause of this issue is that since commit a4efc174b382
> > > > > > > ("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
> > > > > > > memory allocation. It's possible that the memory range process A trying
> > > > > > > to alloc has already been isolated by the allocation of process B during
> > > > > > > memory migration.
> > > > > > >
> > > > > > > The problem here is that the memory range isolated during one allocation
> > > > > > > by start_isolate_page_range() could be much bigger than the real size we
> > > > > > > want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
> > > > > > >
> > > > > > > Taking an ARMv7 platform with 1G memory as an example, when MAX_ORDER_NR_PAGES
> > > > > > > is big (e.g. 32M with max_order 14) and CMA memory is relatively small
> > > > > > > (e.g. 128M), there're only 4 MAX_ORDER slot, then it's very easy that
> > > > > > > all CMA memory may have already been isolated by other processes when
> > > > > > > one trying to allocate memory using dma_alloc_coherent().
> > > > > > > Since current CMA code will only scan one time of whole available CMA
> > > > > > > memory, then dma_alloc_coherent() may easy fail due to contention with
> > > > > > > other processes.
> > > > > > >
> > > > > > > This patch introduces a retry mechanism to rescan CMA bitmap for -EBUSY
> > > > > > > error in case the target memory range may has been temporarily isolated
> > > > > > > by others and released later.
> > > > > >
> > > > > > But your patch doesn't check for -EBUSY and instead might retry
> > > > > > forever, on any allocation error, no?
> > > > > >
> > > > >
> > > > > My patch seems not to need to check it because there's no chance to
> > > > > retry the loop in case a non-EBUSY error happened earlier.
> > > > >
> > > > > for (;;) {
> > > > >         if (bitmap_no >= bitmap_maxno) {
> > > > >                 retry_the_whole_loop;
> > > > >         }
> > > > >
> > > > >         pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> > > > >         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> > > > >                              GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > > > >
> > > > >         if (ret != -EBUSY)
> > > > >                 break;
> > > > > }
> > > > >
> > > > > > I'd really suggest letting alloc_contig_range() return -EAGAIN in case
> > > > > > the isolation failed and handling -EAGAIN only in a special way instead.
> > > > > >
> > > > >
> > > > > Yes, I guess that's another improvement and is applicable.
> > > > >
> > > > > > In addition, we might want to stop once we loop too often, I assume.
> > > > > >
> > > > >
> > > > > I wonder, if we really retry unreasonably often, whether we should
> > > > > figure out what's going on inside alloc_contig_range() and fix it
> > > > > rather than return an EBUSY error to users while there's still a
> > > > > lot of available memory.
> > > > > So currently I didn't add a maximum retry limit outside.
> > > > >
> > > > > Additionally, for a small CMA system (128M with 32M max_order
> > > > > pages), the retry would be frequent when multiple processes are
> > > > > allocating memory; it also depends on the system's running state,
> > > > > so it's hard to define a reasonable and stable maximum retry count.
> > > >
> > > > IMO, when CMA sees -EAGAIN, it should put the task into
> > > > cma->wait_queue and then be woken up by another thread which finishes
> > > > its work on the cma. So it's similar to cma_mutex, but we don't need
> > > > to synchronize for the !EAGAIN cases, and it makes the cma allocation fair.
> > >
> > > Okay, that's another approach, completely different from the existing
> > > one. Instead of blocking on the CMA memory range which we want to
> > > allocate, the existing code will try the next available memory ranges.
> > > The question is whether we need to change this behavior.
> >
> > It could wait only if it has scanned the whole range but every block was
> > taken, and then make one more trial after being woken up. If cma_alloc
> > returns -EAGAIN in the end, the user could decide the policy.
> >
> > > It looks to me both ways have pros and cons.
> > >
> > > And for sleeping on the -EAGAIN case, do we need an accurate wakeup?
> > > IOW, only wake the sleeper when the exact memory range is released,
> > > which means we would need more complicated code logic to track the
> > > usage of the different CMA memory ranges.
> > > Otherwise, there will be possible false-positive wakeups and the
> > > requester may quickly go back to sleep.
> >
> > I didn't even consider such a complicated model since we would have
> > the race anyway. (Never tested, but it shows the intention.)
> >
> > diff --git a/mm/cma.c b/mm/cma.c
> > index bc9ca8f3c487..cccf684da587 100644
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> > @@ -449,6 +449,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >         offset = cma_bitmap_aligned_offset(cma, align);
> >         bitmap_maxno = cma_bitmap_maxno(cma);
> >         bitmap_count = cma_bitmap_pages_to_bits(cma, count);
> > +       bool retried = false;
> >
> >         if (bitmap_count > bitmap_maxno)
> >                 goto out;
> > @@ -460,6 +461,12 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >                                 offset);
> >                 if (bitmap_no >= bitmap_maxno) {
> >                         spin_unlock_irq(&cma->lock);
> > +                       if (ret == -EAGAIN && !retried) {
> > +                               wait_event_killable(cma->wq);
> 
> I spent a few days reading the code carefully and implementing your
> suggestions. But the remaining problem is how to properly wake up the
> multiple callers sleeping on the isolation contention error. What I
> observed is a synchronization problem between the sleeper and the
> waker in the current code logic:
> 1) we only need to wake up a caller when we detect users sleeping on
> the isolation contention, so a flag has to be introduced and checked
> dynamically.
> 2) the wakeup code may finish before the flag is set, which means the
> last sleeper may miss its wakeup, so a timeout mechanism has to be
> introduced.
> 3) the code can't prevent wait_for_completion() and complete() from
> running in parallel without extra complexity for proper
> synchronization (during my tests, cma->pending could drop below 0
> sometimes).
> 
> I wonder if it's worth introducing such complexity just to support
> sleeping on isolation contention. Maybe the simple way is, as Andrew
> said, to revert that broken patch first.
> 
> Do you have any suggestions?

Thanks for the effort. On second thought, I am also worried about the
unfairness of my suggestion, since a newcomer could steal the pageblock
while a former requester was waiting on the queue to be woken up. If
it's more complicated than expected, I don't think we should block your
workload, so I agree with reverting the original patch to resolve the
regression first, as Andrew suggested, and then we can think over
better ideas.
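
Just to spell out why fairness is hard here: any fair scheme ends up
making newcomers queue behind the existing waiters as well, e.g. a
ticket scheme like the untested sketch below, which effectively
re-serializes cma_alloc() and brings back the cost that removing
cma_mutex was meant to avoid:

#include <linux/atomic.h>
#include <linux/wait.h>

static atomic_t cma_ticket_head = ATOMIC_INIT(0);
static atomic_t cma_ticket_tail = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(cma_ticket_wq);

/*
 * Every allocator, sleeper or newcomer, takes a ticket and waits for
 * its turn, so nobody can overtake a waiter. wake_up_all() is a
 * thundering herd; only the task holding the matching ticket proceeds.
 */
static void cma_fair_lock(void)
{
	int ticket = atomic_fetch_inc(&cma_ticket_tail);

	wait_event(cma_ticket_wq, atomic_read(&cma_ticket_head) == ticket);
}

static void cma_fair_unlock(void)
{
	atomic_inc(&cma_ticket_head);
	wake_up_all(&cma_ticket_wq);
}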

Thanks.


end of thread, other threads:[~2022-05-04 23:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-15 14:45 [PATCH v3 0/2] mm: fix cma allocation fail sometimes Dong Aisheng
2022-03-15 14:45 ` [PATCH v3 1/2] mm: cma: fix allocation may " Dong Aisheng
2022-03-15 22:58   ` Andrew Morton
2022-03-16  3:41     ` Dong Aisheng
2022-03-16 21:09       ` Andrew Morton
2022-03-17  3:49         ` Dong Aisheng
2022-03-17 10:55   ` David Hildenbrand
2022-03-17 14:26     ` Dong Aisheng
2022-03-17 17:12       ` Minchan Kim
2022-03-18  3:43         ` Dong Aisheng
2022-03-18 16:20           ` Minchan Kim
2022-05-04 15:52             ` Dong Aisheng
2022-05-04 23:25               ` Minchan Kim
2022-03-15 14:45 ` [PATCH v3 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry Dong Aisheng
