From: Dong Aisheng <aisheng.dong@nxp.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, dongas86@gmail.com,
	jason.hui.liu@nxp.com, leoyang.li@nxp.com, abel.vesa@nxp.com,
	shawnguo@kernel.org, linux-imx@nxp.com,
	akpm@linux-foundation.org, m.szyprowski@samsung.com,
	lecopzer.chen@mediatek.com, david@redhat.com, vbabka@suse.cz,
	stable@vger.kernel.org, shijie.qin@nxp.com,
	Dong Aisheng <aisheng.dong@nxp.com>
Subject: [PATCH v2 0/2] mm: fix cma allocation fail sometimes
Date: Wed, 12 Jan 2022 21:15:50 +0800
Message-ID: <20220112131552.3329380-1-aisheng.dong@nxp.com>

We observed an issue with the NXP 5.15 LTS kernel where dma_alloc_coherent()
may occasionally fail when multiple processes are trying to allocate
CMA memory.

This issue can be reproduced very easily on the MX6Q SDB board with the latest
linux-next kernel by writing a test module that creates 16 or 32 threads
allocating random sizes of CMA memory in parallel in the background.
Or, simply by enabling CONFIG_CMA_DEBUG, you can see endless CMA alloc
retries during boot:
[    1.452124] cma: cma_alloc(): memory range at (ptrval) is busy,retrying
....
(thousands of retries)
NOTE: MX6 has CONFIG_FORCE_MAX_ZONEORDER=14 which means MAX_ORDER is
13 (32M).

The root cause of this issue is that since commit a4efc174b382
("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
memory allocation.
It's possible that the pageblock process A tries to allocate has already
been isolated by the allocation of process B during memory migration.

When multiple processes are allocating CMA memory in parallel, it's
likely that the remaining pageblocks have also been isolated, so the
CMA allocation finally fails during the first round of scanning of the
whole available CMA bitmap.
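
As an illustration only, the conflict described above can be modelled in
userspace C. This is a minimal sketch, not kernel code: isolate_range(),
unisolate_range() and NR_PAGEBLOCKS are hypothetical names invented for this
example. It only shows why a second allocator sees -EBUSY (printed as
"ret: -16" in the failure log) while another allocator holds a pageblock
isolated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_PAGEBLOCKS 10	/* hypothetical size of the modelled CMA area */

/* true while some in-flight allocation has this pageblock isolated */
static bool isolated[NR_PAGEBLOCKS];

/*
 * Model of isolating a range of pageblocks before migration.
 * Fails with -EBUSY if any block in the range is already isolated
 * by someone else; in that case nothing is changed.
 */
static int isolate_range(int first, int count)
{
	assert(first >= 0 && count > 0 && first + count <= NR_PAGEBLOCKS);

	for (int i = first; i < first + count; i++)
		if (isolated[i])
			return -EBUSY;

	for (int i = first; i < first + count; i++)
		isolated[i] = true;
	return 0;
}

/* Model of dropping the isolation once the allocation has finished. */
static void unisolate_range(int first, int count)
{
	assert(first >= 0 && count > 0 && first + count <= NR_PAGEBLOCKS);

	for (int i = first; i < first + count; i++)
		isolated[i] = false;
}
```

In this model, if "process B" calls isolate_range(0, 2) and "process A" then
calls isolate_range(1, 1), A gets -EBUSY until B calls unisolate_range(0, 2).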

This patchset introduces a retry mechanism that rescans the CMA bitmap on
an -EBUSY error, in case the target pageblock was only temporarily isolated
by others and is released later.
It also improves CMA allocation performance by trying the next
pageblock during retries rather than looping in the same pageblock
while it is in the -EBUSY state.

Theoretically, this issue can be easily reproduced on ARMv7 platforms
with a big MAX_ORDER/pageblock,
e.g. an ARM platform with 1G RAM (320M reserved CMA) and a 32M pageblock:
Page block order: 13
Pages per block:  8192
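
For reference, the arithmetic behind these numbers can be checked with a few
lines of C. The helper names below are hypothetical and exist only for this
example. The 4 KiB page size (PAGE_SHIFT of 12) is an assumption, although it
is the value implied by 8192 pages per 32M pageblock.

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB pages */

/* Number of pages in a block of the given order: 2^order. */
static unsigned long pages_per_block(unsigned int order)
{
	assert(order < 20);	/* keep the shifts below within 32-bit range */
	return 1UL << order;
}

/* Size of one block in bytes. */
static unsigned long block_bytes(unsigned int order)
{
	return pages_per_block(order) << PAGE_SHIFT;
}

/* How many whole blocks of the given order fit in an area. */
static unsigned long blocks_in_area(unsigned long area_bytes, unsigned int order)
{
	return area_bytes / block_bytes(order);
}
```

For order 13 this gives 8192 pages and 32M per pageblock, so a 320M CMA area
holds only 10 pageblocks. With so few pageblocks, 16 or 32 parallel allocators
can plausibly leave every candidate pageblock isolated at the same moment.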

The following test is based on linux-next: next-20211213.

Without the fix, it fails easily.
# insmod cma_alloc.ko pnum=16
[  274.322369] CMA alloc test enter: thread number: 16
[  274.329948] cpu: 0, pid: 692, index 4 pages 144
[  274.330143] cpu: 1, pid: 694, index 2 pages 44
[  274.330359] cpu: 2, pid: 695, index 7 pages 757
[  274.330760] cpu: 2, pid: 696, index 4 pages 144
[  274.330974] cpu: 2, pid: 697, index 6 pages 512
[  274.331223] cpu: 2, pid: 698, index 6 pages 512
[  274.331499] cpu: 2, pid: 699, index 2 pages 44
[  274.332228] cpu: 2, pid: 700, index 0 pages 7
[  274.337421] cpu: 0, pid: 701, index 1 pages 38
[  274.337618] cpu: 2, pid: 702, index 0 pages 7
[  274.344669] cpu: 1, pid: 703, index 0 pages 7
[  274.344807] cpu: 3, pid: 704, index 6 pages 512
[  274.348269] cpu: 2, pid: 705, index 5 pages 148
[  274.349490] cma: cma_alloc: reserved: alloc failed, req-size: 38 pages, ret: -16
[  274.366292] cpu: 1, pid: 706, index 4 pages 144
[  274.366562] cpu: 0, pid: 707, index 3 pages 128
[  274.367356] cma: cma_alloc: reserved: alloc failed, req-size: 128 pages, ret: -16
[  274.367370] cpu: 0, pid: 707, index 3 pages 128 failed
[  274.371148] cma: cma_alloc: reserved: alloc failed, req-size: 148 pages, ret: -16
[  274.375348] cma: cma_alloc: reserved: alloc failed, req-size: 144 pages, ret: -16
[  274.384256] cpu: 2, pid: 708, index 0 pages 7
....

With the fix, 32 threads allocating in parallel can pass an overnight
stress test.

root@imx6qpdlsolox:~# insmod cma_alloc.ko pnum=32
[  112.976809] cma_alloc: loading out-of-tree module taints kernel.
[  112.984128] CMA alloc test enter: thread number: 32
[  112.989748] cpu: 2, pid: 707, index 6 pages 512
[  112.994342] cpu: 1, pid: 708, index 6 pages 512
[  112.995162] cpu: 0, pid: 709, index 3 pages 128
[  112.995867] cpu: 2, pid: 710, index 0 pages 7
[  112.995910] cpu: 3, pid: 711, index 2 pages 44
[  112.996005] cpu: 3, pid: 712, index 7 pages 757
[  112.996098] cpu: 3, pid: 713, index 7 pages 757
...
[41877.368163] cpu: 1, pid: 737, index 2 pages 44
[41877.369388] cpu: 1, pid: 736, index 3 pages 128
[41878.486516] cpu: 0, pid: 737, index 2 pages 44
[41878.486515] cpu: 2, pid: 739, index 4 pages 144
[41878.486622] cpu: 1, pid: 736, index 3 pages 128
[41878.486948] cpu: 2, pid: 735, index 7 pages 757
[41878.487279] cpu: 2, pid: 738, index 4 pages 144
[41879.526603] cpu: 1, pid: 739, index 3 pages 128
[41879.606491] cpu: 2, pid: 737, index 3 pages 128
[41879.606550] cpu: 0, pid: 736, index 0 pages 7
[41879.612271] cpu: 2, pid: 738, index 4 pages 144
...

v1:
https://patchwork.kernel.org/project/linux-mm/cover/20211215080242.3034856-1-aisheng.dong@nxp.com/

Dong Aisheng (2):
  mm: cma: fix allocation may fail sometimes
  mm: cma: try next MAX_ORDER_NR_PAGES during retry

 mm/cma.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

-- 
2.25.1



