Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Qian Cai <cai@lca.pw>
To: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de, tglx@linutronix.de, thgarnie@google.com,
	peterz@infradead.org, tytso@mit.edu, cl@linux.com,
	penberg@kernel.org, rientjes@google.com, mingo@redhat.com,
	will@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, keescook@chromium.org,
	Qian Cai <cai@lca.pw>
Subject: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()
Date: Fri, 13 Sep 2019 12:27:44 -0400
Message-ID: <1568392064-3052-1-git-send-email-cai@lca.pw> (raw)

The commit b7d5dc21072c ("random: add a spinlock_t to struct
batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
get_random_u32() which introduced the lock chain,

"&rq->lock --> batched_entropy_u32.lock"

even after crng init. As the result, it could result in deadlock below.
Fix it by using get_random_bytes() in shuffle_freelist() which does not
need to take on the batched_entropy locks.

WARNING: possible circular locking dependency detected
5.3.0-rc7-mm1+ #3 Tainted: G             L
------------------------------------------------------
make/7937 is trying to acquire lock:
ffff900012f225f8 (random_write_wait.lock){....}, at:
__wake_up_common_lock+0xa8/0x11c

but task is already holding lock:
ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
get_random_u32+0x6c/0x1dc

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (batched_entropy_u32.lock){-.-.}:
       lock_acquire+0x31c/0x360
       _raw_spin_lock_irqsave+0x7c/0x9c
       get_random_u32+0x6c/0x1dc
       new_slab+0x234/0x6c0
       ___slab_alloc+0x3c8/0x650
       kmem_cache_alloc+0x4b0/0x590
       __debug_object_init+0x778/0x8b4
       debug_object_init+0x40/0x50
       debug_init+0x30/0x29c
       hrtimer_init+0x30/0x50
       init_dl_task_timer+0x24/0x44
       __sched_fork+0xc0/0x168
       init_idle+0x78/0x26c
       fork_idle+0x12c/0x178
       idle_threads_init+0x108/0x178
       smp_init+0x20/0x1bc
       kernel_init_freeable+0x198/0x26c
       kernel_init+0x18/0x334
       ret_from_fork+0x10/0x18

-> #2 (&rq->lock){-.-.}:
       lock_acquire+0x31c/0x360
       _raw_spin_lock+0x64/0x80
       task_fork_fair+0x5c/0x1b0
       sched_fork+0x15c/0x2dc
       copy_process+0x9e0/0x244c
       _do_fork+0xb8/0x644
       kernel_thread+0xc4/0xf4
       rest_init+0x30/0x238
       arch_call_rest_init+0x10/0x18
       start_kernel+0x424/0x52c

-> #1 (&p->pi_lock){-.-.}:
       lock_acquire+0x31c/0x360
       _raw_spin_lock_irqsave+0x7c/0x9c
       try_to_wake_up+0x74/0x8d0
       default_wake_function+0x38/0x48
       pollwake+0x118/0x158
       __wake_up_common+0x130/0x1c4
       __wake_up_common_lock+0xc8/0x11c
       __wake_up+0x3c/0x4c
       account+0x390/0x3e0
       extract_entropy+0x2cc/0x37c
       _xfer_secondary_pool+0x35c/0x3c4
       push_to_pool+0x54/0x308
       process_one_work+0x4f4/0x950
       worker_thread+0x390/0x4bc
       kthread+0x1cc/0x1e8
       ret_from_fork+0x10/0x18

-> #0 (random_write_wait.lock){....}:
       validate_chain+0xd10/0x2bcc
       __lock_acquire+0x7f4/0xb8c
       lock_acquire+0x31c/0x360
       _raw_spin_lock_irqsave+0x7c/0x9c
       __wake_up_common_lock+0xa8/0x11c
       __wake_up+0x3c/0x4c
       account+0x390/0x3e0
       extract_entropy+0x2cc/0x37c
       crng_reseed+0x60/0x2f8
       _extract_crng+0xd8/0x164
       crng_reseed+0x7c/0x2f8
       _extract_crng+0xd8/0x164
       get_random_u32+0xec/0x1dc
       new_slab+0x234/0x6c0
       ___slab_alloc+0x3c8/0x650
       kmem_cache_alloc+0x4b0/0x590
       getname_flags+0x44/0x1c8
       user_path_at_empty+0x3c/0x68
       vfs_statx+0xa4/0x134
       __arm64_sys_newfstatat+0x94/0x120
       el0_svc_handler+0x170/0x240
       el0_svc+0x8/0xc

other info that might help us debug this:

Chain exists of:
  random_write_wait.lock --> &rq->lock --> batched_entropy_u32.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(batched_entropy_u32.lock);
                               lock(&rq->lock);
                               lock(batched_entropy_u32.lock);
  lock(random_write_wait.lock);

 *** DEADLOCK ***

1 lock held by make/7937:
 #0: ffff0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
get_random_u32+0x6c/0x1dc

stack backtrace:
CPU: 220 PID: 7937 Comm: make Tainted: G             L    5.3.0-rc7-mm1+
Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS
L50_5.13_1.11 06/18/2019
Call trace:
 dump_backtrace+0x0/0x248
 show_stack+0x20/0x2c
 dump_stack+0xd0/0x140
 print_circular_bug+0x368/0x380
 check_noncircular+0x248/0x250
 validate_chain+0xd10/0x2bcc
 __lock_acquire+0x7f4/0xb8c
 lock_acquire+0x31c/0x360
 _raw_spin_lock_irqsave+0x7c/0x9c
 __wake_up_common_lock+0xa8/0x11c
 __wake_up+0x3c/0x4c
 account+0x390/0x3e0
 extract_entropy+0x2cc/0x37c
 crng_reseed+0x60/0x2f8
 _extract_crng+0xd8/0x164
 crng_reseed+0x7c/0x2f8
 _extract_crng+0xd8/0x164
 get_random_u32+0xec/0x1dc
 new_slab+0x234/0x6c0
 ___slab_alloc+0x3c8/0x650
 kmem_cache_alloc+0x4b0/0x590
 getname_flags+0x44/0x1c8
 user_path_at_empty+0x3c/0x68
 vfs_statx+0xa4/0x134
 __arm64_sys_newfstatat+0x94/0x120
 el0_svc_handler+0x170/0x240
 el0_svc+0x8/0xc

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/slub.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..96cdd36f9380 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1598,8 +1598,15 @@ static bool shuffle_freelist(struct kmem_cache *s, struct page *page)
 	if (page->objects < 2 || !s->random_seq)
 		return false;
 
+	/*
+	 * Don't use get_random_int() here as it might deadlock due to
+	 * "&rq->lock --> batched_entropy_u32.lock" chain.
+	 */
+	if (!arch_get_random_int((int *)&pos))
+		get_random_bytes(&pos, sizeof(int));
+
 	freelist_count = oo_objects(s->oo);
-	pos = get_random_int() % freelist_count;
+	pos %= freelist_count;
 
 	page_limit = page->objects * s->size;
 	start = fixup_red_left(s, page_address(page));
-- 
1.8.3.1



             reply index

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13 16:27 Qian Cai [this message]
2019-09-16  9:03 ` Sebastian Andrzej Siewior
2019-09-16 14:01   ` Qian Cai
2019-09-16 19:51     ` Sebastian Andrzej Siewior
2019-09-16 21:31       ` Qian Cai
2019-09-17  7:16         ` Sebastian Andrzej Siewior
2019-09-18 19:59           ` Qian Cai

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1568392064-3052-1-git-send-email-cai@lca.pw \
    --to=cai@lca.pw \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=cl@linux.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=penberg@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=tglx@linutronix.de \
    --cc=thgarnie@google.com \
    --cc=tytso@mit.edu \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org linux-mm@archiver.kernel.org
	public-inbox-index linux-mm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox