All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@suse.de>,
	Jiri Kosina <jkosina@suse.cz>, Yinghai Lu <yhlu.kernel@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: [PATCH percpu#for-linus] percpu: fix possible deadlock via irq lock inversion
Date: Mon, 09 Nov 2009 14:46:52 +0900	[thread overview]
Message-ID: <4AF7ACCC.2050208@kernel.org> (raw)
In-Reply-To: <1257610412.4108.3.camel@laptop>

Lockdep caught the following irq lock inversion which can lead to
deadlock.

 =========================================================
 [ INFO: possible irq lock inversion dependency detected ]
 2.6.32-rc5-tip-04815-g12f0f93-dirty #745
 ---------------------------------------------------------
 hub 1-3:1.0: state 7 ports 2 chg 0000 evt 0004
 ksoftirqd/65/199 just changed the state of lock:
  (pcpu_lock){..-...}, at: [<ffffffff81130e04>] free_percpu+0x38/0x104
 but this lock took another, SOFTIRQ-unsafe lock in the past:
  (vmap_area_lock){+.+...}

 and interrupts could create inverse lock ordering between them.

This happens because pcpu_lock is allowed to be acquired from irq
context for free_percpu() path and alloc_percpu() path may call
vfree() with pcpu_lock held.  As vmap_area_lock isn't irq safe, if an
IRQ occurs while vmap_area_lock is held and the irq handler calls
free_percpu(), locking order inversion occurs.

As the nesting only occurs in the alloc path which isn't allowed to be
called from irq context, A->B->A deadlock won't occur but A->B B->A
deadlock is still possible.

The only place where vmap_area_lock is nested under pcpu_lock is while
extending area_map to free old map.  Break the locking order by
temporarily releasing pcpu_lock when freeing old map.  This is safe to
do as allocation path is protected with outer pcpu_alloc_mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
---
If nobody objects, I'll push it to Linus tomorrow.  Thanks.

 mm/percpu.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index d907971..30cd343 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -372,7 +372,7 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
 static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 {
 	int new_alloc;
-	int *new;
+	int *new, *old = NULL;
 	size_t size;

 	/* has enough? */
@@ -407,10 +407,23 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, unsigned long *flags)
 	 * one of the first chunks and still using static map.
 	 */
 	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
-		pcpu_mem_free(chunk->map, size);
+		old = chunk->map;

 	chunk->map_alloc = new_alloc;
 	chunk->map = new;
+
+	/*
+	 * pcpu_mem_free() might end up calling vfree() which uses
+	 * IRQ-unsafe lock and thus can't be called with pcpu_lock
+	 * held.  Release and reacquire pcpu_lock if old map needs to
+	 * be freed.
+	 */
+	if (old) {
+		spin_unlock_irqrestore(&pcpu_lock, *flags);
+		pcpu_mem_free(old, size);
+		spin_lock_irqsave(&pcpu_lock, *flags);
+	}
+
 	return 0;
 }


  reply	other threads:[~2009-11-09  5:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <86802c440911041008q4969b9bdk15b4598c40bb84bd@mail.gmail.com>
     [not found] ` <4AF25FC7.4000502@kernel.org>
     [not found]   ` <20091105082102.GA2870@elte.hu>
     [not found]     ` <4AF28D7A.6020209@kernel.org>
2009-11-05 14:31       ` irq lock inversion Jiri Kosina
2009-11-06  5:53         ` Tejun Heo
2009-11-06  7:17           ` Ingo Molnar
2009-11-06  7:45             ` Tejun Heo
2009-11-06  7:58               ` Ingo Molnar
2009-11-06  8:24                 ` Tejun Heo
2009-11-06  8:40                   ` Ingo Molnar
2009-11-06  8:52                     ` Tejun Heo
2009-11-06 16:08                       ` Christoph Lameter
2009-11-06 16:38                         ` Tejun Heo
2009-11-06 17:03                           ` Christoph Lameter
2009-11-07 16:13                             ` Peter Zijlstra
2009-11-09  5:46                               ` Tejun Heo [this message]
2009-11-06  9:59             ` Jens Axboe
2009-11-08  9:38               ` Ingo Molnar
2009-11-09 15:34                 ` Jens Axboe
2009-11-09 15:45                   ` Ingo Molnar
2009-11-09 15:49                     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AF7ACCC.2050208@kernel.org \
    --to=tj@kernel.org \
    --cc=cl@linux-foundation.org \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=yhlu.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.