From: Jens Axboe <jaxboe@fusionio.com>
To: Paul Bolle <pebolle@tiscali.nl>
Cc: "paulmck@linux.vnet.ibm.com" <paulmck@linux.vnet.ibm.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: Mysterious CFQ crash and RCU
Date: Sun, 5 Jun 2011 08:56:33 +0200
Message-ID: <4DEB28A1.5090109@fusionio.com>
In-Reply-To: <1307227686.28359.23.camel@t41.thuisdomein>

On 2011-06-05 00:48, Paul Bolle wrote:
> I think I finally found it!
> 
> The culprit seems to be io_context.ioc_data (not the most clear of
> names!). It seems to be a single entry "last-hit cache" of an hlist
> called cic_list. (There are three, subtly different, cic_lists in the
> CFQ code!) It is not entirely clear, but that last-hit cache can get out
> of sync with the hlist it is supposed to cache. My guess is that every
> now and then a member of the hlist gets deleted while it's still in that
> (single entry) cache. If it then gets retrieved from that cache it
> already points to poisoned memory. For some strange reason this only
> results in an Oops if one or more debugging options are set (as are set
> in the Fedora Rawhide non-stable kernels that I ran into this). I have
> no clue whatsoever, why that is ...
> 
> Anyhow, after ripping out ioc_data this bug seems to have disappeared!
> Jens, Vivek, could you please have a look at this? In the meantime I
> hope to pinpoint this issue and draft a small patch to really solve it
> (ie, not by simply ripping out ioc_data).

Does this fix it? It will introduce a lock hierarchy of queue lock ->
ioc lock, but as far as I can remember (and tell from a quick look), we
don't currently have any dependencies on that lock ordering. So it
should be OK.

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3c7b537..fa7ef54 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2772,8 +2772,11 @@ static void __cfq_exit_single_io_context(struct cfq_data *cfqd,
 	smp_wmb();
 	cic->key = cfqd_dead_key(cfqd);
 
-	if (ioc->ioc_data == cic)
+	if (ioc->ioc_data == cic) {
+		spin_lock(&ioc->lock);
 		rcu_assign_pointer(ioc->ioc_data, NULL);
+		spin_unlock(&ioc->lock);
+	}
 
 	if (cic->cfqq[BLK_RW_ASYNC]) {
 		cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_ASYNC]);
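
For readers following along, here is a rough user-space model of the
idea, not kernel code. It assumes a pthread mutex in place of each
spinlock and a plain singly linked list in place of the hlist. The
names ioc_init, ioc_add, ioc_lookup, ioc_remove, last_hit and
queue_lock are made up for illustration. It only shows the two points
discussed above: the single-entry cache is cleared under the ioc lock
before the entry goes away, and the locks nest as queue -> ioc.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct cic {				/* stand-in for a cfq_io_context */
	int key;
	struct cic *next;
};

struct ioc {				/* stand-in for an io_context */
	pthread_mutex_t lock;		/* models ioc->lock */
	struct cic *list;		/* models cic_list */
	struct cic *last_hit;		/* models ioc_data (one-entry cache) */
};

/* Models the queue lock; always taken before ioc->lock in this sketch. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void ioc_init(struct ioc *ioc)
{
	pthread_mutex_init(&ioc->lock, NULL);
	ioc->list = NULL;
	ioc->last_hit = NULL;
}

static struct cic *ioc_add(struct ioc *ioc, int key)
{
	struct cic *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->key = key;
	pthread_mutex_lock(&ioc->lock);
	c->next = ioc->list;
	ioc->list = c;
	pthread_mutex_unlock(&ioc->lock);
	return c;
}

/* Try the one-entry cache first, then fall back to walking the list. */
static struct cic *ioc_lookup(struct ioc *ioc, int key)
{
	struct cic *c;

	pthread_mutex_lock(&ioc->lock);
	c = ioc->last_hit;
	if (!c || c->key != key) {
		for (c = ioc->list; c && c->key != key; c = c->next)
			;
		if (c)
			ioc->last_hit = c;	/* remember the hit */
	}
	pthread_mutex_unlock(&ioc->lock);
	return c;
}

/* Unlink an entry; the cache is cleared under the same lock, so a later
 * lookup can never return the freed entry from the cache. */
static void ioc_remove(struct ioc *ioc, struct cic *victim)
{
	struct cic **pp;

	pthread_mutex_lock(&queue_lock);	/* outer lock */
	pthread_mutex_lock(&ioc->lock);		/* inner lock */
	if (ioc->last_hit == victim)
		ioc->last_hit = NULL;
	for (pp = &ioc->list; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&ioc->lock);
	pthread_mutex_unlock(&queue_lock);
	free(victim);	/* the kernel defers this via RCU; the model does not */
}
```

If the two lines that clear last_hit are dropped from ioc_remove(), a
later ioc_lookup() for the same key returns the freed entry straight
from the cache, which is the failure mode Paul describes.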

-- 
Jens Axboe


