linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paul Bolle <pebolle@tiscali.nl>
Cc: Jens Axboe <jaxboe@fusionio.com>, Vivek Goyal <vgoyal@redhat.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: Mysterious CFQ crash and RCU
Date: Sat, 4 Jun 2011 16:06:33 -0700	[thread overview]
Message-ID: <20110604230633.GD6093@linux.vnet.ibm.com> (raw)
In-Reply-To: <1307227686.28359.23.camel@t41.thuisdomein>

On Sun, Jun 05, 2011 at 12:48:05AM +0200, Paul Bolle wrote:
> On Sat, 2011-06-04 at 09:03 -0700, Paul E. McKenney wrote:
> > More like "based on these diagnostics, I see no evidence of the RCU
> > implementation misbehaving."  Which is of course different than "I can
> > prove that the RCU implementation is not misbehaving".  That said, the
> > fact that you are running on a single CPU makes it hard for me to see
> > any latitude for RCU-implementation misbehavior.
> > 
> > Clearly something is wrong somewhere.
> 
> Yes.
> 
> > Given the fact that on a single-CPU
> > system, synchronize_rcu() is a no-op, and given that you weren't able
> > to reproduce with CONFIG_TREE_PREEMPT_RCU=y, my guess is that there is
> > a synchronize_rcu() that occasionally (illegally) gets executed within
> > an RCU read-side critical section.
> 
> I think I finally found it!

Very cool!!!

> The culprit seems to be io_context.ioc_data (not the most clear of
> names!). It seems to be a single entry "last-hit cache" of an hlist
> called cic_list. (There are three, subtly different, cic_lists in the
> CFQ code!) It is not entirely clear, but that last-hit cache can get out
> of sync with the hlist it is supposed to cache. My guess it that every
> now and then a member of the hlist gets deleted while it's still in that
> (single entry) cache. If it then gets retrieved from that cache it
> already points to poisoned memory. For some strange reason this only
> results in an Oops if one or more debugging options are set (as are set
> in the Fedora Rawhide non-stable kernels that I ran into this). I have
> no clue whatsoever, why that is ...

Oooh...  If I understand you correctly, this is exactly analogous to my
first-ever RCU bug way back in the early 90s.  This was a hashed version
of a BSD-style routing table, where each hash bucket maintained a pointer
to the last routing entry used.  When I deleted a routing entry, I forgot
to check the last-entry pointer.

The amazing thing is that the guy who found and fixed this bug generated
a correct fix despite having absolutely no idea what RCU was or how it
worked.  All he knew was that if he waited for the sun to reach Oregon,
his users were going to be way beyond upset.  ;-)

> Anyhow, after ripping out ioc_data this bug seems to have disappeared!
> Jens, Vivek, could you please have a look at this? In the mean time I
> hope to pinpoint this issue and draft a small patch to really solve it
> (ie, not by simply ripping out ioc_data).

Again, very cool!!!

							Thanx, Paul

  reply	other threads:[~2011-06-04 23:06 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-19 22:24 Mysterious CFQ crash and RCU Vivek Goyal
2011-05-21 21:00 ` Paul E. McKenney
2011-05-21 22:23   ` Paul Bolle
2011-05-21 23:54     ` Paul E. McKenney
2011-05-22 19:30       ` Paul Bolle
2011-05-22 20:13         ` Paul E. McKenney
2011-05-23 15:21   ` Vivek Goyal
2011-05-23 15:38     ` Paul E. McKenney
2011-05-23 22:20       ` Paul Bolle
2011-05-24  4:14         ` Paul E. McKenney
2011-05-24  9:41         ` Jens Axboe
2011-05-24 14:35           ` Paul E. McKenney
2011-05-24 14:51             ` Jens Axboe
2011-05-24 15:42               ` Paul E. McKenney
2011-05-24 15:51                 ` Paul E. McKenney
2011-05-25  8:28           ` Paul Bolle
2011-05-25  8:46             ` Jens Axboe
2011-05-25  9:13               ` Paul Bolle
2011-05-25  9:30                 ` Jens Axboe
2011-05-25  9:40                   ` Paul Bolle
2011-05-25 12:48               ` Paul Bolle
2011-05-25 12:51                 ` Jens Axboe
2011-05-25 17:28               ` Paul Bolle
2011-05-25 18:59                 ` Jens Axboe
2011-05-25 10:17       ` Paul Bolle
2011-05-25 15:33         ` Paul E. McKenney
2011-05-25 17:44           ` Paul Bolle
2011-05-25 20:40             ` Paul E. McKenney
2011-05-26  9:15       ` Paul Bolle
2011-06-03  5:07         ` Paul E. McKenney
2011-06-03 13:45           ` Vivek Goyal
2011-06-03 15:33             ` Paul E. McKenney
2011-06-03 16:54               ` Paul E. McKenney
2011-06-04 12:22             ` Paul Bolle
2011-06-04 12:50           ` Paul Bolle
2011-06-04 16:03             ` Paul E. McKenney
2011-06-04 22:48               ` Paul Bolle
2011-06-04 23:06                 ` Paul E. McKenney [this message]
2011-08-04 15:05                   ` Vivek Goyal
2011-08-04 19:43                     ` Jens Axboe
2011-08-04 19:51                       ` Vivek Goyal
2011-06-05  6:56                 ` Jens Axboe
2011-06-05  8:39                   ` Paul Bolle
2011-06-05 10:38                   ` Paul Bolle
2011-06-05 22:51                     ` Jens Axboe
2011-06-06 14:28                   ` Vivek Goyal
2011-05-23 15:36   ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110604230633.GD6093@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pebolle@tiscali.nl \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).