From: Sasha Levin
Date: Fri, 4 May 2012 06:08:34 +0200
Subject: Re: rcu: BUG on exit_group
To: paulmck@linux.vnet.ibm.com
Cc: "linux-kernel@vger.kernel.org List", Dave Jones, yinghan@google.com, kosaki.motohiro@jp.fujitsu.com, Andrew Morton
In-Reply-To: <20120503170101.GF2592@linux.vnet.ibm.com>
References: <20120503154140.GA2592@linux.vnet.ibm.com> <20120503170101.GF2592@linux.vnet.ibm.com>

On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney wrote:
> On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
>> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney wrote:
>> > On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
>> >> Hi Paul,
>> >>
>> >> I've hit a BUG similar to the schedule_tail() one when. It happened
>> >> when I've started fuzzing exit_group() syscalls, and all of the traces
>> >> are starting with exit_group() (there's a flood of them).
>> >>
>> >> I've verified that it indeed BUGs due to the rcu preempt count.
>> >
>> > Hello, Sasha,
>> >
>> > Which version of -next are you using?  I did some surgery on this
>> > yesterday based on some bugs Hugh Dickins tracked down, so if you
>> > are using something older, please move to the current -next.
>>
>> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
>
> Hmmm...  Looking at this more closely, it looks like there really is
> an attempt to acquire a mutex within an RCU read-side critical section,
> which is illegal.  Could you please bisect this?

Right, the issue is as you described: taking a mutex inside
rcu_read_lock().

The offending commit is (I've cc'ed all parties from it):

commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
Author: Ying Han
Date:   Thu May 3 15:44:01 2012 +1000

    memcg: add mlock statistic in memory.stat

The issue is that munlock_vma_page() now calls
mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(). When
the older code that was already there then tries to take a mutex, you
get a BUG.

Thanks.
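
To illustrate the pattern described above, here is a minimal userspace sketch in C. It is not kernel code: every model_* name is hypothetical, the nesting counter only stands in for the rcu preempt count mentioned earlier in the thread, and the "mutex" does nothing except check that counter. It only shows why a mutex acquisition that follows a begin-update call trips the check, while the same acquisition outside any read-side section does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of this is the kernel's implementation. */
static int model_rcu_depth;  /* stands in for the RCU read-side nesting count */
static int model_bug_count;  /* how many times the "BUG" check fired */

static void model_rcu_read_lock(void)
{
    model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
    assert(model_rcu_depth > 0);  /* unbalanced unlock would be a model error */
    model_rcu_depth--;
}

/*
 * A mutex may sleep, so taking one inside a read-side critical section
 * is illegal. The model records a "BUG" and refuses the lock.
 */
static bool model_mutex_lock(void)
{
    if (model_rcu_depth > 0) {
        model_bug_count++;
        return false;
    }
    return true;
}

/* Hypothetical stand-ins for mem_cgroup_{begin,end}_update_page_stat(). */
static void model_begin_update_page_stat(void)
{
    model_rcu_read_lock();
}

static void model_end_update_page_stat(void)
{
    model_rcu_read_unlock();
}

/* Shape of the reported problem: the mutex is taken after the begin call. */
static bool model_munlock_buggy(void)
{
    bool ok;

    model_begin_update_page_stat();
    ok = model_mutex_lock();      /* fires: still inside the read-side section */
    model_end_update_page_stat();
    return ok;
}

/* Contrast case: the same mutex taken outside any read-side section. */
static bool model_mutex_outside_section(void)
{
    return model_mutex_lock();
}
```

Calling model_munlock_buggy() returns false and bumps model_bug_count, while model_mutex_outside_section() returns true. The contrast case is only there to show what the check keys on; it is not meant as the fix for the commit above.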