linux-kernel.vger.kernel.org archive mirror
* rcu: BUG on exit_group
@ 2012-05-03  8:57 Sasha Levin
  2012-05-03 15:41 ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Levin @ 2012-05-03  8:57 UTC (permalink / raw)
  To: Paul McKenney; +Cc: linux-kernel@vger.kernel.org List

Hi Paul,

I've hit a BUG similar to the schedule_tail() one. It happened
when I started fuzzing exit_group() syscalls, and all of the traces
start with exit_group() (there's a flood of them).

I've verified that it indeed BUGs due to the rcu preempt count.
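
(A note on why the splat below shows in_atomic(): 0 even though
sleeping is illegal: with CONFIG_PREEMPT_RCU, rcu_read_lock() bumps a
per-task nesting counter instead of preempt_count, and __might_sleep()
folds that counter back in.  Paraphrased from kernel/sched/core.c of
this kernel:

	static inline int preempt_count_equals(int preempt_offset)
	{
		/* count both preempt_count and preemptible-RCU nesting */
		int nested = (preempt_count() & ~PREEMPT_ACTIVE) +
			     rcu_preempt_depth();

		return (nested == preempt_offset);
	}

so a nonzero rcu_preempt_depth() makes the check fire while
in_atomic(), which looks only at preempt_count, still reads 0.)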

Here's one of the BUG()s:

[   83.820976] BUG: sleeping function called from invalid context at
kernel/mutex.c:269
[   83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
[   83.832154] 1 lock held by trinity/4506:
[   83.834224]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>]
munlock_vma_page+0x197/0x200
[   83.839310] Pid: 4506, comm: trinity Tainted: G        W
3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
[   83.849418] Call Trace:
[   83.851182]  [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
[   83.854076]  [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
[   83.857120]  [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
[   83.860242]  [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
[   83.863423]  [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
[   83.866347]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
[   83.869570]  [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
[   83.872667]  [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
[   83.875646]  [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
[   83.878798]  [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
[   83.882235]  [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160
[   83.884880]  [<ffffffff810ba23b>] ? exit_mm+0x10b/0x130
[   83.887508]  [<ffffffff8111d8ea>] ? __lock_release+0x1ba/0x1d0
[   83.890399]  [<ffffffff810b4fe1>] mmput+0x81/0xe0
[   83.892966]  [<ffffffff810ba24b>] exit_mm+0x11b/0x130
[   83.895640]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
[   83.898943]  [<ffffffff810bca53>] do_exit+0x263/0x460
[   83.901700]  [<ffffffff810bccf1>] do_group_exit+0xa1/0xe0
[   83.907366]  [<ffffffff810bcd42>] sys_exit_group+0x12/0x20
[   83.912450]  [<ffffffff82d993b9>] system_call_fastpath+0x16/0x1b


* Re: rcu: BUG on exit_group
  2012-05-03  8:57 rcu: BUG on exit_group Sasha Levin
@ 2012-05-03 15:41 ` Paul E. McKenney
  2012-05-03 15:55   ` Sasha Levin
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2012-05-03 15:41 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel@vger.kernel.org List

On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
> Hi Paul,
> 
> I've hit a BUG similar to the schedule_tail() one. It happened
> when I started fuzzing exit_group() syscalls, and all of the traces
> start with exit_group() (there's a flood of them).
> 
> I've verified that it indeed BUGs due to the rcu preempt count.

Hello, Sasha,

Which version of -next are you using?  I did some surgery on this
yesterday based on some bugs Hugh Dickins tracked down, so if you
are using something older, please move to the current -next.

						Thanx, Paul

> Here's one of the BUG()s:
> 
> [   83.820976] BUG: sleeping function called from invalid context at
> kernel/mutex.c:269
> [   83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
> [   83.832154] 1 lock held by trinity/4506:
> [   83.834224]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>]
> munlock_vma_page+0x197/0x200
> [   83.839310] Pid: 4506, comm: trinity Tainted: G        W
> 3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
> [   83.849418] Call Trace:
> [   83.851182]  [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
> [   83.854076]  [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
> [   83.857120]  [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
> [   83.860242]  [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
> [   83.863423]  [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
> [   83.866347]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
> [   83.869570]  [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
> [   83.872667]  [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
> [   83.875646]  [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
> [   83.878798]  [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
> [   83.882235]  [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160
> [   83.884880]  [<ffffffff810ba23b>] ? exit_mm+0x10b/0x130
> [   83.887508]  [<ffffffff8111d8ea>] ? __lock_release+0x1ba/0x1d0
> [   83.890399]  [<ffffffff810b4fe1>] mmput+0x81/0xe0
> [   83.892966]  [<ffffffff810ba24b>] exit_mm+0x11b/0x130
> [   83.895640]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
> [   83.898943]  [<ffffffff810bca53>] do_exit+0x263/0x460
> [   83.901700]  [<ffffffff810bccf1>] do_group_exit+0xa1/0xe0
> [   83.907366]  [<ffffffff810bcd42>] sys_exit_group+0x12/0x20
> [   83.912450]  [<ffffffff82d993b9>] system_call_fastpath+0x16/0x1b



* Re: rcu: BUG on exit_group
  2012-05-03 15:41 ` Paul E. McKenney
@ 2012-05-03 15:55   ` Sasha Levin
  2012-05-03 17:01     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Levin @ 2012-05-03 15:55 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel@vger.kernel.org List

On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
>> Hi Paul,
>>
>> I've hit a BUG similar to the schedule_tail() one. It happened
>> when I started fuzzing exit_group() syscalls, and all of the traces
>> start with exit_group() (there's a flood of them).
>>
>> I've verified that it indeed BUGs due to the rcu preempt count.
>
> Hello, Sasha,
>
> Which version of -next are you using?  I did some surgery on this
> yesterday based on some bugs Hugh Dickins tracked down, so if you
> are using something older, please move to the current -next.

I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).


* Re: rcu: BUG on exit_group
  2012-05-03 15:55   ` Sasha Levin
@ 2012-05-03 17:01     ` Paul E. McKenney
  2012-05-04  4:08       ` Sasha Levin
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2012-05-03 17:01 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel@vger.kernel.org List

On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
> >> Hi Paul,
> >>
> >> I've hit a BUG similar to the schedule_tail() one. It happened
> >> when I started fuzzing exit_group() syscalls, and all of the traces
> >> start with exit_group() (there's a flood of them).
> >>
> >> I've verified that it indeed BUGs due to the rcu preempt count.
> >
> > Hello, Sasha,
> >
> > Which version of -next are you using?  I did some surgery on this
> > yesterday based on some bugs Hugh Dickins tracked down, so if you
> > are using something older, please move to the current -next.
> 
> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).

Hmmm...  Looking at this more closely, it looks like there really is
an attempt to acquire a mutex within an RCU read-side critical section,
which is illegal.  Could you please bisect this?

							Thanx, Paul



* Re: rcu: BUG on exit_group
  2012-05-03 17:01     ` Paul E. McKenney
@ 2012-05-04  4:08       ` Sasha Levin
  2012-05-04  5:33         ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Sasha Levin @ 2012-05-04  4:08 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel@vger.kernel.org List, Dave Jones, yinghan,
	kosaki.motohiro, Andrew Morton

On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
>> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
>> >> Hi Paul,
>> >>
>> >> I've hit a BUG similar to the schedule_tail() one. It happened
>> >> when I started fuzzing exit_group() syscalls, and all of the traces
>> >> start with exit_group() (there's a flood of them).
>> >>
>> >> I've verified that it indeed BUGs due to the rcu preempt count.
>> >
>> > Hello, Sasha,
>> >
>> > Which version of -next are you using?  I did some surgery on this
>> > yesterday based on some bugs Hugh Dickins tracked down, so if you
>> > are using something older, please move to the current -next.
>>
>> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
>
> Hmmm...  Looking at this more closely, it looks like there really is
> an attempt to acquire a mutex within an RCU read-side critical section,
> which is illegal.  Could you please bisect this?

Right, the issue is as you described, taking a mutex inside rcu_read_lock().

The offending commit is (I've cc'ed all parties from it):

commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
Author: Ying Han <yinghan@google.com>
Date:   Thu May 3 15:44:01 2012 +1000

    memcg: add mlock statistic in memory.stat

The issue is that munlock_vma_page() now does
a mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(),
so when the pre-existing code then tries to take a
mutex you get a BUG.
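
The shape of the problem, reconstructed from the trace above
(simplified pseudo-C, not the literal -next code):

	munlock_vma_page(page)
	    mem_cgroup_begin_update_page_stat(page, &locked, &flags);
	        /* -> rcu_read_lock(): enters an RCU read-side section */
	    ...
	    try_to_munlock(page)
	        try_to_unmap_file(page, ...)
	            mutex_lock(&mapping->i_mmap_mutex); /* may sleep -> BUG */
	    ...
	    mem_cgroup_end_update_page_stat(page, &locked, &flags);
	        /* -> rcu_read_unlock() */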

Thanks.


* Re: rcu: BUG on exit_group
  2012-05-04  4:08       ` Sasha Levin
@ 2012-05-04  5:33         ` Paul E. McKenney
  2012-05-09  4:59           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2012-05-04  5:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel@vger.kernel.org List, Dave Jones, yinghan,
	kosaki.motohiro, Andrew Morton

On Fri, May 04, 2012 at 06:08:34AM +0200, Sasha Levin wrote:
> On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
> >> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
> >> >> Hi Paul,
> >> >>
> >> >> I've hit a BUG similar to the schedule_tail() one. It happened
> >> >> when I started fuzzing exit_group() syscalls, and all of the traces
> >> >> start with exit_group() (there's a flood of them).
> >> >>
> >> >> I've verified that it indeed BUGs due to the rcu preempt count.
> >> >
> >> > Hello, Sasha,
> >> >
> >> > Which version of -next are you using?  I did some surgery on this
> >> > yesterday based on some bugs Hugh Dickins tracked down, so if you
> >> > are using something older, please move to the current -next.
> >>
> >> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
> >
> > Hmmm...  Looking at this more closely, it looks like there really is
> > an attempt to acquire a mutex within an RCU read-side critical section,
> > which is illegal.  Could you please bisect this?
> 
> Right, the issue is as you described, taking a mutex inside rcu_read_lock().
> 
> The offending commit is (I've cc'ed all parties from it):
> 
> commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
> Author: Ying Han <yinghan@google.com>
> Date:   Thu May 3 15:44:01 2012 +1000
> 
>     memcg: add mlock statistic in memory.stat
> 
> The issue is that munlock_vma_page() now does
> a mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(),
> so when the pre-existing code then tries to take a
> mutex you get a BUG.

Hmmm...  One approach would be to switch from rcu_read_lock() to
srcu_read_lock(), though this means carrying the index returned from
the srcu_read_lock() to the matching srcu_read_unlock() -- and making
the update side use synchronize_srcu() rather than synchronize_rcu().
Alternatively, it might be possible to defer acquiring the lock until
after exiting the RCU read-side critical section, but I don't know enough
about mm to even guess whether this might be possible.
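
In case a sketch helps, the first option would look roughly like this
(page_stat_srcu is a hypothetical name, not existing code):

	#include <linux/srcu.h>

	static struct srcu_struct page_stat_srcu; /* init_srcu_struct() once */

	/* read side: SRCU readers, unlike RCU readers, may sleep,
	 * so taking a mutex inside becomes legal */
	int idx = srcu_read_lock(&page_stat_srcu);
	/* ... critical section, mutex_lock() permitted ... */
	srcu_read_unlock(&page_stat_srcu, idx); /* same idx passed back */

	/* update side: wait for all pre-existing SRCU readers */
	synchronize_srcu(&page_stat_srcu);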

There are probably other approaches as well...

							Thanx, Paul



* Re: rcu: BUG on exit_group
  2012-05-04  5:33         ` Paul E. McKenney
@ 2012-05-09  4:59           ` KAMEZAWA Hiroyuki
  2012-05-09 13:57             ` Paul E. McKenney
  2012-05-09 20:36             ` Hugh Dickins
  0 siblings, 2 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-05-09  4:59 UTC (permalink / raw)
  To: paulmck
  Cc: Sasha Levin, linux-kernel@vger.kernel.org List, Dave Jones,
	yinghan, kosaki.motohiro, Andrew Morton

(2012/05/04 14:33), Paul E. McKenney wrote:

> On Fri, May 04, 2012 at 06:08:34AM +0200, Sasha Levin wrote:
>> On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>>> On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
>>>> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
>>>> <paulmck@linux.vnet.ibm.com> wrote:
>>>>> On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> I've hit a BUG similar to the schedule_tail() one. It happened
>>>>>> when I started fuzzing exit_group() syscalls, and all of the traces
>>>>>> start with exit_group() (there's a flood of them).
>>>>>>
>>>>>> I've verified that it indeed BUGs due to the rcu preempt count.
>>>>>
>>>>> Hello, Sasha,
>>>>>
>>>>> Which version of -next are you using?  I did some surgery on this
>>>>> yesterday based on some bugs Hugh Dickins tracked down, so if you
>>>>> are using something older, please move to the current -next.
>>>>
>>>> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
>>>
>>> Hmmm...  Looking at this more closely, it looks like there really is
>>> an attempt to acquire a mutex within an RCU read-side critical section,
>>> which is illegal.  Could you please bisect this?
>>
>> Right, the issue is as you described, taking a mutex inside rcu_read_lock().
>>
>> The offending commit is (I've cc'ed all parties from it):
>>
>> commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
>> Author: Ying Han <yinghan@google.com>
>> Date:   Thu May 3 15:44:01 2012 +1000
>>
>>     memcg: add mlock statistic in memory.stat
>>
>> The issue is that munlock_vma_page() now does
>> a mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(),
>> so when the pre-existing code then tries to take a
>> mutex you get a BUG.
> 
> Hmmm...  One approach would be to switch from rcu_read_lock() to
> srcu_read_lock(), though this means carrying the index returned from
> the srcu_read_lock() to the matching srcu_read_unlock() -- and making
> the update side use synchronize_srcu() rather than synchronize_rcu().
> Alternatively, it might be possible to defer acquiring the lock until
> after exiting the RCU read-side critical section, but I don't know enough
> about mm to even guess whether this might be possible.
> 
> There are probably other approaches as well...


How about this?
==
[PATCH] memcg: fix taking mutex under rcu at munlock

The following bug was reported because a mutex is taken under rcu_read_lock().

[   83.820976] BUG: sleeping function called from invalid context at
kernel/mutex.c:269
[   83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
[   83.832154] 1 lock held by trinity/4506:
[   83.834224]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>]
munlock_vma_page+0x197/0x200
[   83.839310] Pid: 4506, comm: trinity Tainted: G        W
3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
[   83.849418] Call Trace:
[   83.851182]  [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
[   83.854076]  [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
[   83.857120]  [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
[   83.860242]  [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
[   83.863423]  [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
[   83.866347]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
[   83.869570]  [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
[   83.872667]  [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
[   83.875646]  [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
[   83.878798]  [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
[   83.882235]  [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160

This bug was introduced by mem_cgroup_begin/end_update_page_stat(),
which uses rcu_read_lock(). This patch fixes the bug by narrowing
the range covered by rcu_read_lock().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/mlock.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 2fd967a..05ac10d1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -123,6 +123,7 @@ void munlock_vma_page(struct page *page)
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
 		mem_cgroup_dec_page_stat(page, MEMCG_NR_MLOCK);
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 		if (!isolate_lru_page(page)) {
 			int ret = SWAP_AGAIN;
 
@@ -154,8 +155,8 @@ void munlock_vma_page(struct page *page)
 			else
 				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
 		}
-	}
-	mem_cgroup_end_update_page_stat(page, &locked, &flags);
+	} else
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }
 
 /**
-- 
1.7.4.1
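
For clarity, the resulting flow in munlock_vma_page() (reconstructed
from the diff context; mem_cgroup_begin_update_page_stat() is taken
earlier in the function by the patch being fixed):

	mem_cgroup_begin_update_page_stat(page, &locked, &flags);
	if (TestClearPageMlocked(page)) {
		dec_zone_page_state(page, NR_MLOCK);
		mem_cgroup_dec_page_stat(page, MEMCG_NR_MLOCK);
		/* leave the stat section (rcu_read_unlock()) first ... */
		mem_cgroup_end_update_page_stat(page, &locked, &flags);
		/* ... so isolate_lru_page()/try_to_munlock() may sleep */
		...
	} else
		mem_cgroup_end_update_page_stat(page, &locked, &flags);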





* Re: rcu: BUG on exit_group
  2012-05-09  4:59           ` KAMEZAWA Hiroyuki
@ 2012-05-09 13:57             ` Paul E. McKenney
  2012-05-09 20:36             ` Hugh Dickins
  1 sibling, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2012-05-09 13:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Sasha Levin, linux-kernel@vger.kernel.org List, Dave Jones,
	yinghan, kosaki.motohiro, Andrew Morton

On Wed, May 09, 2012 at 01:59:59PM +0900, KAMEZAWA Hiroyuki wrote:
> (2012/05/04 14:33), Paul E. McKenney wrote:
> 
> > On Fri, May 04, 2012 at 06:08:34AM +0200, Sasha Levin wrote:
> >> On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >>> On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
> >>>> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
> >>>> <paulmck@linux.vnet.ibm.com> wrote:
> >>>>> On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> I've hit a BUG similar to the schedule_tail() one. It happened
> >>>>>> when I started fuzzing exit_group() syscalls, and all of the traces
> >>>>>> start with exit_group() (there's a flood of them).
> >>>>>>
> >>>>>> I've verified that it indeed BUGs due to the rcu preempt count.
> >>>>>
> >>>>> Hello, Sasha,
> >>>>>
> >>>>> Which version of -next are you using?  I did some surgery on this
> >>>>> yesterday based on some bugs Hugh Dickins tracked down, so if you
> >>>>> are using something older, please move to the current -next.
> >>>>
> >>>> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
> >>>
> >>> Hmmm...  Looking at this more closely, it looks like there really is
> >>> an attempt to acquire a mutex within an RCU read-side critical section,
> >>> which is illegal.  Could you please bisect this?
> >>
> >> Right, the issue is as you described, taking a mutex inside rcu_read_lock().
> >>
> >> The offending commit is (I've cc'ed all parties from it):
> >>
> >> commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
> >> Author: Ying Han <yinghan@google.com>
> >> Date:   Thu May 3 15:44:01 2012 +1000
> >>
> >>     memcg: add mlock statistic in memory.stat
> >>
> >> The issue is that munlock_vma_page() now does
> >> a mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(),
> >> so when the pre-existing code then tries to take a
> >> mutex you get a BUG.
> > 
> > Hmmm...  One approach would be to switch from rcu_read_lock() to
> > srcu_read_lock(), though this means carrying the index returned from
> > the srcu_read_lock() to the matching srcu_read_unlock() -- and making
> > the update side use synchronize_srcu() rather than synchronize_rcu().
> > Alternatively, it might be possible to defer acquiring the lock until
> > after exiting the RCU read-side critical section, but I don't know enough
> > about mm to even guess whether this might be possible.
> > 
> > There are probably other approaches as well...
> 
> 
> How about this?

That looks to me like it avoids acquiring the mutex within an RCU
read-side critical section, so good.  I have to defer to you guys on
whether the placement of the mem_cgroup_end_update_page_stat() works.

							Thanx, Paul

> ==
> [PATCH] memcg: fix taking mutex under rcu at munlock
> 
> The following bug was reported because a mutex is taken under rcu_read_lock().
> 
> [   83.820976] BUG: sleeping function called from invalid context at
> kernel/mutex.c:269
> [   83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
> [   83.832154] 1 lock held by trinity/4506:
> [   83.834224]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>]
> munlock_vma_page+0x197/0x200
> [   83.839310] Pid: 4506, comm: trinity Tainted: G        W
> 3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
> [   83.849418] Call Trace:
> [   83.851182]  [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
> [   83.854076]  [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
> [   83.857120]  [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
> [   83.860242]  [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
> [   83.863423]  [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
> [   83.866347]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
> [   83.869570]  [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
> [   83.872667]  [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
> [   83.875646]  [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
> [   83.878798]  [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
> [   83.882235]  [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160
> 
> This bug was introduced by mem_cgroup_begin/end_update_page_stat(),
> which uses rcu_read_lock(). This patch fixes the bug by narrowing
> the range covered by rcu_read_lock().
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/mlock.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 2fd967a..05ac10d1 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -123,6 +123,7 @@ void munlock_vma_page(struct page *page)
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
>  		mem_cgroup_dec_page_stat(page, MEMCG_NR_MLOCK);
> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  		if (!isolate_lru_page(page)) {
>  			int ret = SWAP_AGAIN;
> 
> @@ -154,8 +155,8 @@ void munlock_vma_page(struct page *page)
>  			else
>  				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
>  		}
> -	}
> -	mem_cgroup_end_update_page_stat(page, &locked, &flags);
> +	} else
> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  }
> 
>  /**
> -- 
> 1.7.4.1
> 
> 
> 



* Re: rcu: BUG on exit_group
  2012-05-09  4:59           ` KAMEZAWA Hiroyuki
  2012-05-09 13:57             ` Paul E. McKenney
@ 2012-05-09 20:36             ` Hugh Dickins
  1 sibling, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2012-05-09 20:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Andrew Morton
  Cc: paulmck, Sasha Levin, linux-kernel@vger.kernel.org List,
	Dave Jones, yinghan, kosaki.motohiro

On Wed, 9 May 2012, KAMEZAWA Hiroyuki wrote:
> [PATCH] memcg: fix taking mutex under rcu at munlock
> 
> The following bug was reported because a mutex is taken under rcu_read_lock().
> 
> [   83.820976] BUG: sleeping function called from invalid context at
> kernel/mutex.c:269
> [   83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
> [   83.832154] 1 lock held by trinity/4506:
> [   83.834224]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>]
> munlock_vma_page+0x197/0x200
> [   83.839310] Pid: 4506, comm: trinity Tainted: G        W
> 3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
> [   83.849418] Call Trace:
> [   83.851182]  [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
> [   83.854076]  [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
> [   83.857120]  [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
> [   83.860242]  [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
> [   83.863423]  [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
> [   83.866347]  [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
> [   83.869570]  [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
> [   83.872667]  [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
> [   83.875646]  [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
> [   83.878798]  [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
> [   83.882235]  [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160
> 
> This bug was introduced by mem_cgroup_begin/end_update_page_stat(),
> which uses rcu_read_lock(). This patch fixes the bug by narrowing
> the range covered by rcu_read_lock().
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Yes, I expect that this does fix the reported issue - thanks.  But
Ying and I would prefer for her memcg mlock stats patch simply to be
reverted from akpm's tree for now, as she requested on Friday.

Hannes kindly posted his program which would bypass these memcg mlock
statistics, so we need to fix that case, and bring back the warning
when mlocked pages are freed.

And although I think there's no immediate problem with doing the
isolate_lru_page/putback_lru_page while under the memcg stats lock,
I do have a potential (post-per-memcg-per-zone lru locking) patch
which just uses lru_lock for the move_lock (fixes an unlikely race
Konstantin pointed out with my version of lru locking patches) -
which would (of course) require us not to hold stats lock while
doing the lru part of it.

Though what I'd really like (but fail to find) is a better way of
handling the stats versus move, that doesn't get us into locking
hierarchy questions.

Ongoing work to come later.  For now, Andrew, please just revert Ying's
"memcg: add mlock statistic in memory.stat" patch (and your fix to it).

Thanks,
Hugh

> ---
>  mm/mlock.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 2fd967a..05ac10d1 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -123,6 +123,7 @@ void munlock_vma_page(struct page *page)
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
>  		mem_cgroup_dec_page_stat(page, MEMCG_NR_MLOCK);
> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  		if (!isolate_lru_page(page)) {
>  			int ret = SWAP_AGAIN;
>  
> @@ -154,8 +155,8 @@ void munlock_vma_page(struct page *page)
>  			else
>  				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
>  		}
> -	}
> -	mem_cgroup_end_update_page_stat(page, &locked, &flags);
> +	} else
> +		mem_cgroup_end_update_page_stat(page, &locked, &flags);
>  }
>  
>  /**
> -- 
> 1.7.4.1


end of thread

Thread overview: 9+ messages
2012-05-03  8:57 rcu: BUG on exit_group Sasha Levin
2012-05-03 15:41 ` Paul E. McKenney
2012-05-03 15:55   ` Sasha Levin
2012-05-03 17:01     ` Paul E. McKenney
2012-05-04  4:08       ` Sasha Levin
2012-05-04  5:33         ` Paul E. McKenney
2012-05-09  4:59           ` KAMEZAWA Hiroyuki
2012-05-09 13:57             ` Paul E. McKenney
2012-05-09 20:36             ` Hugh Dickins
