From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751927AbdI1QFo (ORCPT <rfc822;w@1wt.eu>);
        Thu, 28 Sep 2017 12:05:44 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:32814 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751241AbdI1QFn (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 28 Sep 2017 12:05:43 -0400
Date: Thu, 28 Sep 2017 09:05:14 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Cc: Sasha Levin <levinsasha928@gmail.com>,
        "linux-kernel@vger.kernel.org List" <linux-kernel@vger.kernel.org>,
        Ingo Molnar <mingo@kernel.org>,
        "jiangshanlai@gmail.com" <jiangshanlai@gmail.com>,
        "dipankar@in.ibm.com" <dipankar@in.ibm.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        Josh Triplett <josh@joshtriplett.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <peterz@infradead.org>,
        Steven Rostedt <rostedt@goodmis.org>,
        "dhowells@redhat.com" <dhowells@redhat.com>,
        Eric Dumazet <edumazet@google.com>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>,
        Oleg Nesterov <oleg@redhat.com>,
        "bobby.prani@gmail.com" <bobby.prani@gmail.com>
Subject: Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule
 be Tasks RCU quiescent state
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170419165805.GB10874@linux.vnet.ibm.com>
 <1492621117-13939-40-git-send-email-paulmck@linux.vnet.ibm.com>
 <CA+1xoqdDCuQ5pz61aHn3Y-VdP5g2GvYfXmTdpHsWJG0dsM3DKg@mail.gmail.com>
 <20170928123055.GI3521@linux.vnet.ibm.com>
 <20170928153813.7cernglt2d7umhpe@sasha-lappy>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170928153813.7cernglt2d7umhpe@sasha-lappy>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17092816-0048-0000-0000-000001ECC075
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00007805; HX=3.00000241; KW=3.00000007;
 PH=3.00000004; SC=3.00000232; SDB=6.00923648; UDB=6.00464360; IPR=6.00703786;
 BA=6.00005613; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00017305; XFM=3.00000015;
 UTC=2017-09-28 16:05:20
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17092816-0049-0000-0000-000042B4E05E
Message-Id: <20170928160514.GM3521@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-09-28_05:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0
 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000
 definitions=main-1709280238
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 28, 2017 at 03:38:16PM +0000, Levin, Alexander (Sasha Levin) wrote:
> On Thu, Sep 28, 2017 at 05:30:55AM -0700, Paul E. McKenney wrote:
> >On Thu, Sep 28, 2017 at 02:37:20AM -0700, Sasha Levin wrote:
> >> On Wed, Apr 19, 2017 at 9:58 AM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > Currently, a call to schedule() acts as a Tasks RCU quiescent state
> >> > only if a context switch actually takes place.  However, just the
> >> > call to schedule() guarantees that the calling task has moved off of
> >> > whatever tracing trampoline that it might have been one previously.
> >> > This commit therefore plumbs schedule()'s "preempt" parameter into
> >> > rcu_note_context_switch(), which then records the Tasks RCU quiescent
> >> > state, but only if this call to schedule() was -not- due to a preemption.
> >> >
> >> > To avoid adding overhead to the common-case context-switch path,
> >> > this commit hides the rcu_note_context_switch() check under an existing
> >> > non-common-case check.
> >> >
> >> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> >> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>
> >> Hey Paul,
> >>
> >> I'm seeing the following on the latest -next kernel, and suspect it's
> >> caused bit this patch:
> >
> >Hmmm...  kernel/rcu/tree_plugin.h:329 thinks that someone slept (as opposed
> >to was preempted) in an RCU read-side critical section.
> >
> >If this is reproducible, could you please enable lockdep if you are not
> >already doing so?
> 
> lockdep was on, as far as I can tell.
> 
> It happened once in ~3 weeks of fuzzing, so not sure how easily I can reproduce.

Oh, wait...  Looking at the stack trace:

> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
>  panic+0x1bc/0x3b7 kernel/panic.c:181
>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>  report_bug+0x211/0x2d0 lib/bug.c:183
>  fixup_bug+0x3f/0x90 arch/x86/kernel/traps.c:178
>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>  do_error_trap+0x133/0x380 arch/x86/kernel/traps.c:298
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
> RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
> RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
> RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
> RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
> RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
> R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040

Up to here is all about the kernel complaining.

>  __schedule+0x201/0x2240 kernel/sched/core.c:3292
>  schedule+0x113/0x460 kernel/sched/core.c:3421
>  kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158

It is kvm_async_pf_task_wait() that calls schedule(), but it carefully
sets state to make that legal.  Except...

>  do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
>  async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
> RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
> RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
> RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
> RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
> R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
> R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
>  vsnprintf+0x173/0x1700 lib/vsprintf.c:2136

We took a page fault in vsnprintf() while doing link_path_walk(),
which looks to be within an RCU read-side critical section.

Maybe the page fault confused lockdep?

Sigh.  It is going to be a real pain if all printk()s need to be
outside of RCU read-side critical sections due to the possibility of
page faults...

							Thanx, Paul

>  sprintf+0xbe/0xf0 lib/vsprintf.c:2386
>  proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
>  get_link fs/namei.c:1047 [inline]
>  link_path_walk+0x1041/0x1490 fs/namei.c:2127
>  path_openat+0x2d0/0x3600 fs/namei.c:3527
>  do_filp_open+0x263/0x3c0 fs/namei.c:3562
>  do_sys_open+0x515/0x6f0 fs/open.c:1059
>  SYSC_openat fs/open.c:1086 [inline]
>  SyS_openat+0x30/0x40 fs/open.c:1080
>  do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x4889da
> RSP: 002b:000000c4261eb8f0 EFLAGS: 00000206 ORIG_RAX: 0000000000000101
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004889da
> RDX: 0000000000000002 RSI: 000000c42338d2c0 RDI: ffffffffffffff9c
> RBP: 000000c4261eb980 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00000000ffffffee R14: 00000000000012c0 R15: 0000000000000100
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> Rebooting in 86400 seconds..

So kvm_async_pf_task_wait() does a schedule(), and it carefully