From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B019C433EF for ; Wed, 13 Apr 2022 05:12:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232380AbiDMFO2 (ORCPT ); Wed, 13 Apr 2022 01:14:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229539AbiDMFOX (ORCPT ); Wed, 13 Apr 2022 01:14:23 -0400 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8C184BFF1; Tue, 12 Apr 2022 22:12:03 -0700 (PDT) Received: by mail-pj1-x1030.google.com with SMTP id s14-20020a17090a880e00b001caaf6d3dd1so5171042pjn.3; Tue, 12 Apr 2022 22:12:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=date:from:subject:to:cc:references:in-reply-to:mime-version :message-id:content-transfer-encoding; bh=rg2lurYr3ruoQu8jYzT28vzciXuBIWQaGUdRPFYHa8Y=; b=cqpW5WJq2dxo9Xpl9AtxtfJpPdxf2XqiC0K0NveMKKkqavqky1FSiLJZYafjLeWjcB esN4rtETggEWXs9X5/1NdJvjls2G1Cl3YTI2dZ8bh88HL57ziOZikc02NKL3TJxu760Z hej+rFnf9SGUI7TFdh/OFEz/SlHHOFWonerQ9O87sUkxWHqlMXuTIWCPArHU1u0njF7R srqTFh7g3PqUxop7rUb6RGCABgD2aAAcXsEOosBBmM+YFZV3LTC1yjdobLLohsqTcIIS cpc2p3Qp3msmWNq0+wPdvmraF3XemUOqZId39+2nyM7pG6jI9qC3Ba2v8YZWtcPTnYJr i9qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:subject:to:cc:references:in-reply-to :mime-version:message-id:content-transfer-encoding; bh=rg2lurYr3ruoQu8jYzT28vzciXuBIWQaGUdRPFYHa8Y=; b=KTevU7Up0ix/vRNSgWQQLi+sjjIfuFpUHat1gqZe0SZOEhJPqoXcOVJXMWigzF3PPd z/hZxbiLQTyGNjBDTssvBuIG/l6iD+mQw8h1LN3rwhI6MLqm4yfbhjd8FNyIpH7lMN7r uS1JJSohUomI1+gTc8DdKimU/EDbx4SetnQdLJfa/knMmYtmz6b/oeHp6A2IBZlQvEEk eZdwnU6FuX7zHVMHvOTKOh9ovr8EOSUc/CXR0KgMtyderzAZCWXGfAtywxE4VEdxQi5G 8OmWv/zzGsngycVtXn0032hKGChtPMZ2gYIDu0UP4nZp11B804gtGbzOiCtA4i7Aw5MN OL7w== X-Gm-Message-State: AOAM532syk9tC5lXX5B7XT0BA1Dopo3XCaA9UM89f4s4BqaRzK8G1zEL VkSZRV+MqpAiDZeBLGwQtk4= X-Google-Smtp-Source: ABdhPJzOQ6Inv+6gpLXSzaV644vqnglofSuRMAchIsFaKAm/W4uaXTh1v8ofvyh/oXkIinU/GdnbRA== X-Received: by 2002:a17:902:a712:b0:158:9e75:686c with SMTP id w18-20020a170902a71200b001589e75686cmr2409612plq.56.1649826723274; Tue, 12 Apr 2022 22:12:03 -0700 (PDT) Received: from localhost (220-235-208-104.tpgi.com.au. [220.235.208.104]) by smtp.gmail.com with ESMTPSA id d4-20020a17090ad3c400b001c65ba76911sm1268956pjw.3.2022.04.12.22.12.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Apr 2022 22:12:02 -0700 (PDT) Date: Wed, 13 Apr 2022 15:11:57 +1000 From: Nicholas Piggin Subject: To: Michael Ellerman , paulmck@kernel.org, Zhouyi Zhou Cc: linuxppc-dev , Miguel Ojeda , rcu , Daniel Lezcano , Thomas Gleixner , linux-kernel@vger.kernel.org, Viresh Kumar References: <20220406170012.GO4285@paulmck-ThinkPad-P17-Gen-1> <87pmls6nt7.fsf@mpe.ellerman.id.au> <87k0bz7i1s.fsf@mpe.ellerman.id.au> In-Reply-To: <87k0bz7i1s.fsf@mpe.ellerman.id.au> MIME-Version: 1.0 Message-Id: <1649818529.j46672mh2p.astroid@bobo.none> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org +Daniel, Thomas, Viresh Subject: Re: rcu_sched self-detected stall on CPU Excerpts from Michael Ellerman's message of April 9, 2022 12:42 am: > Michael Ellerman writes: >> "Paul E. McKenney" writes: >>> On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote: >>>> Hi >>>>=20 >>>> I can reproduce it in a ppc virtual cloud server provided by Oregon >>>> State University. Following is what I do: >>>> 1) curl -l https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/li= nux.git/snapshot/linux-5.18-rc1.tar.gz >>>> -o linux-5.18-rc1.tar.gz >>>> 2) tar zxf linux-5.18-rc1.tar.gz >>>> 3) cp config linux-5.18-rc1/.config >>>> 4) cd linux-5.18-rc1 >>>> 5) make vmlinux -j 8 >>>> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot >>>> -smp 2 (QEMU 4.2.1) >>>> 7) after 12 rounds, the bug got reproduced: >>>> (http://154.223.142.244/logs/20220406/qemu.log.txt) >>> >>> Just to make sure, are you both seeing the same thing? Last I knew, >>> Zhouyi was chasing an RCU-tasks issue that appears only in kernels >>> built with CONFIG_PROVE_RCU=3Dy, which Miguel does not have set. Or di= d >>> I miss something? >>> >>> Miguel is instead seeing an RCU CPU stall warning where RCU's grace-per= iod >>> kthread slept for three milliseconds, but did not wake up for more than >>> 20 seconds. This kthread would normally have awakened on CPU 1, but >>> CPU 1 looks to me to be very unhealthy, as can be seen in your console >>> output below (but maybe my idea of what is healthy for powerpc systems >>> is outdated). Please see also the inline annotations. >>> >>> Thoughts from the PPC guys? >> >> I haven't seen it in my testing. But using Miguel's config I can >> reproduce it seemingly on every boot. >> >> For me it bisects to: >> >> 35de589cb879 ("powerpc/time: improve decrementer clockevent processing= ") >> >> Which seems plausible. >> >> Reverting that on mainline makes the bug go away. >> >> I don't see an obvious bug in the diff, but I could be wrong, or the old >> code was papering over an existing bug? >> >> I'll try and work out what it is about Miguel's config that exposes >> this vs our defconfig, that might give us a clue. >=20 > It's CONFIG_HIGH_RES_TIMERS=3Dn which triggers the stall. >=20 > I can reproduce just with: >=20 > $ make ppc64le_guest_defconfig > $ ./scripts/config -d HIGH_RES_TIMERS >=20 > We have no defconfigs that disable HIGH_RES_TIMERS, I didn't even > realise you could disable it TBH :) >=20 > The Rust CI has it disabled because I copied that from the x86 defconfig > they were using back when I added the Rust support. I think that was > meant to be a stripped down fast config for CI, but the result is it's > just using a badly tested combination which is not helpful. >=20 > So I'll send a patch to turn HIGH_RES_TIMERS on for the Rust CI, and we > can debug this further without blocking them. So we traced the problem down to possibly a misunderstanding between=20 decrementer clock event device and core code. The decrementer is only oneshot*ish*. It actually needs to either be=20 reprogrammed or shut down otherwise it just continues to cause=20 interrupts. Before commit 35de589cb879, it was sort of two-shot. The initial=20 interrupt at the programmed time would set its internal next_tb variable=20 to ~0 and call the ->event_handler(). If that did not set_next_event or=20 stop the timer, the interrupt will fire again immediately, notice=20 next_tb is ~0, and only then stop the decrementer interrupt. So that was already kind of ugly, this patch just turned it into a hang. The problem happens when the tick is stopped with an event still=20 pending, then tick_nohz_handler() is called, but it bails out because=20 tick_stopped =3D=3D 1 so the device never gets programmed again, and so it=20 keeps firing. How to fix it? Before commit a7cba02deced, powerpc's decrementer was=20 really oneshot, but we would like to avoid doing that because it requires=20 additional programming of the hardware on each timer interrupt. We have=20 the ONESHOT_STOPPED state which seems to be just about what we want. Did the ONESHOT_STOPPED patch just miss this case, or is there a reason=20 we don't stop it here? This patch seems to fix the hang (not heavily tested though). =20 Thanks, Nick --- diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 2d76c91b85de..7e13a55b6b71 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1364,9 +1364,11 @@ static void tick_nohz_handler(struct clock_event_dev= ice *dev) tick_sched_do_timer(ts, now); tick_sched_handle(ts, regs); =20 - /* No need to reprogram if we are running tickless */ - if (unlikely(ts->tick_stopped)) + if (unlikely(ts->tick_stopped)) { + /* If we are tickless, change the clock event to stopped */ + tick_program_event(KTIME_MAX, 1); return; + } =20 hrtimer_forward(&ts->sched_timer, now, TICK_NSEC); tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);