Date: Mon, 10 Oct 2022 20:01:23 +0200
From: Pali Rohár
To: Tom Rini
Cc: Stefan Roese, u-boot@lists.denx.de
Subject: Re: Broken watchdog in u-boot master branch
Message-ID: <20221010180123.p7p4gfo2aa6u6zi3@pali>
In-Reply-To: <20221010175610.GQ2020586@bill-the-cat>

On Monday 10 October 2022 13:56:10 Tom Rini wrote:
> On Mon, Oct 10, 2022 at 07:44:05PM +0200, Pali Rohár wrote:
> > On Monday 10 October 2022 13:40:38 Tom Rini wrote:
> > > On Mon, Oct 10, 2022 at 07:22:56PM +0200, Pali Rohár wrote:
> > > > On Monday 10 October 2022 12:28:18 Tom Rini wrote:
> > > > > On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > > > > > Hello! Watchdog code seems to be broken in u-boot master branch.
> > > > > > On Nokia N900 I'm getting following message in qemu:
> > > > > >
> > > > > >   cyclic function rx51_watchdog took too long: 10000us vs 1000us max, disabling
> > > > > >
> > > > > > Seems that watchdog core code is not prepared for "slower" watchdogs
> > > > > > which communicate over slower i2c bus, like it is the case for N900.
> > > > > >
> > > > > > Disabling slower watchdog is a bad idea as it would result in reboot
> > > > > > loop instead of slower - but working code.
> > > > >
> > > > > So, looking at this in more detail, we have
> > > > > CONFIG_CYCLIC_MAX_CPU_TIME_US as a configuration option (which is where
> > > > > the too long comes from). And picking a random CI run:
> > > > > https://source.denx.de/u-boot/u-boot/-/jobs/511177
> > > > > I do see we hit this in CI once, but not every time, QEMU runs here. Is
> > > > > that the max time is configurable enough to satisfy your concerns here?
> > > >
> > > > It is needed to investigate, how to _properly_ fix this issue, not just
> > > > workarounded it. Probably other boards may be affected.
> > >
> > > So it's the cyclic watchdog code, which we merged as early as possible
> > > that's the reason here. And it was merged as early as we could to see if
> > > there's problems. Are there problems? We're seeing "system too slow,
> > > disabling" on QEMU, sometimes, and the value of too slow is
> > > configurable. I know you reported other problems with n900 HW, so we
> > > can't see if it's failing there
> >
> > I was tested it with older asm code (as described in that other email,
> > via git checkout commit -- file) on n900 HW and watchdog problem is
> > there too. Phone reboots in about 20 seconds. But as I do not have
> > serial console, I do not know if that "disabling" message is printed
> > there too (but I guess it is).
>
> I think I'm a bit baffled at this point, honestly. The watchdog timeout
> is 60 seconds.
> If you're confident in it being about 20 seconds,
> consistently, changing WATCHDOG_TIMEOUT_MSECS to say 10000 (so, 10
> seconds) should let you see if U-Boot has configured the watchdog and
> it's being tripped, or if it's still at the prior stage value.

$ git grep CONFIG_WATCHDOG_TIMEOUT_MSECS configs/nokia_rx51_defconfig
configs/nokia_rx51_defconfig:CONFIG_WATCHDOG_TIMEOUT_MSECS=31000

The watchdog is also started by NOLO (which loads and executes U-Boot), so
a shorter timeout may already be in effect.

So I have a feeling that the same issue occurs on real HW: the cyclic code
disabled watchdog kicking, and then the watchdog restarted the phone. I do
not remember the exact time (whether it was 20s or 25s; I have not measured
it precisely), but it sounds plausible.

> I would have expected that QEMU would see problems that real HW doesn't
> (the value in your log is much higher than the one in CI), but I could
> be wrong here.
>
> --
> Tom