From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8C6BC433F5 for ; Fri, 8 Oct 2021 05:56:09 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 719376101A for ; Fri, 8 Oct 2021 05:56:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 719376101A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=ymIRsjwDdyR3SYisTN3WdmXS+HhL6cQq6Exm3K9ENig=; b=Dqsn8k4mhkHSkM 1sN4l538Z0JmhjhySztgLmlhjdOHdT2oE4x0RdRT+It8i6to+ZeRd9XLbdwUbCPqfQEFiPchVNC9h Gw6L29tpsJPCBBNJ8WpCta64YcU9ozgkDagxTI1ftDqqtmb4Z4Pr5fSgVjkpK4CA97X8tn7V/V4YH IH1zn59FMuoOujQsiyuy/rdCI1/jEODW8ma969WVkelldEAVKExgQIfwsF89dRdp33J7/Sz38T7PW 245xRxBarv3hjRDODTcufyNEdeY9IXFqgoLXNA1UqZdj6G5n85SfgWRs4hN7q0hjU0MbFxn40rvt6 9gIrSHU2QkU0U9HJTtCQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mYipj-001fg7-Br; Fri, 08 Oct 2021 05:54:15 +0000 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mYipf-001ffV-SO for linux-arm-kernel@lists.infradead.org; Fri, 08 Oct 2021 05:54:13 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1633672450; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=p/8Q73TOBZDtJheVPT8JeTVTM8ZE3M+LmUjxYN/WtSU=; b=DG8AZiR41aB0wHuueAhvz4UcPMjuWetwpBMrJ1WLfIpfhu6mw/hE+KYqsZrWOZZA9E5UnW DuvNk6I8JZtfC2nEbVTHA1yrIiAKVzvH8VIOt4W3kPSRW0WR7FhNXeqs0Nfpyu34ySgwuv y9dAiqMlL2LywwHDVICTeN300vyiQ1M= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-34-Fnie5ebpOfeRR5HnIGGaGw-1; Fri, 08 Oct 2021 01:54:03 -0400 X-MC-Unique: Fnie5ebpOfeRR5HnIGGaGw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B7188801AA7; Fri, 8 Oct 2021 05:54:00 +0000 (UTC) Received: from piliu.users.ipa.redhat.com (ovpn-8-26.pek2.redhat.com [10.72.8.26]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 06D315F4EE; Fri, 8 Oct 2021 05:53:49 +0000 (UTC) Date: Fri, 8 Oct 2021 13:53:45 +0800 From: Pingfan Liu To: Petr Mladek Cc: Pingfan Liu , linux-kernel@vger.kernel.org, Sumit Garg , Catalin Marinas , Will Deacon , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Marc Zyngier , Kees Cook , Masahiro Yamada , Sami Tolvanen , Andrew Morton , Wang Qing , "Peter Zijlstra (Intel)" , Santosh Sivaraj , linux-arm-kernel@lists.infradead.org Subject: Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog_hld interface for async model Message-ID: References: <20210923140951.35902-1-kernelfans@gmail.com> <20210923140951.35902-4-kernelfans@gmail.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211007_225412_020529_329C1FCD X-CRM114-Status: GOOD ( 31.57 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Oct 05, 2021 at 09:03:17AM +0200, Petr Mladek wrote: [...] > > +static void lockup_detector_delay_init(struct work_struct *work); > > +bool hld_detector_delay_initialized __initdata; > > + > > +struct wait_queue_head hld_detector_wait __initdata = > > + __WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait); > > + > > +static struct work_struct detector_work __initdata = > > + __WORK_INITIALIZER(detector_work, lockup_detector_delay_init); > > + > > +static void __init lockup_detector_delay_init(struct work_struct *work) > > +{ > > + int ret; > > + > > + wait_event(hld_detector_wait, hld_detector_delay_initialized); > > + ret = watchdog_nmi_probe(); > > + if (!ret) { > > + nmi_watchdog_available = true; > > + lockup_detector_setup(); > > Is it really safe to call the entire lockup_detector_setup() > later? > > It manipulates also softlockup detector. And more importantly, > the original call is before smp_init(). It means that it was > running when only single CPU was on. > For the race analysis, lockup_detector_reconfigure() is on the centre stage. Since proc_watchdog_update() can call lockup_detector_reconfigure() to re-initialize both soft and hard lockup detector, so the race issue should be already taken into consideration. > It seems that x86 has some problem with hardlockup detector as > well. It later manipulates only the hardlockup detector. Also it uses > cpus_read_lock() to prevent races with CPU hotplug, see > fixup_ht_bug(). > Yes. But hardlockup_detector_perf_{stop,start}() can not meet the requirement, since no perf_event is created yet. So there is no handy interface to re-initialize hardlockup detector directly. > > > + } else { > > + WARN_ON(ret == -EBUSY); > > + pr_info("Perf NMI watchdog permanently disabled\n"); > > + } > > +} > > + > > void __init lockup_detector_init(void) > > { > > + int ret; > > + > > if (tick_nohz_full_enabled()) > > pr_info("Disabling watchdog on nohz_full cores by default\n"); > > > > cpumask_copy(&watchdog_cpumask, > > housekeeping_cpumask(HK_FLAG_TIMER)); > > > > - if (!watchdog_nmi_probe()) > > + ret = watchdog_nmi_probe(); > > + if (!ret) > > nmi_watchdog_available = true; > > + else if (ret == -EBUSY) > > + queue_work_on(smp_processor_id(), system_wq, &detector_work); > > IMHO, this is not acceptable. It will block one worker until someone > wakes it. Only arm64 will have a code to wake up the work and only > when pmu is successfully initialized. In all other cases, the worker > will stay blocked forever. > What about consider -EBUSY and hld_detector_delay_initialized as a unit? If watchdog_nmi_probe() returns -EBUSY, then set the state of ld_detector_delay_initialized as "waiting", and then moved to state "finished". And at the end of do_initcalls(), check the state is "finished". If not, then throw a warning and wake up the worker. > The right solution is to do it the other way. Queue the work > from arm64-specific code when armv8_pmu_driver_init() succeeded. > Could it be better if watchdog can provide a common framework for future extension instead of arch specific? The 2nd argument is to avoid the message "Perf NMI watchdog permanently disabled" while later enabling it. (Please see lockup_detector_init()->watchdog_nmi_probe()->hardlockup_detector_perf_init(), but if providing arch specific probe method, it can be avoided) > Also I suggest to flush the work to make sure that it is finished > before __init code gets removed. > Good point, and very interesting. I will look into it. > > The open question is what code the work will call. As mentioned > above, I am not sure that lockup_detector_delay_init() is safe. > IMHO, we need to manipulate only hardlockup detector and > we have to serialize it against CPU hotplug. > As explained ahead, it has already consider the race against CPU hotplug. Thanks, Pingfan _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel