From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable
 autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id D54DEC47099
 for ; Thu, 3 Jun 2021 17:15:51 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id C3B04613F1
 for ; Thu, 3 Jun 2021 17:15:51 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S233740AbhFCRRd (ORCPT ); Thu, 3 Jun 2021 13:17:33 -0400
Received: from mail.kernel.org ([198.145.29.99]:43188 "EHLO mail.kernel.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S232559AbhFCRNV (ORCPT ); Thu, 3 Jun 2021 13:13:21 -0400
Received: by mail.kernel.org (Postfix) with ESMTPSA id B72B36141B;
 Thu, 3 Jun 2021 17:10:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
 s=k20201202; t=1622740241;
 bh=yCIrM2YCEREfws3zQzUwYiMb4cPdjUcMGWaBVJu7oZU=;
 h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
 b=QYAIbUHRFletN3+iJW3oa2bSXg/IBH+v+3juRbIvc3+TiPv3u/MmuYVAWyJ6qqATm
 0GWf3+9+5zzYQ1BgIg659CBCkIzY+62zNx+xOv0FRoUvo8b5Y4N4eu1ZceD2PKyoQf
 n1lh/zBm8R9eO7CwwpJiPgesVaSRFC9DfSwectW5uIcfOZms8tby80PdZXKqSns9u6
 RJPktbZuyBcy2vXaIUW1qGuHx89WlndYJ1gggMSRMvfo1Ce+QtVA2ytIKhgVNC9JPO
 FDBVDFA7e1GOzKkwTy4luG3r/Q0DGYbqHlR9w5NsDzXZ5Kr04IBsY0yg2J/JkkdhmY
 mUwZSF+0wjEmw==
From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sergey Senozhatsky, Tejun Heo, Sasha Levin
Subject: [PATCH AUTOSEL 4.14 09/18] wq: handle VM suspension in stall detection
Date: Thu, 3 Jun 2021 13:10:20 -0400
Message-Id:
 <20210603171029.3169669-9-sashal@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210603171029.3169669-1-sashal@kernel.org>
References: <20210603171029.3169669-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sergey Senozhatsky

[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If VCPU is suspended (VM suspend) in wq_watchdog_timer_fn() then
once this VCPU resumes it will see the new jiffies value, while it
may take a while before IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
There is a small chance of misreported WQ stalls in the meantime,
because new jiffies is time_after() old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If VM gets suspended then we continue using
"old" jiffies value and old WQ touch timestamps. If IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs())
then old jiffies will always be before new 'ts + thresh'.

Signed-off-by: Sergey Senozhatsky
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
---
 kernel/workqueue.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc32ed4a4cf3..58e7eefe4dbf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include "workqueue_internal.h"
 
@@ -5465,6 +5466,7 @@ static void wq_watchdog_timer_fn(unsigned long data)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;
 
@@ -5479,6 +5481,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		if (list_empty(&pool->worklist))
 			continue;
 
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5497,12 +5505,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		}
 
 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
 
-- 
2.30.2
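
For readers who want to see the effect of the "save jiffies at the
beginning" change outside the kernel, here is a minimal userspace
sketch. It is not the kernel code: struct fake_clock,
simulate_vm_suspend(), stalled_live() and stalled_snapshot() are
hypothetical names made up for illustration, and time_after() is
re-declared locally with the same wrap-safe idea as the kernel macro.
The assumption is that the guest is paused after the timer function is
entered but before the stall comparison runs.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical stand-in for the global jiffies counter. */
struct fake_clock {
	unsigned long jiffies;
};

/* Hypothetical: models a VM pause that makes the clock jump forward. */
static void simulate_vm_suspend(struct fake_clock *clk, unsigned long ticks)
{
	clk->jiffies += ticks;
}

/* Old behaviour: the clock is re-read at comparison time, after the pause. */
static bool stalled_live(struct fake_clock *clk, unsigned long ts,
			 unsigned long thresh, unsigned long pause)
{
	simulate_vm_suspend(clk, pause);
	return time_after(clk->jiffies, ts + thresh);
}

/* Patched behaviour: the clock is sampled once on entry, before the pause. */
static bool stalled_snapshot(struct fake_clock *clk, unsigned long ts,
			     unsigned long thresh, unsigned long pause)
{
	unsigned long now = clk->jiffies;

	simulate_vm_suspend(clk, pause);
	return time_after(now, ts + thresh);
}
```

With a pool touched 10 ticks before entry, a threshold of 7500 ticks and
a pause of 100000 ticks, stalled_live() reports a stall that never
happened, while stalled_snapshot() does not, because the saved value is
still before 'ts + thresh'.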