From: Hillf Danton
To: Jing-Ting Wu
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Waiman Long
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Date: Wed, 7 Sep 2022 08:07:41 +0800
Message-Id: <20220907000741.2496-1-hdanton@sina.com>
In-Reply-To: <88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com>

On 5 Sep 2022 10:47:36 +0800 Jing-Ting Wu wrote
>
> We hit HANG_DETECT on the T SW version with kernel-5.15.
> Many tasks have been blocked for a long time.
>
> Root cause:
> migration_cpu_stop() does not complete because is_migration_disabled(p)
> is true: complete stays false, so complete_all() is never executed.
> This leaves other tasks waiting on the rwsem.
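The quoted root cause can be illustrated with a minimal user-space sketch of the branch structure of migration_cpu_stop() in kernel-5.15 before any change, assuming a waiter exists (pending != NULL). The enum, the function names and the three boolean inputs are hypothetical stand-ins for kernel state (task_rq(p) == rq, is_migration_disabled(p), and whether task_cpu(p) is in p->cpus_ptr); this is a model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of migration_cpu_stop() in kernel-5.15,
 * before the patch, for the case pending != NULL (someone is waiting on
 * pending->done). These names are illustrative, not kernel APIs.
 */
enum stop_outcome {
	STOP_COMPLETE,  /* complete_all(&pending->done) is reached */
	STOP_REQUEUE,   /* stopper re-queued with stop_one_cpu_nowait() */
	STOP_NO_WAKEUP, /* returns with complete == false: waiter stays blocked */
};

/*
 * on_this_rq:         models task_rq(p) == rq
 * migration_disabled: models is_migration_disabled(p)
 * cpu_allowed:        models cpumask_test_cpu(task_cpu(p), p->cpus_ptr)
 */
enum stop_outcome old_stop_outcome(bool on_this_rq, bool migration_disabled,
				   bool cpu_allowed)
{
	if (on_this_rq) {
		if (migration_disabled)
			return STOP_NO_WAKEUP; /* "goto out" with complete == false */
		return STOP_COMPLETE;          /* pending: complete = true */
	}
	/* else if (pending): the task moved before the stopper ran */
	return cpu_allowed ? STOP_COMPLETE : STOP_REQUEUE;
}

/* Usage: exercise the three outcomes of the model. */
void old_stop_selfcheck(void)
{
	assert(old_stop_outcome(true, true, true) == STOP_NO_WAKEUP);
	assert(old_stop_outcome(true, false, false) == STOP_COMPLETE);
	assert(old_stop_outcome(false, false, false) == STOP_REQUEUE);
}
```

In this model the only path that neither completes nor re-queues the stopper is a migration-disabled task on the stopper's own runqueue, which matches the reported hang.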

Please see whether handing the task over to the stopper again, when
migration is disabled, survives your tests.

Hillf

--- linux-5.15/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2322,9 +2322,7 @@ static int migration_cpu_stop(void *data
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq) {
-		if (is_migration_disabled(p))
-			goto out;
+	if (task_rq(p) == rq && !is_migration_disabled(p)) {
 
 		if (pending) {
 			p->migration_pending = NULL;
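The effect of the diff on control flow can be sketched with a hypothetical user-space model, assuming pending != NULL. With the changed condition, a migration-disabled task no longer jumps to `out`; it falls through to the existing `else if (pending)` branch, which either completes the waiter or re-queues the stopper with stop_one_cpu_nowait(). The enum, function names and boolean inputs (standing in for task_rq(p) == rq, is_migration_disabled(p), and whether task_cpu(p) is in p->cpus_ptr) are illustrative only, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of migration_cpu_stop() with the patch
 * applied, for the case pending != NULL. Not kernel code.
 */
enum stop_outcome {
	STOP_COMPLETE,  /* complete_all(&pending->done) is reached */
	STOP_REQUEUE,   /* stopper re-queued with stop_one_cpu_nowait() */
	STOP_NO_WAKEUP, /* returns with complete == false: waiter stays blocked */
};

enum stop_outcome patched_stop_outcome(bool on_this_rq,
				       bool migration_disabled,
				       bool cpu_allowed)
{
	if (on_this_rq && !migration_disabled)
		return STOP_COMPLETE;   /* pending: complete = true */
	/*
	 * else if (pending): reached when the task moved away and, with
	 * the patch, also when migration is disabled on this rq.
	 */
	if (cpu_allowed)
		return STOP_COMPLETE;   /* p->migration_pending = NULL; complete = true */
	return STOP_REQUEUE;            /* stop_one_cpu_nowait(task_cpu(p), ...) */
}

/* Usage: no input combination returns without a wakeup or a re-queue. */
void patched_stop_selfcheck(void)
{
	for (int i = 0; i < 8; i++) {
		enum stop_outcome o = patched_stop_outcome(i & 1, i & 2, i & 4);
		assert(o != STOP_NO_WAKEUP);
	}
}
```

The model only shows that every branch now either completes or re-queues; it says nothing about whether repeated re-queuing terminates while migration stays disabled, which is what the tests need to establish.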