From: Hillf Danton
To: Jing-Ting Wu
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Waiman Long, Vincent Guittot, Mel Gorman, wsd_upstream@mediatek.com
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Date: Thu, 22 Sep 2022 20:02:27 +0800
Message-Id: <20220922120227.1311-1-hdanton@sina.com>
In-Reply-To: <93f4ce9486ec4b856ba0f3bfe956fc9b2d3cb4cf.camel@mediatek.com>

On 22 Sep 2022 13:40:47 +0800 Jing-Ting Wu wrote
>
> Because Peter has some concerns about the patch by Waiman,
> we added Hillf's patch to our stability test.
> But there are side effects with the patch applied.
> The warning appears once in less than two weeks.

Thanks for your test. Any other effects observed?

>
> Backtrace as follows:
> [name:panic&]WARNING: CPU: 6 PID: 32583 at affine_move_task
> pc : affine_move_task
> lr : __set_cpus_allowed_ptr_locked
> Call trace:
>  affine_move_task
>  __set_cpus_allowed_ptr_locked
>  migrate_enable
>  __cgroup_bpf_run_filter_skb
>  ip_finish_output
>  ip_output
>
>
> The root cause is that when is_migration_disabled(p) is true, the patched
> version sets p->migration_pending to NULL in migration_cpu_stop().
> affine_move_task() then hits WARN_ON_ONCE(!pending).
>
> Kernel-5.15/kernel/sched/core.c:
> static int affine_move_task(struct rq *rq, struct task_struct *p,
> 		struct rq_flags *rf, int dest_cpu, unsigned int flags) {
> 	...
> 	if (WARN_ON_ONCE(!pending)) {
> 		task_rq_unlock(rq, p, rf);
> 		return -EINVAL;
> 	}
> 	...
> }
>
> But the tasks have not been migrated to the new affinity CPU, so there
> should still be pending work to process, and p->migration_pending should
> not be NULL.
>
>
>
> Without patch:
> When is_migration_disabled() is true, the code does goto out and does not
> set p->migration_pending to NULL.
>
> static int migration_cpu_stop(void *data) {
> 	...
> 	if (task_rq(p) == rq) {
> 		if (is_migration_disabled(p))
> 			goto out;
> 	}
> 	...
> }
>
>
> With patch:
> When is_migration_disabled() is true and pending is set, the code takes the
> else-if branch. Because p->cpus_ptr is not updated on migrate_disable, this
> condition is always true and p->migration_pending is set to NULL.
>
> static int migration_cpu_stop(void *data) {
> 	...
> 	if (task_rq(p) == rq && !is_migration_disabled(p)) {
> 		...
> 	} else if (pending) {
> 		...
> 		/*
> 		 * The task moved before the stopper got to run. We're holding
> 		 * ->pi_lock, so the allowed mask is stable - if it got
> 		 * somewhere allowed, we're done.
> 		 */
> 		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
> 			p->migration_pending = NULL;
> 			complete = true;
> 			goto out;
> 		}
> 		...
> 	}
> }

Given that p->migration_pending is reset in the case that the job is done,
the warning you saw is benign if no other negative effects are observed.
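
To make the two flows quoted above concrete, here is a minimal user-space
sketch. It is not kernel code: struct task, struct pending_req,
cpu_is_allowed() and warns_after_stopper() are made-up names, the CPU numbers
are arbitrary, and the "allowed mask contains only the current CPU" setup is
an assumption standing in for the report's statement that the p->cpus_ptr
test always passes while migration is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel's real types. */
struct pending_req {
	int dest_cpu;
};

struct task {
	int cpu;                     /* stand-in for task_cpu(p) */
	unsigned long cpus_allowed;  /* stand-in for *p->cpus_ptr, one bit per CPU */
	bool migration_disabled;     /* stand-in for is_migration_disabled(p) */
	struct pending_req *migration_pending;
};

static bool cpu_is_allowed(const struct task *p)
{
	return (p->cpus_allowed >> p->cpu) & 1UL;
}

/* Unpatched flow as quoted: a migration-disabled task bails out early. */
static void stopper_unpatched(struct task *p, bool on_this_rq)
{
	if (on_this_rq) {
		if (p->migration_disabled)
			return;	/* "goto out": pending is left untouched */
		/* ... rest elided, as in the quoted code ... */
	}
}

/* Patched flow as quoted: the same task now reaches the else-if branch. */
static void stopper_patched(struct task *p, bool on_this_rq)
{
	if (on_this_rq && !p->migration_disabled) {
		/* ... elided ... */
	} else if (p->migration_pending) {
		if (cpu_is_allowed(p))
			p->migration_pending = NULL;	/* treated as "job done" */
	}
}

/*
 * Returns true if the WARN_ON_ONCE(!pending) condition in
 * affine_move_task() would hold after the modelled stopper has run.
 */
bool warns_after_stopper(bool patched)
{
	struct pending_req req = { .dest_cpu = 2 };
	struct task p = {
		.cpu = 6,
		/* Assumption: while migration is disabled the mask test
		 * passes, modelled here as "only the current CPU allowed". */
		.cpus_allowed = 1UL << 6,
		.migration_disabled = true,
		.migration_pending = &req,
	};

	assert(p.migration_pending != NULL);
	if (patched)
		stopper_patched(&p, true);
	else
		stopper_unpatched(&p, true);

	return p.migration_pending == NULL;
}
```

In this model warns_after_stopper(false) returns false and
warns_after_stopper(true) returns true, which matches the report: only the
patched flow clears the pending pointer for a migration-disabled task.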
It should be fixed (for example, by simply cutting the warning) if you have a
reproducer on top of the mainline tree.

Hillf