From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mukesh Ojha
Date: Fri, 23 Sep 2022 19:50:04 +0530
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
To: Peter Zijlstra, Waiman Long
CC: Tejun Heo, Jing-Ting Wu, Valentin Schneider, "chris.redpath@arm.com",
 Dietmar Eggemann, Vincent Donnefort, Ingo Molnar, Juri Lelli,
 Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman, Christian Brauner
References: <88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com>
 <203d4614c1b2a498a240ace287156e9f401d5395.camel@mediatek.com>
 <02b8e7b3-941d-8bb9-cd0e-992738893ba3@redhat.com>
 <36a73401-7011-834a-7949-c65a2f66246c@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..

We are able to reproduce the issue described at this link:

https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/

>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>                 if (cpumask_equal(&p->cpus_mask, new_mask))
>>                         goto out;
>>
>> -               if (WARN_ON_ONCE(p == current &&
>> -                                is_migration_disabled(p) &&
>> -                                !cpumask_test_cpu(task_cpu(p), new_mask))) {
>> +               if (is_migration_disabled(p) &&
>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +                       WARN_ON_ONCE(p == current);
>>                         ret = -EBUSY;
>>                         goto out;
>>                 }
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>         if (flags & SCA_USER)
>>                 user_mask = clear_user_cpus_ptr(p);
>>
>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       } else {
>> +               task_rq_unlock(rq, p, rf);
>> +       }
>
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.
>

I was wondering whether just skipping, as below, would help here, though
I am not sure.

The idea is to leave the task as it is on its current CPU and wait until
migration is re-enabled for the task, which would take care of the move
later.

------------------->O------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d90d37c..7717733 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
          * we're holding p->pi_lock.
          */
         if (task_rq(p) == rq) {
-               if (is_migration_disabled(p))
+               if (is_migration_disabled(p)) {
+                       complete = true;
                        goto out;
+               }

                 if (pending) {


-Mukesh
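
For discussion, the effect of the migration_cpu_stop() change above can be
sketched as a small userspace model. This is not kernel code: struct
model_task, struct model_pending and model_stop() are made-up names that
only mimic the early-exit branch, and the NULL check before signalling the
waiter is an assumption of the model, not something taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocked set_cpus_allowed_ptr() caller. */
struct model_pending {
	bool done;         /* models the completion the waiter sleeps on */
	bool stop_pending; /* models pending->stop_pending */
};

/* Hypothetical stand-in for the few task fields the branch looks at. */
struct model_task {
	int cpu;
	bool migration_disabled;
	struct model_pending *pending; /* models p->migration_pending */
};

/*
 * Models the branch touched by the diff.
 * complete_when_disabled == false: behaviour before the change.
 * complete_when_disabled == true:  behaviour with the change.
 * Returns true if a waiter was signalled.
 */
bool model_stop(struct model_task *p, int dest_cpu, bool complete_when_disabled)
{
	struct model_pending *pending = p->pending;
	bool complete = false;

	if (p->migration_disabled) {
		/* The task stays on its current CPU in both variants. */
		complete = complete_when_disabled;
		goto out;
	}

	/* Normal path: move the task and retire the pending request. */
	p->cpu = dest_cpu;
	if (pending) {
		p->pending = NULL;
		complete = true;
	}
out:
	if (pending)
		pending->stop_pending = false;

	/* Assumption of this model: never signal when nobody is waiting. */
	if (complete && pending)
		pending->done = true;

	return complete && pending != NULL;
}
```

In this model, with complete_when_disabled set, a waiter attached to a
migration-disabled task is released while the task is still on its old CPU
and the task's pending pointer is left untouched, which is the part that
would need review.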