From mboxrd@z Thu Jan  1 00:00:00 1970
From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Amber Lin, Felix Kuehling, Alex Deucher, Sasha Levin,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.12 096/160] drm/amdkfd: Fix circular lock in nocpsch path
Date: Tue, 6 Jul 2021 07:17:22 -0400
Message-Id: <20210706111827.2060499-96-sashal@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210706111827.2060499-1-sashal@kernel.org>
References: <20210706111827.2060499-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit

From: Amber Lin

[ Upstream commit a7b2451d31cfa2e8aeccf3b35612ce33f02371fc ]

Calling free_mqd inside of destroy_queue_nocpsch_locked can cause a
circular lock. destroy_queue_nocpsch_locked is called under a DQM lock,
which is taken in MMU notifiers, potentially in FS reclaim context.
Taking another lock, which is BO reservation lock from free_mqd, while
causing an FS reclaim inside the DQM lock creates a problematic circular
lock dependency. Therefore move free_mqd out of
destroy_queue_nocpsch_locked and call it after unlocking DQM.

Signed-off-by: Amber Lin
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index df05eca73275..3d66565a618f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -486,9 +486,6 @@ static int destroy_queue_nocpsch_locked(struct device_queue_manager *dqm,
 	if (retval == -ETIME)
 		qpd->reset_wavefronts = true;
 
-
-	mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
-
 	list_del(&q->list);
 	if (list_empty(&qpd->queues_list)) {
 		if (qpd->reset_wavefronts) {
@@ -523,6 +520,8 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm,
 	int retval;
 	uint64_t sdma_val = 0;
 	struct kfd_process_device *pdd = qpd_to_pdd(qpd);
+	struct mqd_manager *mqd_mgr =
+		dqm->mqd_mgrs[get_mqd_type_from_queue_type(q->properties.type)];
 
 	/* Get the SDMA queue stats */
 	if ((q->properties.type == KFD_QUEUE_TYPE_SDMA) ||
@@ -540,6 +539,8 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm,
 		pdd->sdma_past_activity_counter += sdma_val;
 	dqm_unlock(dqm);
 
+	mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
+
 	return retval;
 }
 
@@ -1632,7 +1633,7 @@ static int set_trap_handler(struct device_queue_manager *dqm,
 static int process_termination_nocpsch(struct device_queue_manager *dqm,
 		struct qcm_process_device *qpd)
 {
-	struct queue *q, *next;
+	struct queue *q;
 	struct device_process_node *cur, *next_dpn;
 	int retval = 0;
 	bool found = false;
@@ -1640,12 +1641,19 @@ static int process_termination_nocpsch(struct device_queue_manager *dqm,
 	dqm_lock(dqm);
 
 	/* Clear all user mode queues */
-	list_for_each_entry_safe(q, next, &qpd->queues_list, list) {
+	while (!list_empty(&qpd->queues_list)) {
+		struct mqd_manager *mqd_mgr;
 		int ret;
 
+		q = list_first_entry(&qpd->queues_list, struct queue, list);
+		mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
+				q->properties.type)];
 		ret = destroy_queue_nocpsch_locked(dqm, qpd, q);
 		if (ret)
 			retval = ret;
+		dqm_unlock(dqm);
+		mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
+		dqm_lock(dqm);
 	}
 
 	/* Unregister process */
-- 
2.30.2
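
For readers outside amdkfd, the shape of the process_termination_nocpsch
change (unlink the entry while holding the outer lock, drop that lock, do
the free that needs a second lock, retake the outer lock, then re-read the
list head instead of trusting a saved "next" pointer) can be sketched in
userspace C. This is only an illustration under stated assumptions, not
the driver code: struct pool, pool_lock, res_lock, pool_add, pool_drain
and release_item are hypothetical names, with pool_lock standing in for
the DQM lock and res_lock for the BO reservation lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: pool_lock ~ DQM lock, res_lock ~ BO reservation. */
struct item {
	struct item *next;
	void *payload;
};

struct pool {
	pthread_mutex_t pool_lock;	/* protects head */
	pthread_mutex_t res_lock;	/* taken only while freeing a payload */
	struct item *head;
	int freed;			/* protected by res_lock */
};

static void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->pool_lock, NULL);
	pthread_mutex_init(&p->res_lock, NULL);
	p->head = NULL;
	p->freed = 0;
}

/* Push a new item with a payload of 'size' bytes; returns 0 or -1. */
static int pool_add(struct pool *p, size_t size)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	it->payload = malloc(size);
	if (!it->payload) {
		free(it);
		return -1;
	}
	pthread_mutex_lock(&p->pool_lock);
	it->next = p->head;
	p->head = it;
	pthread_mutex_unlock(&p->pool_lock);
	return 0;
}

/* Must be called WITHOUT pool_lock held: it takes res_lock. */
static void release_item(struct pool *p, struct item *it)
{
	assert(it != NULL);
	pthread_mutex_lock(&p->res_lock);
	free(it->payload);
	p->freed++;
	pthread_mutex_unlock(&p->res_lock);
	free(it);
}

/* Drain the list; returns the number of items released. */
static int pool_drain(struct pool *p)
{
	int n = 0;

	pthread_mutex_lock(&p->pool_lock);
	while (p->head) {
		struct item *it = p->head;

		p->head = it->next;	/* unlink under pool_lock */
		pthread_mutex_unlock(&p->pool_lock);
		release_item(p, it);	/* res_lock never nests in pool_lock */
		n++;
		pthread_mutex_lock(&p->pool_lock);
		/* The list may have changed while unlocked: re-read head. */
	}
	pthread_mutex_unlock(&p->pool_lock);
	return n;
}
```

Because the outer lock is dropped inside the loop, a cached "next" pointer
could go stale; restarting from the head on every iteration is why the
sketch, like the patch, uses a "while list not empty" loop rather than a
safe-iteration macro.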