All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	airlied@linux.ie, Felix Kuehling <Felix.Kuehling@amd.com>,
	Xinhui.Pan@amd.com, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org,
	Alex Deucher <alexander.deucher@amd.com>,
	christian.koenig@amd.com, shaoyunl <shaoyun.liu@amd.com>
Subject: [PATCH AUTOSEL 5.15 25/39] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Date: Thu, 25 Nov 2021 21:31:42 -0500	[thread overview]
Message-ID: <20211126023156.441292-25-sashal@kernel.org> (raw)
In-Reply-To: <20211126023156.441292-1-sashal@kernel.org>

From: shaoyunl <shaoyun.liu@amd.com>

[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f8fce9d05f50c..4f2e0cc8a51a8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
 	bool hanging;
 
 	dqm_lock(dqm);
+	if (!dqm->sched_running) {
+		dqm_unlock(dqm);
+		return 0;
+	}
+
 	if (!dqm->is_hws_hang)
 		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
 	hanging = dqm->is_hws_hang || dqm->is_resetting;
-- 
2.33.0


WARNING: multiple messages have this Message-ID (diff)
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sasha Levin <sashal@kernel.org>,
	airlied@linux.ie, Felix Kuehling <Felix.Kuehling@amd.com>,
	Xinhui.Pan@amd.com, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, daniel@ffwll.ch,
	Alex Deucher <alexander.deucher@amd.com>,
	christian.koenig@amd.com, shaoyunl <shaoyun.liu@amd.com>
Subject: [PATCH AUTOSEL 5.15 25/39] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Date: Thu, 25 Nov 2021 21:31:42 -0500	[thread overview]
Message-ID: <20211126023156.441292-25-sashal@kernel.org> (raw)
In-Reply-To: <20211126023156.441292-1-sashal@kernel.org>

From: shaoyunl <shaoyun.liu@amd.com>

[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f8fce9d05f50c..4f2e0cc8a51a8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
 	bool hanging;
 
 	dqm_lock(dqm);
+	if (!dqm->sched_running) {
+		dqm_unlock(dqm);
+		return 0;
+	}
+
 	if (!dqm->is_hws_hang)
 		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
 	hanging = dqm->is_hws_hang || dqm->is_resetting;
-- 
2.33.0


WARNING: multiple messages have this Message-ID (diff)
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: shaoyunl <shaoyun.liu@amd.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	christian.koenig@amd.com, Xinhui.Pan@amd.com, airlied@linux.ie,
	daniel@ffwll.ch, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.15 25/39] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
Date: Thu, 25 Nov 2021 21:31:42 -0500	[thread overview]
Message-ID: <20211126023156.441292-25-sashal@kernel.org> (raw)
In-Reply-To: <20211126023156.441292-1-sashal@kernel.org>

From: shaoyunl <shaoyun.liu@amd.com>

[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f8fce9d05f50c..4f2e0cc8a51a8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
 	bool hanging;
 
 	dqm_lock(dqm);
+	if (!dqm->sched_running) {
+		dqm_unlock(dqm);
+		return 0;
+	}
+
 	if (!dqm->is_hws_hang)
 		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
 	hanging = dqm->is_hws_hang || dqm->is_resetting;
-- 
2.33.0


  parent reply	other threads:[~2021-11-26  2:32 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-26  2:31 [PATCH AUTOSEL 5.15 01/39] gfs2: release iopen glock early in evict Sasha Levin
2021-11-26  2:31 ` [Cluster-devel] " Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 02/39] gfs2: Fix length of holes reported at end-of-file Sasha Levin
2021-11-26  2:31   ` [Cluster-devel] " Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 03/39] powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory" Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 04/39] powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 05/39] drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 06/39] mac80211: do not access the IV when it was stripped Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 07/39] mac80211: fix throughput LED trigger Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 08/39] x86/hyperv: Move required MSRs check to initial platform probing Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 09/39] tun: fix bonding active backup with arp monitoring Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 10/39] net/smc: Transfer remaining wait queue entries during fallback Sasha Levin
2021-11-26  2:51   ` Jakub Kicinski
2021-12-03 18:20     ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 11/39] atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 12/39] net: return correct error code Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 13/39] pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 14/39] blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 15/39] platform/x86: dell-wmi-descriptor: disable by default Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 16/39] platform/x86: thinkpad_acpi: Add support for dual fan control Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 17/39] platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 18/39] s390/setup: avoid using memblock_enforce_memory_limit Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 19/39] btrfs: silence lockdep when reading chunk tree during mount Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 20/39] btrfs: check-integrity: fix a warning on write caching disabled disk Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 21/39] thermal: core: Reset previous low and high trip during thermal zone init Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 22/39] scsi: iscsi: Unblock session then wake up error handler Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 23/39] net: usb: r8152: Add MAC passthrough support for more Lenovo Docks Sasha Levin
2021-11-28  9:49   ` Sergey Shtylyov
2021-11-28  9:50     ` Sergey Shtylyov
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 24/39] drm/amd/pm: Remove artificial freq level on Navi1x Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` Sasha Levin [this message]
2021-11-26  2:31   ` [PATCH AUTOSEL 5.15 25/39] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 26/39] drm/amd/amdgpu: fix potential memleak Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 27/39] ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile Sasha Levin
2021-11-29 14:46   ` Limonciello, Mario
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 28/39] ata: libahci: Adjust behavior when StorageD3Enable _DSD is set Sasha Levin
2021-11-29 14:46   ` Limonciello, Mario
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 29/39] ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 30/39] ipv6: check return value of ipv6_skip_exthdr Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 31/39] net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 32/39] net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 33/39] perf sort: Fix the 'weight' sort key behavior Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 34/39] perf sort: Fix the 'ins_lat' " Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 35/39] perf sort: Fix the 'p_stage_cyc' " Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 36/39] perf inject: Fix ARM SPE handling Sasha Levin
2021-11-26  2:31   ` Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 37/39] perf hist: Fix memory leak of a perf_hpp_fmt Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 38/39] perf report: Fix memory leaks around perf_tip() Sasha Levin
2021-11-26  2:31 ` [PATCH AUTOSEL 5.15 39/39] tracing: Don't use out-of-sync va_list in event printing Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211126023156.441292-25-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Felix.Kuehling@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@linux.ie \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shaoyun.liu@amd.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.