From: Oded Gabbay <ogabbay@kernel.org>
To: linux-kernel@vger.kernel.org
Cc: Yuri Nudelman <ynudelman@habana.ai>
Subject: [PATCH 3/3] debugfs: add skip_reset_on_timeout option
Date: Thu, 17 Jun 2021 17:40:48 +0300 [thread overview]
Message-ID: <20210617144048.4655-3-ogabbay@kernel.org> (raw)
In-Reply-To: <20210617144048.4655-1-ogabbay@kernel.org>
From: Yuri Nudelman <ynudelman@habana.ai>
To be able to debug long-running CS better, without changing the
userspace code, we are adding a new option through debugfs interface
to skip the reset of the device in case of CS timeout.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
Documentation/ABI/testing/debugfs-driver-habanalabs | 8 ++++++++
drivers/misc/habanalabs/common/command_submission.c | 1 +
drivers/misc/habanalabs/common/debugfs.c | 5 +++++
drivers/misc/habanalabs/common/habanalabs.h | 3 +++
4 files changed, 17 insertions(+)
diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs
index c78fc9282876..e78ceb1f70b3 100644
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -207,6 +207,14 @@ Contact: ogabbay@kernel.org
Description: Sets the PCI power state. Valid values are "1" for D0 and "2"
for D3Hot
+What: /sys/kernel/debug/habanalabs/hl<n>/skip_reset_on_timeout
+Date: Jun 2021
+KernelVersion: 5.13
+Contact: ynudelman@habana.ai
+Description: Sets the skip reset on timeout option for the device. Value of
+ "0" means device will be reset in case some CS has timed out,
+ otherwise it will not be reset.
+
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 6d51f54030c1..adedb288d452 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -663,6 +663,7 @@ static int allocate_cs(struct hl_device *hdev, struct hl_ctx *ctx,
cs->timestamp = !!(flags & HL_CS_FLAGS_TIMESTAMP);
cs->timeout_jiffies = timeout;
cs->skip_reset_on_timeout =
+ hdev->skip_reset_on_timeout ||
!!(flags & HL_CS_FLAGS_SKIP_RESET_ON_TIMEOUT);
cs->submission_time_jiffies = jiffies;
INIT_LIST_HEAD(&cs->job_list);
diff --git a/drivers/misc/habanalabs/common/debugfs.c b/drivers/misc/habanalabs/common/debugfs.c
index 8381155578a0..703d79fb6f3f 100644
--- a/drivers/misc/habanalabs/common/debugfs.c
+++ b/drivers/misc/habanalabs/common/debugfs.c
@@ -1278,6 +1278,11 @@ void hl_debugfs_add_device(struct hl_device *hdev)
dev_entry->root,
&dev_entry->blob_desc);
+ debugfs_create_x8("skip_reset_on_timeout",
+ 0644,
+ dev_entry->root,
+ &hdev->skip_reset_on_timeout);
+
for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
debugfs_create_file(hl_debugfs_list[i].name,
0444,
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index f8b7080e0570..12d9dc42e05e 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -2222,6 +2222,8 @@ struct hl_mmu_funcs {
* @supports_staged_submission: true if staged submissions are supported
* @curr_reset_cause: saves an enumerated reset cause when a hard reset is
* triggered, and cleared after it is shared with preboot.
+ * @skip_reset_on_timeout: Skip device reset if CS has timed out, wait for it to
+ * complete instead.
*/
struct hl_device {
struct pci_dev *pdev;
@@ -2337,6 +2339,7 @@ struct hl_device {
u8 device_fini_pending;
u8 supports_staged_submission;
u8 curr_reset_cause;
+ u8 skip_reset_on_timeout;
/* Parameters for bring-up */
u64 nic_ports_mask;
--
2.25.1
prev parent reply other threads:[~2021-06-17 14:40 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-17 14:40 [PATCH 1/3] habanalabs/gaudi: correct driver events numbering Oded Gabbay
2021-06-17 14:40 ` [PATCH 2/3] habanalabs: fix typo Oded Gabbay
2021-06-17 14:40 ` Oded Gabbay [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210617144048.4655-3-ogabbay@kernel.org \
--to=ogabbay@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ynudelman@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).