* [PATCH] habanalabs: rate limit error msg on waiting for CS
@ 2019-12-03 20:27 Oded Gabbay
  2019-12-04  8:01 ` Tomer Tayar
  0 siblings, 1 reply; 2+ messages in thread
From: Oded Gabbay @ 2019-12-03 20:27 UTC
  To: linux-kernel, oshpigelman, ttayar; +Cc: gregkh

If a user submits a CS, the submission fails, and the user doesn't check
the return value but instead uses the error value as a valid CS sequence
number and asks to wait on it, the driver prints an error and returns an
error code for that wait.

The real problem arises if the user also ignores the error from the wait
and tries to wait again and again. This can lead to a flood of error
messages from the driver and even to a soft lockup.

Rate limit both messages to prevent this.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
---
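For illustration only, and not part of the patch: a minimal user-space
sketch of the interval/burst limiting that the *_ratelimited() helpers
provide. It assumes the kernel defaults of 10 messages per 5-second window
(DEFAULT_RATELIMIT_BURST and DEFAULT_RATELIMIT_INTERVAL). The struct, the
function names and the millisecond clock are hypothetical simplifications;
the real kernel code keeps its state per call site and counts in jiffies.

```c
#include <stdbool.h>

/* Hypothetical user-space model of an interval/burst rate limiter. */
struct ratelimit {
	long interval_ms;   /* length of one window */
	int burst;          /* messages allowed per window */
	long window_start;  /* start of the current window, in ms */
	int printed;        /* messages emitted in the current window */
};

/* Return true if a message may be printed at time now_ms. */
static bool ratelimit_ok(struct ratelimit *rl, long now_ms)
{
	if (now_ms - rl->window_start >= rl->interval_ms) {
		/* Window expired: open a new one and reset the count. */
		rl->window_start = now_ms;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false; /* suppressed */
}

/*
 * Simulate `calls` failed waits spaced step_ms apart, each of which
 * would print one message, and return how many actually get through.
 */
static int simulate_wait_flood(int calls, long step_ms)
{
	/* Assumed defaults: 10 messages per 5000 ms window. */
	struct ratelimit rl = { .interval_ms = 5000, .burst = 10 };
	int printed = 0;

	for (int i = 0; i < calls; i++)
		if (ratelimit_ok(&rl, (long)i * step_ms))
			printed++;
	return printed;
}
```

Under these assumptions, simulate_wait_flood(10000, 1) models a user
retrying the wait every millisecond for ten seconds and returns 20: ten
messages in each of the two 5-second windows, instead of 10000.
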
 drivers/misc/habanalabs/command_submission.c | 5 +++--
 drivers/misc/habanalabs/context.c            | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/command_submission.c b/drivers/misc/habanalabs/command_submission.c
index 8850f475a413..0bf08678431b 100644
--- a/drivers/misc/habanalabs/command_submission.c
+++ b/drivers/misc/habanalabs/command_submission.c
@@ -824,8 +824,9 @@ int hl_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 	memset(args, 0, sizeof(*args));
 
 	if (rc < 0) {
-		dev_err(hdev->dev, "Error %ld on waiting for CS handle %llu\n",
-			rc, seq);
+		dev_err_ratelimited(hdev->dev,
+				"Error %ld on waiting for CS handle %llu\n",
+				rc, seq);
 		if (rc == -ERESTARTSYS) {
 			args->out.status = HL_WAIT_CS_STATUS_INTERRUPTED;
 			rc = -EINTR;
diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
index 17db7b3dfb4c..2df6fb87e7ff 100644
--- a/drivers/misc/habanalabs/context.c
+++ b/drivers/misc/habanalabs/context.c
@@ -176,7 +176,7 @@ struct dma_fence *hl_ctx_get_fence(struct hl_ctx *ctx, u64 seq)
 	spin_lock(&ctx->cs_lock);
 
 	if (seq >= ctx->cs_sequence) {
-		dev_notice(hdev->dev,
+		dev_notice_ratelimited(hdev->dev,
 			"Can't wait on seq %llu because current CS is at seq %llu\n",
 			seq, ctx->cs_sequence);
 		spin_unlock(&ctx->cs_lock);
-- 
2.17.1



* RE: [PATCH] habanalabs: rate limit error msg on waiting for CS
  2019-12-03 20:27 [PATCH] habanalabs: rate limit error msg on waiting for CS Oded Gabbay
@ 2019-12-04  8:01 ` Tomer Tayar
  0 siblings, 0 replies; 2+ messages in thread
From: Tomer Tayar @ 2019-12-04  8:01 UTC
  To: Oded Gabbay, linux-kernel, Omer Shpigelman; +Cc: gregkh

On Tue, Dec 3, 2019 at 22:28, Oded Gabbay <oded.gabbay@gmail.com> wrote:
> If a user submits a CS, the submission fails, and the user doesn't check
> the return value but instead uses the error value as a valid CS sequence
> number and asks to wait on it, the driver prints an error and returns an
> error code for that wait.
> 
> The real problem arises if the user also ignores the error from the wait
> and tries to wait again and again. This can lead to a flood of error
> messages from the driver and even to a soft lockup.
> 
> Rate limit both messages to prevent this.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
