From: Oded Gabbay <oded.gabbay@gmail.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
Omer Shpigelman <oshpigelman@habana.ai>,
Tomer Tayar <ttayar@habana.ai>
Subject: Re: [PATCH 9/9] habanalabs: allow multiple processes to open FD
Date: Sun, 28 Jul 2019 14:56:40 +0300 [thread overview]
Message-ID: <CAFCwf11bL_2S__bc+wYpK=1_Ug2sJ6ZWnFiYA1dMDJxpsRZ5dA@mail.gmail.com> (raw)
In-Reply-To: <20190728114435.GB12033@kroah.com>
On Sun, Jul 28, 2019 at 2:44 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sun, Jul 28, 2019 at 02:28:18PM +0300, Oded Gabbay wrote:
> > This patch removes the limitation of a single process that can open the
> > device.
> >
> > Now, there is no limitation on the number of processes that can open the
> > device and have a valid FD.
> >
> > However, only a single process can perform compute operations. This is
> > enforced by allowing only a single process to have a compute context.
> >
> > Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
> > ---
> > drivers/misc/habanalabs/context.c | 100 +++++++++++++++------
> > drivers/misc/habanalabs/device.c | 18 ++--
> > drivers/misc/habanalabs/habanalabs.h | 1 -
> > drivers/misc/habanalabs/habanalabs_drv.c | 8 --
> > drivers/misc/habanalabs/habanalabs_ioctl.c | 7 +-
> > 5 files changed, 85 insertions(+), 49 deletions(-)
> >
> > diff --git a/drivers/misc/habanalabs/context.c b/drivers/misc/habanalabs/context.c
> > index 57bbe59da9b6..f64220fc3a55 100644
> > --- a/drivers/misc/habanalabs/context.c
> > +++ b/drivers/misc/habanalabs/context.c
> > @@ -56,7 +56,7 @@ void hl_ctx_do_release(struct kref *ref)
> > kfree(ctx);
> > }
> >
> > -int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
> > +static int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
> > {
> > struct hl_ctx_mgr *mgr = &hpriv->ctx_mgr;
> > struct hl_ctx *ctx;
> > @@ -89,9 +89,6 @@ int hl_ctx_create(struct hl_device *hdev, struct hl_fpriv *hpriv)
> > /* TODO: remove for multiple contexts per process */
> > hpriv->ctx = ctx;
> >
> > - /* TODO: remove the following line for multiple process support */
> > - hdev->compute_ctx = ctx;
> > -
> > return 0;
> >
> > remove_from_idr:
> > @@ -206,13 +203,22 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
> > int rc;
> >
> > /* First thing, to minimize latency impact, check if context exists.
> > - * Also check if it matches the requirements. If so, exit immediately
> > + * This is relevant for the "steady state", where a process context
> > + * already exists, and we want to minimize the latency in command
> > + * submissions. In that case, we want to see if we can quickly exit
> > + * with a valid answer.
> > + *
> > + * If a context doesn't exists, we must grab the mutex. Otherwise,
> > + * there can be nasty races in case of multi-threaded application.
> > + *
> > + * So, if the context exists and we don't need a compute context,
> > + * that's fine. If it exists and the context we have is the compute
> > + * context, that's also fine. Other then that, we can't check anything
> > + * without the mutex.
> > */
> > - if (hpriv->ctx) {
> > - if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> > - return false;
> > + if ((hpriv->ctx) && ((!requires_compute_ctx) ||
> > + (hdev->compute_ctx == hpriv->ctx)))
> > return true;
> > - }
> >
> > mutex_lock(&hdev->lazy_ctx_creation_lock);
> >
> > @@ -222,35 +228,73 @@ bool hl_ctx_is_valid(struct hl_fpriv *hpriv, bool requires_compute_ctx)
> > * creation of a context
> > */
> > if (hpriv->ctx) {
> > - if ((requires_compute_ctx) && (hdev->compute_ctx != hpriv->ctx))
> > + if ((!requires_compute_ctx) ||
> > + (hdev->compute_ctx == hpriv->ctx))
> > + goto unlock_mutex;
> > +
> > + if (hdev->compute_ctx) {
> > valid = false;
> > - goto unlock_mutex;
> > - }
> > + goto unlock_mutex;
> > + }
> >
> > - /* If we already have a compute context, there is no point
> > - * of creating one in case we are called from ioctl that needs
> > - * a compute context
> > - */
> > - if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> > + /* If we reached here, it means we have a non-compute context,
> > + * but there is no compute context on the device. Therefore,
> > + * we can try to "upgrade" the existing context to a compute
> > + * context
> > + */
> > + dev_dbg_ratelimited(hdev->dev,
> > + "Non-compute context %d exists\n",
> > + hpriv->ctx->asid);
> > +
> > + } else if ((hdev->compute_ctx) && (requires_compute_ctx)) {
> > +
> > + /* If we already have a compute context in the device, there is
> > + * no point of creating one in case we are called from ioctl
> > + * that needs a compute context
> > + */
> > dev_err(hdev->dev,
> > "Can't create new compute context as one already exists\n");
> > valid = false;
> > goto unlock_mutex;
> > - }
> > + } else {
> > + /* If we reached here it is because there isn't a context for
> > + * the process AND there is no compute context or compute
> > + * context wasn't required. In any case, must create a context
> > + * for the process
> > + */
> >
> > - rc = hl_ctx_create(hdev, hpriv);
> > - if (rc) {
> > - dev_err(hdev->dev, "Failed to create context %d\n", rc);
> > - valid = false;
> > - goto unlock_mutex;
> > + rc = hl_ctx_create(hdev, hpriv);
> > + if (rc) {
> > + dev_err(hdev->dev, "Failed to create context %d\n", rc);
> > + valid = false;
> > + goto unlock_mutex;
> > + }
> > +
> > + dev_dbg_ratelimited(hdev->dev, "Created context %d\n",
> > + hpriv->ctx->asid);
> > }
> >
> > - /* Device is IDLE at this point so it is legal to change PLLs.
> > - * There is no need to check anything because if the PLL is
> > - * already HIGH, the set function will return without doing
> > - * anything
> > + /* If we reached here then either we have a new context, or we can
> > + * upgrade a non-compute context to a compute context. Do the upgrade
> > + * only if the caller required a compute context
> > */
> > - hl_device_set_frequency(hdev, PLL_HIGH);
> > + if (requires_compute_ctx) {
> > + WARN(hdev->compute_ctx,
> > + "Compute context exists but driver is setting a new one");
>
> This will trigger syzbot and will reboot machines that have
> 'panic-on-warn' set (i.e. all cloud systems). So be _VERY_ careful
> about this.
>
> If a user can trigger this, do not use WARN(), that's not what it is
> for.
>
> thanks,
>
> greg k-h
I see...
I'll replace it with dev_crit, but I wanted to ask if you recommend to
never use WARN in drivers ? Just use it in kernel core code ?
Thanks,
Oded
next prev parent reply other threads:[~2019-07-28 11:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-28 11:28 [PATCH 0/9] habanalabs: support open device by multiple processes Oded Gabbay
2019-07-28 11:28 ` [PATCH 1/9] habanalabs: add handle field to context structure Oded Gabbay
2019-07-28 11:28 ` [PATCH 2/9] habanalabs: verify context is valid in IOCTLs Oded Gabbay
2019-07-28 11:28 ` [PATCH 3/9] habanalabs: create context in lazy mode Oded Gabbay
2019-07-28 11:28 ` [PATCH 4/9] habanalabs: don't change frequency if user context is valid Oded Gabbay
2019-07-28 11:28 ` [PATCH 5/9] habanalabs: maintain a list of file private data objects Oded Gabbay
2019-07-28 11:28 ` [PATCH 6/9] habanalabs: define user context as compute context Oded Gabbay
2019-07-28 11:28 ` [PATCH 7/9] habanalabs: protect only pointer dereference in hard-reset Oded Gabbay
2019-07-28 11:28 ` [PATCH 8/9] habanalabs: kill user process after CS rollback Oded Gabbay
2019-07-28 11:28 ` [PATCH 9/9] habanalabs: allow multiple processes to open FD Oded Gabbay
2019-07-28 11:44 ` Greg KH
2019-07-28 11:56 ` Oded Gabbay [this message]
2019-07-28 12:04 ` Greg KH
2019-07-28 12:06 ` Oded Gabbay
2019-07-28 12:12 ` Greg KH
2019-07-28 12:18 ` Oded Gabbay
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFCwf11bL_2S__bc+wYpK=1_Ug2sJ6ZWnFiYA1dMDJxpsRZ5dA@mail.gmail.com' \
--to=oded.gabbay@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oshpigelman@habana.ai \
--cc=ttayar@habana.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).