From: Alex Deucher <alexdeucher@gmail.com>
To: "Liu, Shaoyun" <Shaoyun.Liu@amd.com>
Cc: "Koenig, Christian" <Christian.Koenig@amd.com>,
"Chen, Horace" <Horace.Chen@amd.com>,
"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>,
"Kuehling, Felix" <Felix.Kuehling@amd.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>,
"Xiao, Jack" <Jack.Xiao@amd.com>,
"Zhang, Hawking" <Hawking.Zhang@amd.com>,
"Liu, Monk" <Monk.Liu@amd.com>, "Xu, Feifei" <Feifei.Xu@amd.com>,
"Chang, HaiJun" <HaiJun.Chang@amd.com>,
Leo Liu <leo.liiu@amd.com>,
"Liu, Jenny (Jing)" <Jenny-Jing.Liu@amd.com>
Subject: Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous
Date: Wed, 17 Apr 2024 15:21:43 -0400 [thread overview]
Message-ID: <CADnq5_MJJk=wW4tDKtOCCVPk9HeK8as75jxgkaHU3LmXb8ezxg@mail.gmail.com>
In-Reply-To: <CH0PR12MB53727853A2173DA703452EF5F40F2@CH0PR12MB5372.namprd12.prod.outlook.com>
On Wed, Apr 17, 2024 at 3:17 PM Liu, Shaoyun <Shaoyun.Liu@amd.com> wrote:
>
>
> I discussed this with Christian before. The conclusion was that the driver should prevent multiple processes from using the MES ring at the same time. Also, with the current MES ring usage the driver has no logic to prevent the ring from overflowing; we don't hit that issue only because MES polls to completion for each MES submission. If we want to change MES to work asynchronously, we need a way to avoid the overflow (similar to the limit added in the fence handling we use for kiq and HMM paging).
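The kiq-style limit mentioned above could be sketched roughly like this. This is a hypothetical user-space sketch, not the actual amdgpu code: the struct, the `MES_MAX_INFLIGHT` cap, and the function names are all invented to illustrate the idea of capping in-flight fences so an async submitter cannot overflow the ring.

```c
#include <stdint.h>

/* Hypothetical sketch: throttle async MES submissions the way the kiq
 * fence handling limits in-flight work, by capping how many fences may
 * be outstanding so the ring cannot be overflowed. Names and the limit
 * value are made up for illustration. */

#define MES_MAX_INFLIGHT 4

struct mes_ring_state {
	uint64_t emitted_seq;    /* last seq written to the ring */
	uint64_t completed_seq;  /* last seq signalled back by MES */
};

/* Returns 0 once there is room for one more submission, -1 if we gave
 * up after max_polls reads of the completed seq. */
static int mes_throttle(struct mes_ring_state *r, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (r->emitted_seq - r->completed_seq < MES_MAX_INFLIGHT)
			return 0;
		/* real code would cpu_relax()/udelay() and re-read
		 * the fence location here */
	}
	return -1;
}
```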
>
I think we need a separate fence (different GPU address and seq
number) per request. Then each caller can wait independently.
Alex
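The per-request fence idea could look roughly like the following. This is a hypothetical user-space sketch, not the amdgpu API: the slot count, struct, and function names are invented. Each request gets its own fence address and sequence number, so each caller polls only its own slot instead of everyone waiting on the shared ring fence.

```c
#include <stdint.h>

/* Hypothetical sketch of per-request MES fences: each request gets a
 * distinct fence slot (its own address and seq number), so callers can
 * wait independently. Slot count and names are made up. */

#define MES_FENCE_SLOTS 8

struct mes_fence_slot {
	volatile uint64_t value;  /* written by firmware on completion */
	uint64_t seq;             /* seq number this caller waits for */
};

static struct mes_fence_slot slots[MES_FENCE_SLOTS];
static uint64_t next_seq = 1;

/* Allocate a slot and the seq the firmware must write back. */
static struct mes_fence_slot *mes_fence_alloc(void)
{
	struct mes_fence_slot *s = &slots[next_seq % MES_FENCE_SLOTS];

	s->seq = next_seq++;
	s->value = 0;
	return s;
}

/* Poll only this caller's slot; other requests complete independently.
 * Returns 0 on success, -1 on timeout. */
static int mes_fence_wait(struct mes_fence_slot *s, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (s->value >= s->seq)
			return 0;
	}
	return -1;
}
```

With this shape, one caller's completion (its slot reaching its seq) never depends on any other caller's request having completed.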
> Regards
> Shaoyun.liu
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
> Sent: Wednesday, April 17, 2024 8:49 AM
> To: Chen, Horace <Horace.Chen@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Xiao, Jack <Jack.Xiao@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Liu, Monk <Monk.Liu@amd.com>; Xu, Feifei <Feifei.Xu@amd.com>; Chang, HaiJun <HaiJun.Chang@amd.com>; Leo Liu <leo.liiu@amd.com>; Liu, Jenny (Jing) <Jenny-Jing.Liu@amd.com>
> Subject: Re: [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous
>
> Am 17.04.24 um 13:30 schrieb Horace Chen:
> > The MES firmware expects synchronous operation with the driver. For
> > this to work asynchronously, each caller would need to provide its own
> > fence location and sequence number.
>
> Well that's certainly not correct. The seqno ensures that we can wait asynchronously for the submission to complete.
>
> So clear NAK for that patch here.
>
> Regards,
> Christian.
>
> >
> > For now, add a mutex lock to serialize the MES submission.
> > For the SR-IOV long-wait case, break the long wait into separate
> > shorter waits so it cannot delay the reset sequence.
> >
> > Signed-off-by: Horace Chen <horace.chen@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 +++
> > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 +
> > drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++++++++++++++----
> > 3 files changed, 18 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > index 78e4f88f5134..8896be95b2c8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
> > @@ -137,6 +137,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
> > spin_lock_init(&adev->mes.queue_id_lock);
> > spin_lock_init(&adev->mes.ring_lock);
> > mutex_init(&adev->mes.mutex_hidden);
> > + mutex_init(&adev->mes.submission_lock);
> >
> > adev->mes.total_max_queue = AMDGPU_FENCE_MES_QUEUE_ID_MASK;
> > adev->mes.vmid_mask_mmhub = 0xffffff00;
> > @@ -221,6 +222,7 @@ int amdgpu_mes_init(struct amdgpu_device *adev)
> > idr_destroy(&adev->mes.queue_id_idr);
> > ida_destroy(&adev->mes.doorbell_ida);
> > mutex_destroy(&adev->mes.mutex_hidden);
> > + mutex_destroy(&adev->mes.submission_lock);
> > return r;
> > }
> >
> > @@ -240,6 +242,7 @@ void amdgpu_mes_fini(struct amdgpu_device *adev)
> > idr_destroy(&adev->mes.queue_id_idr);
> > ida_destroy(&adev->mes.doorbell_ida);
> > mutex_destroy(&adev->mes.mutex_hidden);
> > + mutex_destroy(&adev->mes.submission_lock);
> > }
> >
> > static void amdgpu_mes_queue_free_mqd(struct amdgpu_mes_queue *q)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > index 6b3e1844eac5..90af935cc889 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> > @@ -85,6 +85,7 @@ struct amdgpu_mes {
> >
> > struct amdgpu_ring ring;
> > spinlock_t ring_lock;
> > + struct mutex submission_lock;
> >
> > const struct firmware *fw[AMDGPU_MAX_MES_PIPES];
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > index e40d00afd4f5..0a609a5b8835 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > @@ -162,6 +162,7 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
> > struct amdgpu_ring *ring = &mes->ring;
> > unsigned long flags;
> > signed long timeout = adev->usec_timeout;
> > + signed long retry_count = 1;
> > const char *op_str, *misc_op_str;
> >
> > if (x_pkt->header.opcode >= MES_SCH_API_MAX)
> > @@ -169,15 +170,19 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
> >
> > if (amdgpu_emu_mode) {
> > timeout *= 100;
> > - } else if (amdgpu_sriov_vf(adev)) {
> > + }
> > +
> > + if (amdgpu_sriov_vf(adev) && timeout > 0) {
> > /* Worst case in sriov where all other 15 VF timeout, each VF needs about 600ms */
> > - timeout = 15 * 600 * 1000;
> > + retry_count = (15 * 600 * 1000) / timeout;
> > }
> > BUG_ON(size % 4 != 0);
> >
> > + mutex_lock(&mes->submission_lock);
> > spin_lock_irqsave(&mes->ring_lock, flags);
> > if (amdgpu_ring_alloc(ring, ndw)) {
> > spin_unlock_irqrestore(&mes->ring_lock, flags);
> > + mutex_unlock(&mes->submission_lock);
> > return -ENOMEM;
> > }
> >
> > @@ -199,8 +204,13 @@ static int mes_v11_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
> > else
> > dev_dbg(adev->dev, "MES msg=%d was emitted\n",
> > x_pkt->header.opcode);
> >
> > - r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq,
> > - timeout);
> > + do {
> > + r = amdgpu_fence_wait_polling(ring, ring->fence_drv.sync_seq,
> > + timeout);
> > + retry_count--;
> > + } while (retry_count > 0 && !amdgpu_in_reset(adev));
> > +
> > + mutex_unlock(&mes->submission_lock);
> > if (r < 1) {
> >
> > if (misc_op_str)
>
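For reference, the chunked-wait pattern in the quoted patch (one long SR-IOV timeout split into retry_count short polls, with the retry loop stopping once a reset is pending) can be sketched in isolation like this. Hypothetical names and simulated fence state, not the amdgpu API; and unlike the patch's loop, this sketch also stops retrying as soon as a poll succeeds.

```c
#include <stdint.h>

/* Simulated fence/reset state, standing in for the ring's fence memory
 * and amdgpu_in_reset() (invented names, not the amdgpu API). */
struct sim_state {
	uint64_t fence_value;   /* "GPU" writes this on completion */
	uint64_t wait_seq;      /* sequence number we are waiting on */
	int reset_pending;      /* set when a GPU reset starts */
};

/* One short poll chunk: 1 if the fence reached wait_seq, else 0. */
static int poll_chunk(const struct sim_state *s)
{
	return s->fence_value >= s->wait_seq ? 1 : 0;
}

/* Repeat short polls instead of one long wait, and stop retrying once
 * a reset is pending so the wait cannot stall the reset sequence. */
static int wait_in_chunks(struct sim_state *s, long retry_count)
{
	int r;

	do {
		r = poll_chunk(s);
		retry_count--;
	} while (r == 0 && retry_count > 0 && !s->reset_pending);

	return r; /* 1 = fence signalled, 0 = timed out / aborted */
}
```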
Thread overview: 10+ messages
2024-04-17 11:30 [PATCH 1/3] drm/amdgpu/mes11: print MES opcodes rather than numbers Horace Chen
2024-04-17 11:30 ` [PATCH 2/3] Revert "drm/amdgpu: increase mes submission timeout" Horace Chen
2024-04-17 11:30 ` [PATCH 3/3] drm/amdgpu/mes11: make fence waits synchronous Horace Chen
2024-04-17 12:49 ` Christian König
2024-04-17 18:52 ` Liu, Shaoyun
2024-04-17 19:21 ` Alex Deucher [this message]
2024-04-18 5:59 ` Christian König
2024-04-18 14:42 ` Liu, Shaoyun
2024-04-17 14:16 ` [PATCH 1/3] drm/amdgpu/mes11: print MES opcodes rather than numbers Kasiviswanathan, Harish
2024-04-17 18:53 ` Liu, Shaoyun