From: "Liu, Monk" <Monk.Liu@amd.com>
To: "Koenig, Christian" <Christian.Koenig@amd.com>,
	"Grodzovsky, Andrey" <Andrey.Grodzovsky@amd.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	"Chen, JingWen" <JingWen.Chen2@amd.com>
Cc: DRI Development <dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: RE: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread
Date: Wed, 1 Sep 2021 01:58:27 +0000
Message-ID: <BL1PR12MB5269EB4E07A80EEB48A391DF84CD9@BL1PR12MB5269.namprd12.prod.outlook.com>
In-Reply-To: <BL1PR12MB526942160701B46D4B28EEEC84CD9@BL1PR12MB5269.namprd12.prod.outlook.com>

[AMD Official Use Only]

In the previous discussion, you suggested that we drop the "kthread_should_park" check in cleanup_job:

@@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
        struct drm_sched_job *job, *next;

-       /*
-        * Don't destroy jobs while the timeout worker is running  OR thread
-        * is being parked and hence assumed to not touch pending_list
-        */
-       if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-           !cancel_delayed_work(&sched->work_tdr)) ||
-           kthread_should_park())
-               return NULL;

But I suddenly have a question here: if cleanup_job returns the timed-out job regardless of kthread_should_park(), then we are back to the original problem: the timed-out job suddenly signals, cleanup_job still returns it to sched_main, and the job is freed while it is still being handled by the vendor's timeout callback.

If we return NULL from cleanup_job when kthread_should_park() is true, we can prevent the above scenario: once a job is being processed by job_timedout we can stop its scheduler, and after that, even if the job suddenly signals, cleanup_job won't return it, so sched_main won't free it in parallel ...

What do you think?
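
To make the scenario above concrete, here is a minimal user-space sketch of the guard. It is not the kernel code: struct model_job, struct model_sched, and every model_* function are hypothetical stand-ins, the should_park flag only imitates what kthread_should_park() would report, and the real pending_list, locking, and work_tdr handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the DRM scheduler types. */
struct model_job {
	bool signaled;			/* the job's hardware fence has signaled */
};

struct model_sched {
	bool should_park;		/* imitates kthread_should_park() */
	struct model_job *pending;	/* single-entry "pending list" */
};

/*
 * Model of cleanup_job: hand a finished job back to the caller for freeing.
 * With guard == true it returns NULL while parking has been requested.
 */
struct model_job *model_get_cleanup_job(struct model_sched *s, bool guard)
{
	struct model_job *job;

	if (guard && s->should_park)
		return NULL;

	job = s->pending;
	if (job && job->signaled) {
		s->pending = NULL;	/* caller now owns the job and will free it */
		return job;
	}
	return NULL;
}

/*
 * The timeout handler requests parking, then the job signals late.
 * Returns true if the job is handed out for freeing while the (modelled)
 * timeout handler is still working on it.
 */
bool model_late_signal_race(bool guard)
{
	struct model_job job = { .signaled = false };
	struct model_sched s = { .should_park = false, .pending = &job };

	s.should_park = true;	/* timeout handler stops the scheduler */
	job.signaled = true;	/* the timed-out job suddenly signals */

	return model_get_cleanup_job(&s, guard) == &job;
}

/* Usage: without the guard the job escapes; with the guard it does not. */
void model_demo(void)
{
	assert(model_late_signal_race(false));
	assert(!model_late_signal_race(true));
}
```

The sketch is single-threaded, so it only shows the ordering argument, not the actual concurrency between sched_main and the timeout worker.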
Thanks

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

From: Liu, Monk
Sent: Wednesday, September 1, 2021 9:23 AM
To: Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>; Chen, JingWen <JingWen.Chen2@amd.com>
Cc: DRI Development <dri-devel@lists.freedesktop.org>; amd-gfx@lists.freedesktop.org
Subject: [diagnostic TDR mode patches] unify our solution opinions/suggestions in one thread

Hi Daniel/Christian/Andrey

It looks like the feedback from the three of you is spread across a flood of emails. The feature we are working on (the diagnostic TDR scheme) has been pending for more than 6 months (we started it in Feb 2021).

Honestly speaking, the way we are using email now is not friendly and is quite painful for me.
Can we try to put all our opinions, suggestions, or even objections together here and go through them one by one? It is too hard for us to reply to each email on different questions.

For [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)

This is a fix for the timeout timer in the scheduler. Can we complete this one first? It should already have resolved all the questions and suggestions.

For [PATCH 2/2] drm/sched: serialize job_timeout and scheduler

I think I already explained, in the other thread, the questions raised by Daniel regarding why I use __kthread_should_park().
For the other aspects, can we collect all our opinions here?

Thanks !

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

