* [PATCH 1/2] drm/scheduler: Set sched->thread to NULL on failure
@ 2018-11-02 10:31 Sharat Masetty
       [not found] ` <1541154693-29623-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sharat Masetty @ 2018-11-02 10:31 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: andrey.grodzovsky-5C7GfCeVMHo,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, christian.koenig-5C7GfCeVMHo

In cases where the scheduler instance is used as a base object of another
driver object, it may not be clear whether the driver can call scheduler
cleanup on the failure path. Set sched->thread to NULL so that the driver
can safely call drm_sched_fini() during cleanup.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 44fe587..c993d10 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -594,7 +594,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   long timeout,
 		   const char *name)
 {
-	int i;
+	int i, ret;
 	sched->ops = ops;
 	sched->hw_submission_limit = hw_submission;
 	sched->name = name;
@@ -615,8 +615,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	/* Each scheduler will run on a seperate kernel thread */
 	sched->thread = kthread_run(drm_sched_main, sched, sched->name);
 	if (IS_ERR(sched->thread)) {
+		ret = PTR_ERR(sched->thread);
+		sched->thread = NULL;
 		DRM_ERROR("Failed to create scheduler for %s.\n", name);
-		return PTR_ERR(sched->thread);
+		return ret;
 	}

 	return 0;
--
1.9.1
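[Editorial note: the effect of this change can be sketched with a self-contained userspace C mock. This is not the real kernel API; all mock_* names and the error-pointer idiom are illustrative stand-ins for kthread_run()/IS_ERR()/drm_sched_fini(). The point is that leaving sched->thread as a safe NULL on failure lets a later cleanup call become a harmless no-op.]

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the pattern; not the real kernel structures. */
struct mock_sched { void *thread; };

/* Simulates kthread_run() failing: returns an error-encoded pointer. */
static void *mock_kthread_run_fail(void) { return (void *)-12L; /* -ENOMEM */ }

static long mock_init(struct mock_sched *s)
{
	void *t = mock_kthread_run_fail();
	if ((unsigned long)t >= (unsigned long)-4095L) { /* IS_ERR() idiom */
		long ret = (long)t;
		s->thread = NULL;   /* the fix: leave a safe value behind */
		return ret;
	}
	s->thread = t;
	return 0;
}

/* Simulates drm_sched_fini(): only stop the thread if one exists. */
static int mock_fini(struct mock_sched *s)
{
	if (s->thread == NULL)
		return 0;   /* safe no-op after a failed init */
	return 1;           /* a real implementation would kthread_stop() here */
}
```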

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
       [not found] ` <1541154693-29623-1-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-11-02 10:31   ` Sharat Masetty
       [not found]     ` <1541154693-29623-2-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sharat Masetty @ 2018-11-02 10:31 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: andrey.grodzovsky-5C7GfCeVMHo,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Sharat Masetty,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ, christian.koenig-5C7GfCeVMHo

Add an optional backend callback which lets the scheduler clients know
when the timeout work gets scheduled on the scheduler instance. This will
help drivers with multiple schedulers (one per ring) measure the time
spent on the ring accurately, eventually enabling better timeout detection.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
Here is an example of how I plan to use this new function callback.

[1] https://patchwork.freedesktop.org/patch/254227/

 drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
 include/drm/gpu_scheduler.h            | 6 ++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c993d10..afd461e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -192,8 +192,13 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence,
 static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 {
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !list_empty(&sched->ring_mirror_list))
+	    !list_empty(&sched->ring_mirror_list)) {
+
 		schedule_delayed_work(&sched->work_tdr, sched->timeout);
+
+		if (sched->ops->start_timeout_notify)
+			sched->ops->start_timeout_notify(sched);
+	}
 }

 /* job_finish is called after hw fence signaled
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d87b268..faf28b4 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -239,6 +239,12 @@ struct drm_sched_backend_ops {
          * and it's time to clean it up.
 	 */
 	void (*free_job)(struct drm_sched_job *sched_job);
+
+	/*
+	 * (Optional) Called to let the driver know that a timeout detection
+	 * timer has been started.
+	 */
+	void (*start_timeout_notify)(struct drm_gpu_scheduler *sched);
 };

 /**
--
1.9.1
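[Editorial note: a driver-side use of the proposed callback might look like the following sketch, in plain userspace C with mocked types. The mock_* names and the ring fields are hypothetical, not part of any real driver; the idea is simply that the driver records when the timer was armed and, on timeout, checks whether the ring really accumulated the full period on hardware.]

```c
#include <assert.h>
#include <time.h>

/* Hypothetical driver-side state; field names are illustrative only. */
struct mock_ring {
	time_t timer_started;   /* when the TDR timer was armed */
	time_t time_on_hw;      /* accumulated ring runtime */
};

/* Sketch of what a start_timeout_notify() implementation could record. */
static void mock_start_timeout_notify(struct mock_ring *ring, time_t now)
{
	ring->timer_started = now;
}

/* On timeout: did the ring really spend the full period on the hardware? */
static int mock_really_timed_out(const struct mock_ring *ring,
				 time_t now, time_t period)
{
	return (now - ring->timer_started) >= period &&
	       ring->time_on_hw >= period;
}
```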


* Re: [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
       [not found]     ` <1541154693-29623-2-git-send-email-smasetty-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-11-02 10:39       ` Koenig, Christian
       [not found]         ` <57ffb214-3635-6445-e0e1-fc5010e05ba7-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Koenig, Christian @ 2018-11-02 10:39 UTC (permalink / raw)
  To: Sharat Masetty, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Grodzovsky, Andrey

Am 02.11.18 um 11:31 schrieb Sharat Masetty:
> Add an optional backend function op which will let the scheduler clients
> know when the timeout got scheduled on the scheduler instance. This will
> help drivers with multiple schedulers(one per ring) measure time spent on
> the ring accurately, eventually helping with better timeout detection.
>
> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>

Well, NAK. drm_sched_start_timeout() is called whenever the timer needs 
to run, but that doesn't mean that the timer is started (e.g. it can 
already be running).

So the callback would be called multiple times and not reflect the 
actual job run time.

In addition to that, it can be racy; e.g. we can complete multiple jobs at 
a time before the timer is started again.

If you want to accurately count how much time you spend on each job/ring, 
you need to measure the time inside your driver instead.

E.g. for amdgpu I would get the time first in amdgpu_job_run() and then 
again in amdgpu_job_free_cb() and calculate the difference.

Regards,
Christian.

> ---
> Here is an example of how I plan to use this new function callback.
>
> [1] https://patchwork.freedesktop.org/patch/254227/
>
>   drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
>   include/drm/gpu_scheduler.h            | 6 ++++++
>   2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c993d10..afd461e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -192,8 +192,13 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence,
>   static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>   {
>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> -	    !list_empty(&sched->ring_mirror_list))
> +	    !list_empty(&sched->ring_mirror_list)) {
> +
>   		schedule_delayed_work(&sched->work_tdr, sched->timeout);
> +
> +		if (sched->ops->start_timeout_notify)
> +			sched->ops->start_timeout_notify(sched);
> +	}
>   }
>
>   /* job_finish is called after hw fence signaled
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index d87b268..faf28b4 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -239,6 +239,12 @@ struct drm_sched_backend_ops {
>            * and it's time to clean it up.
>   	 */
>   	void (*free_job)(struct drm_sched_job *sched_job);
> +
> +	/*
> +	 * (Optional) Called to let the driver know that a timeout detection
> +	 * timer has been started.
> +	 */
> +	void (*start_timeout_notify)(struct drm_gpu_scheduler *sched);
>   };
>
>   /**
> --
> 1.9.1
>


* Re: [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
       [not found]         ` <57ffb214-3635-6445-e0e1-fc5010e05ba7-5C7GfCeVMHo@public.gmane.org>
@ 2018-11-02 13:25           ` Sharat Masetty
       [not found]             ` <f30f0949-1a4e-1b96-982a-661ee3e3c9d3-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sharat Masetty @ 2018-11-02 13:25 UTC (permalink / raw)
  To: Koenig, Christian, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Grodzovsky, Andrey



On 11/2/2018 4:09 PM, Koenig, Christian wrote:
> Am 02.11.18 um 11:31 schrieb Sharat Masetty:
>> Add an optional backend function op which will let the scheduler clients
>> know when the timeout got scheduled on the scheduler instance. This will
>> help drivers with multiple schedulers(one per ring) measure time spent on
>> the ring accurately, eventually helping with better timeout detection.
>>
>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
> 
> Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
> to run, but that doesn't mean that the timer is started (e.g. it can
> already be running).
> 
> So the callback would be called multiple times and not reflect the
> actual job run time.
> 
> Additional to that it can be racy, e.g. we can complete multiple jobs at
> a time before the timer is started again.
> 
> If you want to accurately count how much time you spend on each job/ring
> you need to do this by measuring the time inside your driver instead.
> 
> E.g. for amdgpu I would get the time first in amdgpu_job_run() and then
> again in amdgpu_job_free_cb() and calculate the difference.
Hi Christian,

Thank you for the comments and apologies if this was confusing. All I 
want to determine (more accurately) is whether, when the scheduler 
instance timer of say 500 ms goes off, the ring (associated with the 
scheduler instance) actually spent 500 ms on the hardware - and for this 
I need to know in driver space when the timer actually started.

In msm hardware we have ring preemption support enabled and the kernel 
driver triggers a preemption switch to a higher priority ring if there 
is work available on that ring for the GPU to work on. So in the 
presence of preemption it is possible that a lower priority ring did not 
actually get to spend the full 500 ms and this is what I am trying to 
catch with this callback.

I am *not* trying to profile per job time consumption with this.

 > Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
 > to run, but that doesn't mean that the timer is started (e.g. it can
 > already be running).

Regarding the case where the timer may already be running - good point, 
but it should be easy to address that scenario. I will check the return 
value of schedule_delayed_work() and only call the (newly proposed) 
callback if the timer was really scheduled.
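[Editorial note: that check can be mocked in self-contained userspace C as below. The mock_* names are illustrative; the real schedule_delayed_work() likewise returns false when the work item is already pending, which is what makes the conditional notify work.]

```c
#include <assert.h>
#include <stdbool.h>

struct mock_work { bool pending; };

/* Mimics schedule_delayed_work(): returns true only if newly queued. */
static bool mock_schedule_delayed_work(struct mock_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Count how often the (newly proposed) callback would fire. */
static int mock_start_timeout(struct mock_work *w, int *notify_count)
{
	if (mock_schedule_delayed_work(w))
		(*notify_count)++;   /* call start_timeout_notify() only here */
	return *notify_count;
}
```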

In summary, when this timedout_job() callback is called, it is assumed 
that the job actually did time out from the POV of the scheduler, but 
this will not hold true with preemption switching and that is what I am 
trying to better address with this patch.

Sharat
> 
> Regards,
> Christian.
> 
>> ---
>> Here is an example of how I plan to use this new function callback.
>>
>> [1] https://patchwork.freedesktop.org/patch/254227/
>>
>>    drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
>>    include/drm/gpu_scheduler.h            | 6 ++++++
>>    2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index c993d10..afd461e 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -192,8 +192,13 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence,
>>    static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>    {
>>    	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> -	    !list_empty(&sched->ring_mirror_list))
>> +	    !list_empty(&sched->ring_mirror_list)) {
>> +
>>    		schedule_delayed_work(&sched->work_tdr, sched->timeout);
>> +
>> +		if (sched->ops->start_timeout_notify)
>> +			sched->ops->start_timeout_notify(sched);
>> +	}
>>    }
>>
>>    /* job_finish is called after hw fence signaled
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index d87b268..faf28b4 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -239,6 +239,12 @@ struct drm_sched_backend_ops {
>>             * and it's time to clean it up.
>>    	 */
>>    	void (*free_job)(struct drm_sched_job *sched_job);
>> +
>> +	/*
>> +	 * (Optional) Called to let the driver know that a timeout detection
>> +	 * timer has been started.
>> +	 */
>> +	void (*start_timeout_notify)(struct drm_gpu_scheduler *sched);
>>    };
>>
>>    /**
>> --
>> 1.9.1
>>
> 
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project

* Re: [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
       [not found]             ` <f30f0949-1a4e-1b96-982a-661ee3e3c9d3-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-11-02 13:37               ` Koenig, Christian
  2018-11-05  7:55                 ` [Freedreno] " Sharat Masetty
  0 siblings, 1 reply; 7+ messages in thread
From: Koenig, Christian @ 2018-11-02 13:37 UTC (permalink / raw)
  To: Sharat Masetty, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Grodzovsky, Andrey

Am 02.11.18 um 14:25 schrieb Sharat Masetty:
>
>
> On 11/2/2018 4:09 PM, Koenig, Christian wrote:
>> Am 02.11.18 um 11:31 schrieb Sharat Masetty:
>>> Add an optional backend function op which will let the scheduler 
>>> clients
>>> know when the timeout got scheduled on the scheduler instance. This 
>>> will
>>> help drivers with multiple schedulers(one per ring) measure time 
>>> spent on
>>> the ring accurately, eventually helping with better timeout detection.
>>>
>>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>>
>> Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
>> to run, but that doesn't mean that the timer is started (e.g. it can
>> already be running).
>>
>> So the callback would be called multiple times and not reflect the
>> actual job run time.
>>
>> Additional to that it can be racy, e.g. we can complete multiple jobs at
>> a time before the timer is started again.
>>
>> If you want to accurately count how much time you spend on each job/ring
>> you need to do this by measuring the time inside your driver instead.
>>
>> E.g. for amdgpu I would get the time first in amdgpu_job_run() and then
>> again in amdgpu_job_free_cb() and calculate the difference.
> Hi Christian,
>
> Thank you for the comments and apologies if this was confusing. All I 
> want to determine(more accurately) is that when the scheduler instance 
> timer of say 500 ms goes off, is if the ring(associated with the 
> scheduler instance) actually spent 500 ms on the hardware - and for 
> this I need to know in the driver space when the timer actually started.
>
> In msm hardware we have ring preemption support enabled and the kernel 
> driver triggers a preemption switch to a higher priority ring if there 
> is work available on that ring for the GPU to work on. So in the 
> presence of preemption it is possible that a lower priority ring did 
> not actually get to spend the full 500 ms and this is what I am trying 
> to catch with this callback.
>
> I am *not* trying to profile per job time consumption with this.
>
> > Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
> > to run, but that doesn't mean that the timer is started (e.g. it can
> > already be running).
>
> Regarding the case where the timer may already be running - good 
> point, but it should be easy to address the scenario. I will check the 
> output
> of schedule_delayed_work() and only call the callback(new proposed) if 
> the timer was really scheduled.

Yeah, that should work.

>
> In summary, when this timedout_job() callback is called, it is assumed 
> that the job actually did time out from the POV of the scheduler, but 
> this will not hold true with preemption switching and that is what I 
> am trying to better address with this patch.

Mhm, so what you actually need is to suspend the timeout when the lower 
priority ring is preempted and resume it when it is started again? I 
wonder if that wouldn't be simpler.

We have support for ring preemption as well, but it is not implemented 
yet. So it would be nice to have something that works for everybody.

But on the other hand a callback to notify the driver that the timer 
started isn't so bad either.

Regards,
Christian.

>
> Sharat
>>
>> Regards,
>> Christian.
>>
>>> ---
>>> Here is an example of how I plan to use this new function callback.
>>>
>>> [1] https://patchwork.freedesktop.org/patch/254227/
>>>
>>>    drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
>>>    include/drm/gpu_scheduler.h            | 6 ++++++
>>>    2 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c993d10..afd461e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -192,8 +192,13 @@ bool drm_sched_dependency_optimized(struct 
>>> dma_fence* fence,
>>>    static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>>    {
>>>        if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> -        !list_empty(&sched->ring_mirror_list))
>>> +        !list_empty(&sched->ring_mirror_list)) {
>>> +
>>>            schedule_delayed_work(&sched->work_tdr, sched->timeout);
>>> +
>>> +        if (sched->ops->start_timeout_notify)
>>> +            sched->ops->start_timeout_notify(sched);
>>> +    }
>>>    }
>>>
>>>    /* job_finish is called after hw fence signaled
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index d87b268..faf28b4 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -239,6 +239,12 @@ struct drm_sched_backend_ops {
>>>             * and it's time to clean it up.
>>>         */
>>>        void (*free_job)(struct drm_sched_job *sched_job);
>>> +
>>> +    /*
>>> +     * (Optional) Called to let the driver know that a timeout 
>>> detection
>>> +     * timer has been started.
>>> +     */
>>> +    void (*start_timeout_notify)(struct drm_gpu_scheduler *sched);
>>>    };
>>>
>>>    /**
>>> -- 
>>> 1.9.1
>>>
>>
>> _______________________________________________
>> Freedreno mailing list
>> Freedreno@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/freedreno
>>
>


* Re: [Freedreno] [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
  2018-11-02 13:37               ` Koenig, Christian
@ 2018-11-05  7:55                 ` Sharat Masetty
       [not found]                   ` <bbfa8eae-23f5-5484-5abb-bf55dee64e34-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Sharat Masetty @ 2018-11-05  7:55 UTC (permalink / raw)
  To: Koenig, Christian, freedreno; +Cc: linux-arm-msm, dri-devel



On 11/2/2018 7:07 PM, Koenig, Christian wrote:
> Am 02.11.18 um 14:25 schrieb Sharat Masetty:
>>
>>
>> On 11/2/2018 4:09 PM, Koenig, Christian wrote:
>>> Am 02.11.18 um 11:31 schrieb Sharat Masetty:
>>>> Add an optional backend function op which will let the scheduler
>>>> clients
>>>> know when the timeout got scheduled on the scheduler instance. This
>>>> will
>>>> help drivers with multiple schedulers(one per ring) measure time
>>>> spent on
>>>> the ring accurately, eventually helping with better timeout detection.
>>>>
>>>> Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
>>>
>>> Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
>>> to run, but that doesn't mean that the timer is started (e.g. it can
>>> already be running).
>>>
>>> So the callback would be called multiple times and not reflect the
>>> actual job run time.
>>>
>>> Additional to that it can be racy, e.g. we can complete multiple jobs at
>>> a time before the timer is started again.
>>>
>>> If you want to accurately count how much time you spend on each job/ring
>>> you need to do this by measuring the time inside your driver instead.
>>>
>>> E.g. for amdgpu I would get the time first in amdgpu_job_run() and then
>>> again in amdgpu_job_free_cb() and calculate the difference.
>> Hi Christian,
>>
>> Thank you for the comments and apologies if this was confusing. All I
>> want to determine(more accurately) is that when the scheduler instance
>> timer of say 500 ms goes off, is if the ring(associated with the
>> scheduler instance) actually spent 500 ms on the hardware - and for
>> this I need to know in the driver space when the timer actually started.
>>
>> In msm hardware we have ring preemption support enabled and the kernel
>> driver triggers a preemption switch to a higher priority ring if there
>> is work available on that ring for the GPU to work on. So in the
>> presence of preemption it is possible that a lower priority ring did
>> not actually get to spend the full 500 ms and this is what I am trying
>> to catch with this callback.
>>
>> I am *not* trying to profile per job time consumption with this.
>>
>>> Well, NAK. drm_sched_start_timeout() is called whenever the timer needs
>>> to run, but that doesn't mean that the timer is started (e.g. it can
>>> already be running).
>>
>> Regarding the case where the timer may already be running - good
>> point, but it should be easy to address the scenario. I will check the
>> output
>> of schedule_delayed_work() and only call the callback(new proposed) if
>> the timer was really scheduled.
> 
> Yeah, that should work.
> 
>>
>> In summary, when this timedout_job() callback is called, it is assumed
>> that the job actually did time out from the POV of the scheduler, but
>> this will not hold true with preemption switching and that is what I
>> am trying to better address with this patch.
> 
> Mhm, so what you actually need is to suspend the timeout when the lower
> priority ring is preempted and resume it when it is started again? I
> wonder if that wouldn't be simpler.
> 
> We have support for ring preemption as well, but not implemented yet. So
> it would be nice to have something that works for everybody.
> 
> But on the other hand a callback to notify the driver that the timer
> started isn't so bad either.
Hi Christian,

Yes, something like a suspend timeout would be simpler for the drivers, 
but I could not find anything which does this for delayed work or even 
for general timers. All I could find was cancel/delete.

In lieu of that, I chose this approach. If you like it this way (the 
proposed patch), then I will address the review comments and re-spin... 
please let me know.

Sharat
> 
> Regards,
> Christian.
> 
>>
>> Sharat
>>>
>>> Regards,
>>> Christian.
>>>
>>>> ---
>>>> Here is an example of how I plan to use this new function callback.
>>>>
>>>> [1] https://patchwork.freedesktop.org/patch/254227/
>>>>
>>>>     drivers/gpu/drm/scheduler/sched_main.c | 7 ++++++-
>>>>     include/drm/gpu_scheduler.h            | 6 ++++++
>>>>     2 files changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index c993d10..afd461e 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -192,8 +192,13 @@ bool drm_sched_dependency_optimized(struct
>>>> dma_fence* fence,
>>>>     static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>>>     {
>>>>         if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>> -        !list_empty(&sched->ring_mirror_list))
>>>> +        !list_empty(&sched->ring_mirror_list)) {
>>>> +
>>>>             schedule_delayed_work(&sched->work_tdr, sched->timeout);
>>>> +
>>>> +        if (sched->ops->start_timeout_notify)
>>>> +            sched->ops->start_timeout_notify(sched);
>>>> +    }
>>>>     }
>>>>
>>>>     /* job_finish is called after hw fence signaled
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index d87b268..faf28b4 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -239,6 +239,12 @@ struct drm_sched_backend_ops {
>>>>              * and it's time to clean it up.
>>>>          */
>>>>         void (*free_job)(struct drm_sched_job *sched_job);
>>>> +
>>>> +    /*
>>>> +     * (Optional) Called to let the driver know that a timeout
>>>> detection
>>>> +     * timer has been started.
>>>> +     */
>>>> +    void (*start_timeout_notify)(struct drm_gpu_scheduler *sched);
>>>>     };
>>>>
>>>>     /**
>>>> -- 
>>>> 1.9.1
>>>>
>>>
>>> _______________________________________________
>>> Freedreno mailing list
>>> Freedreno@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/freedreno
>>>
>>
> 
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH 2/2] drm/scheduler: Add a start_timeout_notify() backend function
       [not found]                   ` <bbfa8eae-23f5-5484-5abb-bf55dee64e34-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2018-11-05 12:06                     ` Koenig, Christian
  0 siblings, 0 replies; 7+ messages in thread
From: Koenig, Christian @ 2018-11-05 12:06 UTC (permalink / raw)
  To: Sharat Masetty, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	jcrouse-sgV2jX0FEOL9JmXXK+q4OQ,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Grodzovsky, Andrey

[SNIP]

Am 05.11.18 um 08:55 schrieb Sharat Masetty:
> Hi Christian,
>
> Yes something like a suspend timeout would be simpler for the drivers, 
> but I could not find anything which does this for the delayed work or 
> even for the general timers. All I could find was cancel/delete.
>
> In lieu of this, I chose this approach. If you like it this 
> way(proposed patch), then I will address the review comments and 
> re-spin... please let me know.
>
> Sharat

I think I would prefer investigating into the suspend/resume approach 
for a moment.

What we can do rather easily is to use mod_delayed_work() with a larger 
timeout, e.g. like the following:

void drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched)
{
     mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout * 10);
}

void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched)
{
     mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout);
}

By looking at work_tdr.timer.expires before modifying it, 
drm_sched_suspend_timeout() could also return the remaining jiffies to wait.

This way we wouldn't need to restart the timeout completely on every resume.

Regards,
Christian.
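[Editorial note: the remaining-jiffies idea above can be sketched in self-contained userspace C. This is a mock, not the real API: the real code would read work_tdr.timer.expires, compare against jiffies, and call mod_delayed_work(); here absolute mock-jiffies stand in for both.]

```c
#include <assert.h>

struct mock_tdr {
	long expires;   /* absolute expiry, in mock jiffies */
};

/* Suspend: push the expiry far out and return the remaining jiffies. */
static long mock_suspend_timeout(struct mock_tdr *t, long now, long timeout)
{
	long remaining = t->expires - now;

	t->expires = now + timeout * 10;   /* the "sched->timeout * 10" idea */
	return remaining > 0 ? remaining : 0;
}

/* Resume: re-arm with the remaining jiffies, not the full timeout. */
static void mock_resume_timeout(struct mock_tdr *t, long now, long remaining)
{
	t->expires = now + remaining;
}
```

This way a ring preempted 300 jiffies into a 500-jiffy timeout resumes with only the 200 jiffies it had left, rather than restarting the full period.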
