* [PATCH 0/2] Selectively spread compute rings across pipes
@ 2017-09-26 16:22 Andres Rodriguez
       [not found] ` <20170926162246.3200-1-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andres Rodriguez @ 2017-09-26 16:22 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, andresx7-Re5JQEeQqe8AvxtiuMwx3w

This was disabled due to an OCL perf regression as discussed on amd-gfx.

This series re-enables the feature for ASICs that are not affected, and also
introduces a boot parameter to force the policy on or off. This should help
future efforts to compare performance with the feature enabled and disabled.
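
With the second patch applied, the policy can be forced at load time, e.g.
amdgpu.compute_multipipe=1 on the kernel command line (or 0 to force it off;
-1 keeps the per-ASIC default).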



Andres Rodriguez (2):
  drm/amdgpu: use multipipe compute policy on non PL11 asics
  drm/amdgpu: add option for force enable multipipe policy for compute

 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 20 ++++++++++++++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

-- 
2.9.3


* [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found] ` <20170926162246.3200-1-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-26 16:22   ` Andres Rodriguez
       [not found]     ` <20170926162246.3200-2-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-09-26 16:22   ` [PATCH 2/2] drm/amdgpu: add option for force enable multipipe policy for compute Andres Rodriguez
  2017-09-26 18:15   ` [PATCH 0/2] Selectively spread compute rings across pipes Felix Kuehling
  2 siblings, 1 reply; 12+ messages in thread
From: Andres Rodriguez @ 2017-09-26 16:22 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, andresx7-Re5JQEeQqe8AvxtiuMwx3w

A performance regression for OpenCL tests on Polaris11 had this feature
disabled for all asics.

Instead, disable it selectively on the affected asics.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4f6c68f..3d76e76 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s
 	}
 }
 
+static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
+{
+	/* FIXME: spreading the queues across pipes causes perf regressions
+	 * on POLARIS11 compute workloads */
+	if (adev->asic_type == CHIP_POLARIS11)
+		return false;
+
+	return adev->gfx.mec.num_mec > 1;
+}
+
 void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
 {
 	int i, queue, pipe, mec;
+	bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
 
 	/* policy for amdgpu compute queue ownership */
 	for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
@@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
 		if (mec >= adev->gfx.mec.num_mec)
 			break;
 
-		/* FIXME: spreading the queues across pipes causes perf regressions */
-		if (0) {
+		if (multipipe_policy) {
 			/* policy: amdgpu owns the first two queues of the first MEC */
 			if (mec == 0 && queue < 2)
 				set_bit(i, adev->gfx.mec.queue_bitmap);
-- 
2.9.3


* [PATCH 2/2] drm/amdgpu: add option for force enable multipipe policy for compute
       [not found] ` <20170926162246.3200-1-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-09-26 16:22   ` [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics Andres Rodriguez
@ 2017-09-26 16:22   ` Andres Rodriguez
       [not found]     ` <20170926162246.3200-3-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-09-26 18:15   ` [PATCH 0/2] Selectively spread compute rings across pipes Felix Kuehling
  2 siblings, 1 reply; 12+ messages in thread
From: Andres Rodriguez @ 2017-09-26 16:22 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, andresx7-Re5JQEeQqe8AvxtiuMwx3w

Useful for testing the effects of multipipe compute without recompiling.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 6 ++++++
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d62a35e..b2f0b5e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -121,6 +121,7 @@ extern int amdgpu_cntl_sb_buf_per_se;
 extern int amdgpu_param_buf_per_se;
 extern int amdgpu_job_hang_limit;
 extern int amdgpu_lbpw;
+extern int amdgpu_compute_multipipe;
 
 #ifdef CONFIG_DRM_AMDGPU_SI
 extern int amdgpu_si_support;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 792b117..308749c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -122,6 +122,7 @@ int amdgpu_cntl_sb_buf_per_se = 0;
 int amdgpu_param_buf_per_se = 0;
 int amdgpu_job_hang_limit = 0;
 int amdgpu_lbpw = -1;
+int amdgpu_compute_multipipe = -1;
 
 MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
 module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
@@ -265,6 +266,9 @@ module_param_named(job_hang_limit, amdgpu_job_hang_limit, int ,0444);
 MODULE_PARM_DESC(lbpw, "Load Balancing Per Watt (LBPW) support (1 = enable, 0 = disable, -1 = auto)");
 module_param_named(lbpw, amdgpu_lbpw, int, 0444);
 
+MODULE_PARM_DESC(compute_multipipe, "Force compute queues to be spread across pipes (1 = enable, 0 = disable, -1 = auto)");
+module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
+
 #ifdef CONFIG_DRM_AMDGPU_SI
 
 #if defined(CONFIG_DRM_RADEON) || defined(CONFIG_DRM_RADEON_MODULE)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 3d76e76..48d94ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -111,6 +111,12 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s
 
 static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
 {
+	if (amdgpu_compute_multipipe != -1) {
+		DRM_INFO("amdgpu: forcing compute pipe policy %d\n",
+			 amdgpu_compute_multipipe);
+		return amdgpu_compute_multipipe == 1;
+	}
+
 	/* FIXME: spreading the queues across pipes causes perf regressions
 	 * on POLARIS11 compute workloads */
 	if (adev->asic_type == CHIP_POLARIS11)
-- 
2.9.3


* Re: [PATCH 0/2] Selectively spread compute rings across pipes
       [not found] ` <20170926162246.3200-1-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2017-09-26 16:22   ` [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics Andres Rodriguez
  2017-09-26 16:22   ` [PATCH 2/2] drm/amdgpu: add option for force enable multipipe policy for compute Andres Rodriguez
@ 2017-09-26 18:15   ` Felix Kuehling
       [not found]     ` <63adcb76-1a0b-e351-c70a-08761b02120e-5C7GfCeVMHo@public.gmane.org>
  2 siblings, 1 reply; 12+ messages in thread
From: Felix Kuehling @ 2017-09-26 18:15 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Andres Rodriguez

The series is Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>

Regards,
  Felix


On 2017-09-26 12:22 PM, Andres Rodriguez wrote:
> This was disabled due to an OCL perf regression as discussed on amd-gfx.
>
> This series re-enables the feature for ASICs that are not affected, and also
> introduces a boot parameter to force the policy on or off. This should help
> future effort of comparing performance with the feature enabled/disabled.
>
>
>
> Andres Rodriguez (2):
>   drm/amdgpu: use multipipe compute policy on non PL11 asics
>   drm/amdgpu: add option for force enable multipipe policy for compute
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 20 ++++++++++++++++++--
>  3 files changed, 23 insertions(+), 2 deletions(-)
>


* Re: [PATCH 0/2] Selectively spread compute rings across pipes
       [not found]     ` <63adcb76-1a0b-e351-c70a-08761b02120e-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-28 19:03       ` Alex Deucher
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Deucher @ 2017-09-28 19:03 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Andres Rodriguez, amd-gfx list

On Tue, Sep 26, 2017 at 2:15 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> The series is Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
> Regards,
>   Felix
>
>
> On 2017-09-26 12:22 PM, Andres Rodriguez wrote:
>> This was disabled due to an OCL perf regression as discussed on amd-gfx.
>>
>> This series re-enables the feature for ASICs that are not affected, and also
>> introduces a boot parameter to force the policy on or off. This should help
>> future effort of comparing performance with the feature enabled/disabled.

Applied.  thanks!

Alex

>>
>>
>>
>> Andres Rodriguez (2):
>>   drm/amdgpu: use multipipe compute policy on non PL11 asics
>>   drm/amdgpu: add option for force enable multipipe policy for compute
>>
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 ++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 20 ++++++++++++++++++--
>>  3 files changed, 23 insertions(+), 2 deletions(-)
>>
>

* Re: [PATCH 2/2] drm/amdgpu: add option for force enable multipipe policy for compute
       [not found]     ` <20170926162246.3200-3-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-29  8:59       ` Michel Dänzer
  0 siblings, 0 replies; 12+ messages in thread
From: Michel Dänzer @ 2017-09-29  8:59 UTC (permalink / raw)
  To: Andres Rodriguez; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 26/09/17 06:22 PM, Andres Rodriguez wrote:
> Useful for testing the effects of multipipe compute without recompiling.
> 
> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>

I'm skeptical that it's a good idea to add a module parameter for this.
See Daniel's rationale in
https://lists.freedesktop.org/archives/dri-devel/2017-September/153690.html
. In particular, since we don't fully understand how this stuff works
yet, it's better for now to stick to changing the code for experiments.


In general, the output of

 modinfo -p amdgpu

is quite a horror show already; we should try to clean that up rather
than make it worse.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]     ` <20170926162246.3200-2-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-11-06  6:49       ` Chunming Zhou
       [not found]         ` <2200a1fa-d1d3-bd92-825d-987ecd355dac@gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Chunming Zhou @ 2017-11-06  6:49 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo

Hi Andres,

With this patch, OCLperf hung.

Could you explain more?

If I understand correctly, the difference between with and without this
patch is setting the first two queues versus setting all the queues of
pipe0 in queue_bitmap.

Then the UMD can use a different number of queues to submit compute
commands, selected by amdgpu_queue_mgr_map.

I checked the amdgpu_queue_mgr_map implementation; CS_IOCTL can map a
user ring to a different hw ring depending on whether it is busy or
idle, right?

If yes, I see a bug in it which will result in our sched_fence not
working. Our sched fence assumes jobs will be executed in order; your
queue mapping breaks this.


Regards,

David Zhou


On 2017-09-27 00:22, Andres Rodriguez wrote:
> A performance regression for OpenCL tests on Polaris11 had this feature
> disabled for all asics.
>
> Instead, disable it selectively on the affected asics.
>
> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 4f6c68f..3d76e76 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_s
>   	}
>   }
>   
> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
> +{
> +	/* FIXME: spreading the queues across pipes causes perf regressions
> +	 * on POLARIS11 compute workloads */
> +	if (adev->asic_type == CHIP_POLARIS11)
> +		return false;
> +
> +	return adev->gfx.mec.num_mec > 1;
> +}
> +
>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>   {
>   	int i, queue, pipe, mec;
> +	bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
>   
>   	/* policy for amdgpu compute queue ownership */
>   	for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>   		if (mec >= adev->gfx.mec.num_mec)
>   			break;
>   
> -		/* FIXME: spreading the queues across pipes causes perf regressions */
> -		if (0) {
> +		if (multipipe_policy) {
>   			/* policy: amdgpu owns the first two queues of the first MEC */
>   			if (mec == 0 && queue < 2)
>   				set_bit(i, adev->gfx.mec.queue_bitmap);


* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]           ` <2200a1fa-d1d3-bd92-825d-987ecd355dac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-11-06 18:00             ` Andres Rodriguez
       [not found]               ` <CAFQ_0eFcs+DLu6sEEEtX1GJU+ccb1B2bHPSkrAdk3OUVfqq3zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andres Rodriguez @ 2017-11-06 18:00 UTC (permalink / raw)
  To: Chunming Zhou, amd-gfx list; +Cc: Deucher, Alexander

Sorry my mail client seems to have blown up. My reply got cut off,
here is the full version:



On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> Hi Andres,
>
Hi David,

> With your this patch, OCLperf hung.
Is this on all ASICs or just a specific one?

>
> Could you explain more?
>
> If I am correctly, the difference of with and without this patch is
> setting first two queue or setting all queues of pipe0 to queue_bitmap.
It is slightly different. With this patch we will also use the first
two queues of all pipes, not just pipe 0;

Pre-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11111111  00000000  00000000  00000000

Post-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11000000  11000000  11000000  11000000

What this means is that we are allowing real multithreading for
compute. Jobs on different pipes allow for parallel execution of work.
Jobs on the same pipe (but different queues) use timeslicing to share
the hardware.
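
In case it helps, here is a minimal user-space sketch (illustrative only,
not the driver code) that reproduces the two bitmaps above. It assumes one
MEC visible to amdgpu with 4 pipes of 8 queues each, and it takes the
pre-patch "all of pipe 0" policy from the diagram rather than from the
patch itself:

#include <stdbool.h>
#include <stdio.h>

#define NUM_MEC            1
#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8
#define NUM_QUEUES (NUM_MEC * NUM_PIPE_PER_MEC * NUM_QUEUE_PER_PIPE)

static void acquire(bool multipipe_policy, char bitmap[NUM_QUEUES + 1])
{
    int i;

    for (i = 0; i < NUM_QUEUES; ++i) {
        /* decompose the linear queue index into mec/pipe/queue */
        int queue = i % NUM_QUEUE_PER_PIPE;
        int pipe  = (i / NUM_QUEUE_PER_PIPE) % NUM_PIPE_PER_MEC;
        int mec   = i / (NUM_QUEUE_PER_PIPE * NUM_PIPE_PER_MEC);

        if (multipipe_policy)
            /* amdgpu owns the first two queues of every pipe of MEC 0 */
            bitmap[i] = (mec == 0 && queue < 2) ? '1' : '0';
        else
            /* amdgpu owns every queue of MEC 0, pipe 0 */
            bitmap[i] = (mec == 0 && pipe == 0) ? '1' : '0';
    }
    bitmap[NUM_QUEUES] = '\0';
}

int main(void)
{
    char bitmap[NUM_QUEUES + 1];

    acquire(false, bitmap);
    printf("single-pipe policy: %s\n", bitmap); /* 11111111 then zeros */
    acquire(true, bitmap);
    printf("multipipe policy:   %s\n", bitmap); /* 11000000 per pipe   */
    return 0;
}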


>
> Then UMD can use different number queue to submit command for compute
> selected by amdgpu_queue_mgr_map.
>
> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user
> ring to different hw ring depending on busy or idle, right?
Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
the mapping is for a usermode ring to a kernel ring id.

> If yes, I see a bug in it, which will result in our sched_fence not
> work. Our sched fence assumes the job will be executed in order, your
> mapping queue breaks this.

I think here you mean that work will execute out of order because it
will go to different rings?

That should not happen, since the id mapping is permanent on a
per-context basis. Once a mapping is decided, it will be cached for
this context so that we keep execution order guarantees. See the
id-caching code in amdgpu_queue_mgr.c for reference.

As long as the usermode keeps submitting work to the same ring, it
will all be executed in order (all in the same ring). There is no
change in this guarantee compared to pre-patch. Note that even before
this patch amdgpu_queue_mgr_map has been using an LRU policy for a
long time now.
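
For illustration, here is a rough user-space sketch of that behaviour; the
names and data layout are made up for the example and are not the actual
amdgpu_queue_mgr.c code. On first use of a user ring the least recently
used hw ring is picked, and the mapping is then cached for the lifetime of
the context:

#include <stdio.h>

#define NUM_HW_RINGS   8  /* kernel compute rings */
#define NUM_USER_RINGS 32 /* user-visible ring ids per context */

struct ctx_queue_mgr {
    int user_to_hw[NUM_USER_RINGS];        /* -1 = not mapped yet */
    unsigned long last_used[NUM_HW_RINGS];
    unsigned long clock;
};

static int pick_lru_ring(struct ctx_queue_mgr *mgr)
{
    int i, best = 0;

    for (i = 1; i < NUM_HW_RINGS; ++i)
        if (mgr->last_used[i] < mgr->last_used[best])
            best = i;
    return best;
}

/* Called on command submission: map a user ring id to a hw ring id. */
static int queue_mgr_map(struct ctx_queue_mgr *mgr, int user_ring)
{
    int hw;

    if (mgr->user_to_hw[user_ring] < 0)
        /* first use: pick the least recently used hw ring ... */
        mgr->user_to_hw[user_ring] = pick_lru_ring(mgr);

    /* ... and keep that mapping, so all work submitted to one user
     * ring stays on one hw ring and executes in order. */
    hw = mgr->user_to_hw[user_ring];
    mgr->last_used[hw] = ++mgr->clock;
    return hw;
}

int main(void)
{
    struct ctx_queue_mgr mgr = { .clock = 0 };
    int i;

    for (i = 0; i < NUM_USER_RINGS; ++i)
        mgr.user_to_hw[i] = -1;
    for (i = 0; i < NUM_HW_RINGS; ++i)
        mgr.last_used[i] = 0;

    /* two submissions on user ring 5 land on the same hw ring */
    printf("ring 5 -> hw %d\n", queue_mgr_map(&mgr, 5));
    printf("ring 5 -> hw %d\n", queue_mgr_map(&mgr, 5));
    /* a different user ring may land on a different (idle) hw ring */
    printf("ring 6 -> hw %d\n", queue_mgr_map(&mgr, 6));
    return 0;
}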

Regards,
Andres

On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <andresx7@gmail.com> wrote:
>
>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:
>>
>> Hi Andres,
>>
>
> Hi David,
>
>> With your this patch, OCLperf hung.
>
>
> Is this on all ASICs or just a specific one?
>
>>
>> Could you explain more?
>>
>> If I am correctly, the difference of with and without this patch is
>> setting first two queue or setting all queues of pipe0 to queue_bitmap.
>
>
> It is slightly different. With this patch we will also use the first two
> queues of all pipes, not just pipe 0;
>
> Pre-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11111111  00000000  00000000  00000000
>
> Post-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11000000  11000000  11000000  11000000
>
> What this means is that we are allowing real multithreading for compute.
> Jobs on different pipes allow for parallel execution of work. Jobs on the
> same pipe (but different queues) use timeslicing to share the hardware.
>
>
>
>>
>> Then UMD can use different number queue to submit command for compute
>> selected by amdgpu_queue_mgr_map.
>>
>> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user ring
>> to different hw ring depending on busy or idle, right?
>>
>> If yes, I see a bug in it, which will result in our sched_fence not work.
>> Our sched fence assumes the job will be executed in order, your mapping
>> queue breaks this.
>>
>>
>> Regards,
>>
>> David Zhou
>>
>>
>> On 2017-09-27 00:22, Andres Rodriguez wrote:
>>>
>>> A performance regression for OpenCL tests on Polaris11 had this feature
>>> disabled for all asics.
>>>
>>> Instead, disable it selectively on the affected asics.
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..3d76e76 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask,
>>> unsigned max_se, unsigned max_s
>>>       }
>>>   }
>>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
>>> +{
>>> +    /* FIXME: spreading the queues across pipes causes perf regressions
>>> +     * on POLARIS11 compute workloads */
>>> +    if (adev->asic_type == CHIP_POLARIS11)
>>> +        return false;
>>> +
>>> +    return adev->gfx.mec.num_mec > 1;
>>> +}
>>> +
>>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>>>   {
>>>       int i, queue, pipe, mec;
>>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
>>>       /* policy for amdgpu compute queue ownership */
>>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
>>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct
>>> amdgpu_device *adev)
>>>           if (mec >= adev->gfx.mec.num_mec)
>>>               break;
>>> -        /* FIXME: spreading the queues across pipes causes perf
>>> regressions */
>>> -        if (0) {
>>> +        if (multipipe_policy) {
>>>               /* policy: amdgpu owns the first two queues of the first
>>> MEC */
>>>               if (mec == 0 && queue < 2)
>>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
>>
>>
>

* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]               ` <CAFQ_0eFcs+DLu6sEEEtX1GJU+ccb1B2bHPSkrAdk3OUVfqq3zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-07  3:19                 ` Zhou, David(ChunMing)
       [not found]                   ` <MWHPR1201MB0206CEC26FF751CB39B529D6B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Zhou, David(ChunMing) @ 2017-11-07  3:19 UTC (permalink / raw)
  To: Andres Rodriguez, amd-gfx list; +Cc: Deucher, Alexander



Then synchronization should have no problem; it may be related to a multipipe hw setting issue.


Regards,

David Zhou

________________________________
From: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Tuesday, November 7, 2017 2:00:57 AM
To: Zhou, David(ChunMing); amd-gfx list
Cc: Deucher, Alexander
Subject: Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Sorry my mail client seems to have blown up. My reply got cut off,
here is the full version:



On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> Hi Andres,
>
Hi David,

> With your this patch, OCLperf hung.
Is this on all ASICs or just a specific one?

>
> Could you explain more?
>
> If I am correctly, the difference of with and without this patch is
> setting first two queue or setting all queues of pipe0 to queue_bitmap.
It is slightly different. With this patch we will also use the first
two queues of all pipes, not just pipe 0;

Pre-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11111111  00000000  00000000  00000000

Post-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11000000  11000000  11000000  11000000

What this means is that we are allowing real multithreading for
compute. Jobs on different pipes allow for parallel execution of work.
Jobs on the same pipe (but different queues) use timeslicing to share
the hardware.


>
> Then UMD can use different number queue to submit command for compute
> selected by amdgpu_queue_mgr_map.
>
> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user
> ring to different hw ring depending on busy or idle, right?
Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
the mapping is for a usermode ring to a kernel ring id.

> If yes, I see a bug in it, which will result in our sched_fence not
> work. Our sched fence assumes the job will be executed in order, your
> mapping queue breaks this.

I think here you mean that work will execute out of order because it
will go to different rings?

That should not happen, since the id mapping is permanent on a
per-context basis. Once a mapping is decided, it will be cached for
this context so that we keep execution order guarantees. See the
id-caching code in amdgpu_queue_mgr.c for reference.

As long as the usermode keeps submitting work to the same ring, it
will all be executed in order (all in the same ring). There is no
change in this guarantee compared to pre-patch. Note that even before
this patch amdgpu_queue_mgr_map has been using an LRU policy for a
long time now.

Regards,
Andres

On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:
>>
>> Hi Andres,
>>
>
> Hi David,
>
>> With your this patch, OCLperf hung.
>
>
> Is this on all ASICs or just a specific one?
>
>>
>> Could you explain more?
>>
>> If I am correctly, the difference of with and without this patch is
>> setting first two queue or setting all queues of pipe0 to queue_bitmap.
>
>
> It is slightly different. With this patch we will also use the first two
> queues of all pipes, not just pipe 0;
>
> Pre-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11111111  00000000  00000000  00000000
>
> Post-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11000000  11000000  11000000  11000000
>
> What this means is that we are allowing real multithreading for compute.
> Jobs on different pipes allow for parallel execution of work. Jobs on the
> same pipe (but different queues) use timeslicing to share the hardware.
>
>
>
>>
>> Then UMD can use different number queue to submit command for compute
>> selected by amdgpu_queue_mgr_map.
>>
>> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user ring
>> to different hw ring depending on busy or idle, right?
>>
>> If yes, I see a bug in it, which will result in our sched_fence not work.
>> Our sched fence assumes the job will be executed in order, your mapping
>> queue breaks this.
>>
>>
>> Regards,
>>
>> David Zhou
>>
>>
>> On 2017-09-27 00:22, Andres Rodriguez wrote:
>>>
>>> A performance regression for OpenCL tests on Polaris11 had this feature
>>> disabled for all asics.
>>>
>>> Instead, disable it selectively on the affected asics.
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..3d76e76 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask,
>>> unsigned max_se, unsigned max_s
>>>       }
>>>   }
>>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
>>> +{
>>> +    /* FIXME: spreading the queues across pipes causes perf regressions
>>> +     * on POLARIS11 compute workloads */
>>> +    if (adev->asic_type == CHIP_POLARIS11)
>>> +        return false;
>>> +
>>> +    return adev->gfx.mec.num_mec > 1;
>>> +}
>>> +
>>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>>>   {
>>>       int i, queue, pipe, mec;
>>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
>>>       /* policy for amdgpu compute queue ownership */
>>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
>>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct
>>> amdgpu_device *adev)
>>>           if (mec >= adev->gfx.mec.num_mec)
>>>               break;
>>> -        /* FIXME: spreading the queues across pipes causes perf
>>> regressions */
>>> -        if (0) {
>>> +        if (multipipe_policy) {
>>>               /* policy: amdgpu owns the first two queues of the first
>>> MEC */
>>>               if (mec == 0 && queue < 2)
>>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
>>
>>
>


* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]                   ` <MWHPR1201MB0206CEC26FF751CB39B529D6B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-11-07  7:26                     ` Andres Rodriguez
       [not found]                       ` <CAFQ_0eEqvKgUEUicckuPtvjZq-f4BXdhQO_p8hPVvy4Jh_KQgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andres Rodriguez @ 2017-11-07  7:26 UTC (permalink / raw)
  To: Zhou, David(ChunMing); +Cc: Deucher, Alexander, amd-gfx list



Do you have any work actually going into multiple pipes? My understanding
is that opencl will only use one queue at a time (but I'm not really
certain about that).

What you can also check is whether the app works correctly when it is
executed on pipe0, and whether it hangs on pipes 1+. I removed all the
locations where pipe0 was hardcoded in the open driver, but it is possible
it is still hardcoded somewhere in the closed stack.

Regards,
Andres

On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
wrote:

> Then snychronization should have no problem, it maybe relate to multipipe
> hw setting issue.
>
>
> Regards,
>
> David Zhou
> ------------------------------
> *From:* Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> *Sent:* Tuesday, November 7, 2017 2:00:57 AM
> *To:* Zhou, David(ChunMing); amd-gfx list
> *Cc:* Deucher, Alexander
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on
> non PL11 asics
>
> Sorry my mail client seems to have blown up. My reply got cut off,
> here is the full version:
>
>
>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> > Hi Andres,
> >
> Hi David,
>
> > With your this patch, OCLperf hung.
> Is this on all ASICs or just a specific one?
>
> >
> > Could you explain more?
> >
> > If I am correctly, the difference of with and without this patch is
> > setting first two queue or setting all queues of pipe0 to queue_bitmap.
> It is slightly different. With this patch we will also use the first
> two queues of all pipes, not just pipe 0;
>
> Pre-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11111111  00000000  00000000  00000000
>
> Post-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11000000  11000000  11000000  11000000
>
> What this means is that we are allowing real multithreading for
> compute. Jobs on different pipes allow for parallel execution of work.
> Jobs on the same pipe (but different queues) use timeslicing to share
> the hardware.
>
>
> >
> > Then UMD can use different number queue to submit command for compute
> > selected by amdgpu_queue_mgr_map.
> >
> > I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user
> > ring to different hw ring depending on busy or idle, right?
> Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
> the mapping is for a usermode ring to a kernel ring id.
>
> > If yes, I see a bug in it, which will result in our sched_fence not
> > work. Our sched fence assumes the job will be executed in order, your
> > mapping queue breaks this.
>
> I think here you mean that work will execute out of order because it
> will go to different rings?
>
> That should not happen, since the id mapping is permanent on a
> per-context basis. Once a mapping is decided, it will be cached for
> this context so that we keep execution order guarantees. See the
> id-caching code in amdgpu_queue_mgr.c for reference.
>
> As long as the usermode keeps submitting work to the same ring, it
> will all be executed in order (all in the same ring). There is no
> change in this guarantee compared to pre-patch. Note that even before
> this patch amdgpu_queue_mgr_map has been using an LRU policy for a
> long time now.
>
> Regards,
> Andres
>
> On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> wrote:
> >
> >
> > On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> >>
> >> Hi Andres,
> >>
> >
> > Hi David,
> >
> >> With your this patch, OCLperf hung.
> >
> >
> > Is this on all ASICs or just a specific one?
> >
> >>
> >> Could you explain more?
> >>
> >> If I am correctly, the difference of with and without this patch is
> >> setting first two queue or setting all queues of pipe0 to queue_bitmap.
> >
> >
> > It is slightly different. With this patch we will also use the first two
> > queues of all pipes, not just pipe 0;
> >
> > Pre-patch:
> >
> > |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
> >  11111111  00000000  00000000  00000000
> >
> > Post-patch:
> >
> > |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
> >  11000000  11000000  11000000  11000000
> >
> > What this means is that we are allowing real multithreading for compute.
> > Jobs on different pipes allow for parallel execution of work. Jobs on the
> > same pipe (but different queues) use timeslicing to share the hardware.
> >
> >
> >
> >>
> >> Then UMD can use different number queue to submit command for compute
> >> selected by amdgpu_queue_mgr_map.
> >>
> >> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user
> ring
> >> to different hw ring depending on busy or idle, right?
> >>
> >> If yes, I see a bug in it, which will result in our sched_fence not
> work.
> >> Our sched fence assumes the job will be executed in order, your mapping
> >> queue breaks this.
> >>
> >>
> >> Regards,
> >>
> >> David Zhou
> >>
> >>
> >> On 2017-09-27 00:22, Andres Rodriguez wrote:
> >>>
> >>> A performance regression for OpenCL tests on Polaris11 had this feature
> >>> disabled for all asics.
> >>>
> >>> Instead, disable it selectively on the affected asics.
> >>>
> >>> Signed-off-by: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
> >>>   1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> index 4f6c68f..3d76e76 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> >>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask,
> >>> unsigned max_se, unsigned max_s
> >>>       }
> >>>   }
> >>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device
> *adev)
> >>> +{
> >>> +    /* FIXME: spreading the queues across pipes causes perf
> regressions
> >>> +     * on POLARIS11 compute workloads */
> >>> +    if (adev->asic_type == CHIP_POLARIS11)
> >>> +        return false;
> >>> +
> >>> +    return adev->gfx.mec.num_mec > 1;
> >>> +}
> >>> +
> >>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
> >>>   {
> >>>       int i, queue, pipe, mec;
> >>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
> >>>       /* policy for amdgpu compute queue ownership */
> >>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
> >>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct
> >>> amdgpu_device *adev)
> >>>           if (mec >= adev->gfx.mec.num_mec)
> >>>               break;
> >>> -        /* FIXME: spreading the queues across pipes causes perf
> >>> regressions */
> >>> -        if (0) {
> >>> +        if (multipipe_policy) {
> >>>               /* policy: amdgpu owns the first two queues of the first
> >>> MEC */
> >>>               if (mec == 0 && queue < 2)
> >>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
> >>
> >>
> >
>


* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]                       ` <CAFQ_0eEqvKgUEUicckuPtvjZq-f4BXdhQO_p8hPVvy4Jh_KQgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-11-07  9:23                         ` Zhou, David(ChunMing)
       [not found]                           ` <MWHPR1201MB02066E434C312ECA541C51C8B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Zhou, David(ChunMing) @ 2017-11-07  9:23 UTC (permalink / raw)
  To: Andres Rodriguez; +Cc: Deucher, Alexander, amd-gfx list



I got the following information about this issue:

"

  *   If I install #491963 (failed in the report) with “./amdgpu-pro-install -y --opencl=legacy” command, the test passed. It failed when rocm is also installed with “./amdgpu-pro-install -y --opencl=legacy,rocm” command."

So I guess the hang is related to the pipe being used by both ORCA and rocm. But Felix said they don't support rocm on Tonga, so this issue may not matter currently.


Regards,

David  Zhou

________________________________
From: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Tuesday, November 7, 2017 3:26:38 PM
To: Zhou, David(ChunMing)
Cc: amd-gfx list; Deucher, Alexander
Subject: Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Do you have any work actually going into multiple pipes? My understanding is that opencl will only use one queue at a time (but I'm not really certain about that).

What you can also check is if the app works correctly when it executed on pipe0, and if it hangs on pipe 1+. I removed all the locations where pipe0 was hardcoded in the open driver, but it is possible it is still hardcoded somewhere on the closed stack.

Regards,
Andres

On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <David1.Zhou-5C7GfCeVMHo@public.gmane.org<mailto:David1.Zhou-5C7GfCeVMHo@public.gmane.org>> wrote:

Then snychronization should have no problem, it maybe relate to multipipe hw setting issue.


Regards,

David Zhou

________________________________
From: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org<mailto:andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>>
Sent: Tuesday, November 7, 2017 2:00:57 AM
To: Zhou, David(ChunMing); amd-gfx list
Cc: Deucher, Alexander
Subject: Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics

Sorry my mail client seems to have blown up. My reply got cut off,
here is the full version:



On 2017-11-06 01:49 AM, Chunming Zhou wrote:
> Hi Andres,
>
Hi David,

> With your this patch, OCLperf hung.
Is this on all ASICs or just a specific one?

>
> Could you explain more?
>
> If I am correctly, the difference of with and without this patch is
> setting first two queue or setting all queues of pipe0 to queue_bitmap.
It is slightly different. With this patch we will also use the first
two queues of all pipes, not just pipe 0;

Pre-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11111111  00000000  00000000  00000000

Post-patch:

|-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
 11000000  11000000  11000000  11000000

What this means is that we are allowing real multithreading for
compute. Jobs on different pipes allow for parallel execution of work.
Jobs on the same pipe (but different queues) use timeslicing to share
the hardware.


>
> Then UMD can use different number queue to submit command for compute
> selected by amdgpu_queue_mgr_map.
>
> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user
> ring to different hw ring depending on busy or idle, right?
Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
the mapping is for a usermode ring to a kernel ring id.

> If yes, I see a bug in it, which will result in our sched_fence not
> work. Our sched fence assumes the job will be executed in order, your
> mapping queue breaks this.

I think here you mean that work will execute out of order because it
will go to different rings?

That should not happen, since the id mapping is permanent on a
per-context basis. Once a mapping is decided, it will be cached for
this context so that we keep execution order guarantees. See the
id-caching code in amdgpu_queue_mgr.c for reference.

As long as the usermode keeps submitting work to the same ring, it
will all be executed in order (all in the same ring). There is no
change in this guarantee compared to pre-patch. Note that even before
this patch amdgpu_queue_mgr_map has been using an LRU policy for a
long time now.

Regards,
Andres

On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org<mailto:andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>> wrote:
>
>
> On 2017-11-06 01:49 AM, Chunming Zhou wrote:
>>
>> Hi Andres,
>>
>
> Hi David,
>
>> With your this patch, OCLperf hung.
>
>
> Is this on all ASICs or just a specific one?
>
>>
>> Could you explain more?
>>
>> If I am correctly, the difference of with and without this patch is
>> setting first two queue or setting all queues of pipe0 to queue_bitmap.
>
>
> It is slightly different. With this patch we will also use the first two
> queues of all pipes, not just pipe 0;
>
> Pre-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11111111  00000000  00000000  00000000
>
> Post-patch:
>
> |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>  11000000  11000000  11000000  11000000
>
> What this means is that we are allowing real multithreading for compute.
> Jobs on different pipes allow for parallel execution of work. Jobs on the
> same pipe (but different queues) use timeslicing to share the hardware.
>
>
>
>>
>> Then UMD can use different number queue to submit command for compute
>> selected by amdgpu_queue_mgr_map.
>>
>> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map user ring
>> to different hw ring depending on busy or idle, right?
>>
>> If yes, I see a bug in it, which will result in our sched_fence not work.
>> Our sched fence assumes the job will be executed in order, your mapping
>> queue breaks this.
>>
>>
>> Regards,
>>
>> David Zhou
>>
>>
>>> On 2017-09-27 00:22, Andres Rodriguez wrote:
>>>
>>> A performance regression for OpenCL tests on Polaris11 had this feature
>>> disabled for all asics.
>>>
>>> Instead, disable it selectively on the affected asics.
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org<mailto:andresx7@gmail.com>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..3d76e76 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned *mask,
>>> unsigned max_se, unsigned max_s
>>>       }
>>>   }
>>> +static bool amdgpu_gfx_is_multipipe_capable(struct amdgpu_device *adev)
>>> +{
>>> +    /* FIXME: spreading the queues across pipes causes perf regressions
>>> +     * on POLARIS11 compute workloads */
>>> +    if (adev->asic_type == CHIP_POLARIS11)
>>> +        return false;
>>> +
>>> +    return adev->gfx.mec.num_mec > 1;
>>> +}
>>> +
>>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device *adev)
>>>   {
>>>       int i, queue, pipe, mec;
>>> +    bool multipipe_policy = amdgpu_gfx_is_multipipe_capable(adev);
>>>       /* policy for amdgpu compute queue ownership */
>>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
>>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct
>>> amdgpu_device *adev)
>>>           if (mec >= adev->gfx.mec.num_mec)
>>>               break;
>>> -        /* FIXME: spreading the queues across pipes causes perf
>>> regressions */
>>> -        if (0) {
>>> +        if (multipipe_policy) {
>>>               /* policy: amdgpu owns the first two queues of the first
>>> MEC */
>>>               if (mec == 0 && queue < 2)
>>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
>>
>>
>


* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics
       [not found]                           ` <MWHPR1201MB02066E434C312ECA541C51C8B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-11-07 22:27                             ` Felix Kuehling
  0 siblings, 0 replies; 12+ messages in thread
From: Felix Kuehling @ 2017-11-07 22:27 UTC (permalink / raw)
  To: Zhou, David(ChunMing), Andres Rodriguez; +Cc: Deucher, Alexander, amd-gfx list

Not sure amd-gfx is the right list for this. The code I'm talking about
is not upstream yet, and the problem being discussed occurs on the DKMS
branch.


Even though we say we don't support Tonga (there are known issues with
ROCm on Tonga), KFD will detect and initialize it. That means the HWS is
initialized. If user mode initializes the ROCr Runtime (even if it's
just for device enumeration), I believe at least one user mode queue is
created, which will just sit idle. That idle user mode queue is probably
on pipe 0. The pipe will periodically switch between the idle user mode
queue and a kernel compute queue.


You can see a hexdump of MQDs of all KFD user mode queues in
/sys/kernel/debug/kfd/mqds. You can also inspect the HQDs in
/sys/kernel/debug/kfd/hqds.
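
For example, assuming debugfs is mounted in the usual place, something like
"cat /sys/kernel/debug/kfd/hqds" (run as root) dumps the HQD state, and the
same with .../mqds dumps the MQDs.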


Regards,
  Felix


On 2017-11-07 04:23 AM, Zhou, David(ChunMing) wrote:
>
> I got the infomation about this issue:
>
> "
>
> # If I install #491963 (failed in the report) with “./amdgpu-pro-install
> -y --opencl=legacy” command, the test passed. It failed when rocm is
> also installed with “./amdgpu-pro-install -y --opencl=legacy,rocm”
> command."
>
> So I guess the hung is related to the pipe is used both ORCA and rocm.
> But Felix said they dont support rocm on Tonga, that could mean this
> issue doesn't matter currently.
>
>
> Regards,
>
> David  Zhou
>
> ------------------------------------------------------------------------
> *From:* Andres Rodriguez <andresx7@gmail.com>
> *Sent:* Tuesday, November 7, 2017 3:26:38 PM
> *To:* Zhou, David(ChunMing)
> *Cc:* amd-gfx list; Deucher, Alexander
> *Subject:* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute policy on
> non PL11 asics
>  
> Do you have any work actually going into multiple pipes? My
> understanding is that opencl will only use one queue at a time (but
> I'm not really certain about that).
>
> What you can also check is if the app works correctly when it executed
> on pipe0, and if it hangs on pipe 1+. I removed all the locations
> where pipe0 was hardcoded in the open driver, but it is possible it is
> still hardcoded somewhere on the closed stack.
>
> Regards,
> Andres 
>
> On Nov 6, 2017 10:19 PM, "Zhou, David(ChunMing)" <David1.Zhou@amd.com
> <mailto:David1.Zhou@amd.com>> wrote:
>
>     Then snychronization should have no problem, it maybe relate to
>     multipipe hw setting issue.
>
>
>     Regards,
>
>     David Zhou
>
>     ------------------------------------------------------------------------
>     *From:* Andres Rodriguez <andresx7@gmail.com
>     <mailto:andresx7@gmail.com>>
>     *Sent:* Tuesday, November 7, 2017 2:00:57 AM
>     *To:* Zhou, David(ChunMing); amd-gfx list
>     *Cc:* Deucher, Alexander
>     *Subject:* Re: [PATCH 1/2] drm/amdgpu: use multipipe compute
>     policy on non PL11 asics
>      
>     Sorry my mail client seems to have blown up. My reply got cut off,
>     here is the full version:
>
>
>
>     On 2017-11-06 01:49 AM, Chunming Zhou wrote:
>     > Hi Andres,
>     >
>     Hi David,
>
>     > With your this patch, OCLperf hung.
>     Is this on all ASICs or just a specific one?
>
>     >
>     > Could you explain more?
>     >
>     > If I am correctly, the difference of with and without this patch is
>     > setting first two queue or setting all queues of pipe0 to
>     queue_bitmap.
>     It is slightly different. With this patch we will also use the first
>     two queues of all pipes, not just pipe 0;
>
>     Pre-patch:
>
>     |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>      11111111  00000000  00000000  00000000
>
>     Post-patch:
>
>     |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>      11000000  11000000  11000000  11000000
>
>     What this means is that we are allowing real multithreading for
>     compute. Jobs on different pipes allow for parallel execution of work.
>     Jobs on the same pipe (but different queues) use timeslicing to share
>     the hardware.
>
>
>     >
>     > Then UMD can use different number queue to submit command for
>     compute
>     > selected by amdgpu_queue_mgr_map.
>     >
>     > I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can map
>     user
>     > ring to different hw ring depending on busy or idle, right?
>     Yes, when a queue is first used, amdgpu_queue_mgr_map will decide what
>     the mapping is for a usermode ring to a kernel ring id.
>
>     > If yes, I see a bug in it, which will result in our sched_fence not
>     > work. Our sched fence assumes the job will be executed in order,
>     your
>     > mapping queue breaks this.
>
>     I think here you mean that work will execute out of order because it
>     will go to different rings?
>
>     That should not happen, since the id mapping is permanent on a
>     per-context basis. Once a mapping is decided, it will be cached for
>     this context so that we keep execution order guarantees. See the
>     id-caching code in amdgpu_queue_mgr.c for reference.
>
>     As long as the usermode keeps submitting work to the same ring, it
>     will all be executed in order (all in the same ring). There is no
>     change in this guarantee compared to pre-patch. Note that even before
>     this patch amdgpu_queue_mgr_map has been using an LRU policy for a
>     long time now.
>
>     Regards,
>     Andres
>
>     On Mon, Nov 6, 2017 at 12:44 PM, Andres Rodriguez
>     <andresx7@gmail.com <mailto:andresx7@gmail.com>> wrote:
>     >
>     >
>     > On 2017-11-06 01:49 AM, Chunming Zhou wrote:
>     >>
>     >> Hi Andres,
>     >>
>     >
>     > Hi David,
>     >
>     >> With your this patch, OCLperf hung.
>     >
>     >
>     > Is this on all ASICs or just a specific one?
>     >
>     >>
>     >> Could you explain more?
>     >>
>     >> If I am correctly, the difference of with and without this patch is
>     >> setting first two queue or setting all queues of pipe0 to
>     queue_bitmap.
>     >
>     >
>     > It is slightly different. With this patch we will also use the
>     first two
>     > queues of all pipes, not just pipe 0;
>     >
>     > Pre-patch:
>     >
>     > |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>     >  11111111  00000000  00000000  00000000
>     >
>     > Post-patch:
>     >
>     > |-Pipe 0-||-Pipe 1-||-Pipe 2-||-Pipe 3-|
>     >  11000000  11000000  11000000  11000000
>     >
>     > What this means is that we are allowing real multithreading for
>     compute.
>     > Jobs on different pipes allow for parallel execution of work.
>     Jobs on the
>     > same pipe (but different queues) use timeslicing to share the
>     hardware.
>     >
>     >
>     >
>     >>
>     >> Then UMD can use different number queue to submit command for
>     compute
>     >> selected by amdgpu_queue_mgr_map.
>     >>
>     >> I checked amdgpu_queue_mgr_map implementation,  CS_IOCTL can
>     map user ring
>     >> to different hw ring depending on busy or idle, right?
>     >>
>     >> If yes, I see a bug in it, which will result in our sched_fence
>     not work.
>     >> Our sched fence assumes the job will be executed in order, your
>     mapping
>     >> queue breaks this.
>     >>
>     >>
>     >> Regards,
>     >>
>     >> David Zhou
>     >>
>     >>
>     >>> On 2017-09-27 00:22, Andres Rodriguez wrote:
>     >>>
>     >>> A performance regression for OpenCL tests on Polaris11 had
>     this feature
>     >>> disabled for all asics.
>     >>>
>     >>> Instead, disable it selectively on the affected asics.
>     >>>
>     >>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com
>     <mailto:andresx7@gmail.com>>
>     >>> ---
>     >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 14 ++++++++++++--
>     >>>   1 file changed, 12 insertions(+), 2 deletions(-)
>     >>>
>     >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>     >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>     >>> index 4f6c68f..3d76e76 100644
>     >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>     >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>     >>> @@ -109,9 +109,20 @@ void amdgpu_gfx_parse_disable_cu(unsigned
>     *mask,
>     >>> unsigned max_se, unsigned max_s
>     >>>       }
>     >>>   }
>     >>> +static bool amdgpu_gfx_is_multipipe_capable(struct
>     amdgpu_device *adev)
>     >>> +{
>     >>> +    /* FIXME: spreading the queues across pipes causes perf
>     regressions
>     >>> +     * on POLARIS11 compute workloads */
>     >>> +    if (adev->asic_type == CHIP_POLARIS11)
>     >>> +        return false;
>     >>> +
>     >>> +    return adev->gfx.mec.num_mec > 1;
>     >>> +}
>     >>> +
>     >>>   void amdgpu_gfx_compute_queue_acquire(struct amdgpu_device
>     *adev)
>     >>>   {
>     >>>       int i, queue, pipe, mec;
>     >>> +    bool multipipe_policy =
>     amdgpu_gfx_is_multipipe_capable(adev);
>     >>>       /* policy for amdgpu compute queue ownership */
>     >>>       for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
>     >>> @@ -125,8 +136,7 @@ void amdgpu_gfx_compute_queue_acquire(struct
>     >>> amdgpu_device *adev)
>     >>>           if (mec >= adev->gfx.mec.num_mec)
>     >>>               break;
>     >>> -        /* FIXME: spreading the queues across pipes causes perf
>     >>> regressions */
>     >>> -        if (0) {
>     >>> +        if (multipipe_policy) {
>     >>>               /* policy: amdgpu owns the first two queues of
>     the first
>     >>> MEC */
>     >>>               if (mec == 0 && queue < 2)
>     >>>                   set_bit(i, adev->gfx.mec.queue_bitmap);
>     >>
>     >>
>     >
>
>
>

end of thread

Thread overview: 12+ messages
2017-09-26 16:22 [PATCH 0/2] Selectively spread compute rings across pipes Andres Rodriguez
     [not found] ` <20170926162246.3200-1-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-26 16:22   ` [PATCH 1/2] drm/amdgpu: use multipipe compute policy on non PL11 asics Andres Rodriguez
     [not found]     ` <20170926162246.3200-2-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-06  6:49       ` Chunming Zhou
     [not found]         ` <2200a1fa-d1d3-bd92-825d-987ecd355dac@gmail.com>
     [not found]           ` <2200a1fa-d1d3-bd92-825d-987ecd355dac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-06 18:00             ` Andres Rodriguez
     [not found]               ` <CAFQ_0eFcs+DLu6sEEEtX1GJU+ccb1B2bHPSkrAdk3OUVfqq3zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-07  3:19                 ` Zhou, David(ChunMing)
     [not found]                   ` <MWHPR1201MB0206CEC26FF751CB39B529D6B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-11-07  7:26                     ` Andres Rodriguez
     [not found]                       ` <CAFQ_0eEqvKgUEUicckuPtvjZq-f4BXdhQO_p8hPVvy4Jh_KQgQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-07  9:23                         ` Zhou, David(ChunMing)
     [not found]                           ` <MWHPR1201MB02066E434C312ECA541C51C8B4510-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-11-07 22:27                             ` Felix Kuehling
2017-09-26 16:22   ` [PATCH 2/2] drm/amdgpu: add option for force enable multipipe policy for compute Andres Rodriguez
     [not found]     ` <20170926162246.3200-3-andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-29  8:59       ` Michel Dänzer
2017-09-26 18:15   ` [PATCH 0/2] Selectively spread compute rings across pipes Felix Kuehling
     [not found]     ` <63adcb76-1a0b-e351-c70a-08761b02120e-5C7GfCeVMHo@public.gmane.org>
2017-09-28 19:03       ` Alex Deucher
