* GART write flush error on SI w/ amdgpu
@ 2017-05-09 12:13 Nicolai Hähnle
From: Nicolai Hähnle @ 2017-05-09 12:13 UTC (permalink / raw)
  To: amd-gfx mailing list


Hi all,

I'm seeing some very strange errors on Verde with CPU readback from 
GART, and am pretty much out of ideas. Some help would be very much 
appreciated.

The error manifests with the 
GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu, 
but *not* on radeon. Here's what the test does:

1. Upload a texture.
2. Read the texture back via a shader that uses shader buffer writes to 
write data to a buffer that is allocated in GART.
3. The CPU then reads from the buffer -- and sometimes gets stale data.

This sequence is repeated for many sub-tests. In some sub-tests the CPU 
reads stale data from the buffer, i.e. the shader writes simply don't 
make it to the CPU. The tests vary only superficially; e.g. the first 
failing test is (almost?) always one where data is written in 16-bit 
words (though other sub-tests with 16-bit writes do pass).

The bug is *not* a timing issue. Adding even a one-second delay (sleep(1);) 
between the fence wait and the return of glMapBuffer does not fix the 
problem. The data must be stuck in a cache somewhere.

Since the test runs okay with the radeon module, I tried some changes 
based on comparing the IB submit between radeon and amdgpu, and based on 
comparing register settings via scans obtained from umr. Some of the 
things I've tried:

- Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and 
amdgpu/gfx9 set this)
- Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the 
vmid (radeon does this)
- Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of 
PFP (which seems more logical, and is done by gfx7+), or remove the 
corresponding WRITE_DATA entirely

None of these changes helped.

What *does* help is adding an artificial wait. Specifically, I'm adding 
a sequence of

- WRITE_DATA
- CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
- WAIT_REG_MEM

as can be seen in the attached patch. This works around the problem, but 
it makes no sense:

Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence 
works around the problem. However(!) it does not actually cause the UMD 
to wait any longer than before. Without this change, the UMD immediately 
sees a signaled user fence (and never uses an ioctl to wait), and with 
this change, it *still* sees a signaled user fence.

Also, note that the way I've hacked the change, the wait sequence is 
only added for the user fence emit (and I'm using a modified UMD to 
ensure that there is enough memory to be used by the added wait sequence).

Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around 
the problem.

So for whatever reason, the added wait sequence *before* the 
SURFACE_SYNC encourages some part of the GPU to flush the data from 
wherever it's stuck, and that's just really bizarre. There must be 
something really simple I'm missing, and any pointers would be appreciated.

Thanks,
Nicolai
-- 
Learn what the world is really like,
but never forget what it ought to be.

[-- Attachment #2: gfx6_hacks.diff --]
[-- Type: text/x-patch, Size: 6321 bytes --]

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 6e20536..c1715bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -226,20 +226,23 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 
 	/* wrap the last IB with fence */
 	if (job && job->uf_addr) {
 		amdgpu_ring_emit_fence(ring, job->vm_id, job->uf_addr, job->uf_sequence,
 				       AMDGPU_FENCE_FLAG_64BIT);
 	}
 
 	if (patch_offset != ~0 && ring->funcs->patch_cond_exec)
 		amdgpu_ring_patch_cond_exec(ring, patch_offset);
 
+// 	if (job && ring->funcs->emit_hack)
+// 		ring->funcs->emit_hack(ring, job->vm_id, job->uf_addr + 8);
+
 	ring->current_ctx = fence_ctx;
 	if (vm && ring->funcs->emit_switch_buffer)
 		amdgpu_ring_emit_switch_buffer(ring);
 	amdgpu_ring_commit(ring);
 	return 0;
 }
 
 /**
  * amdgpu_ib_pool_init - Init the IB (Indirect Buffer) pool
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 5f10aa6..2c37e9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1867,37 +1867,89 @@ static void gfx_v6_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
 static void gfx_v6_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
 	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(0) | /* engine = 0? */
 				 WRITE_DATA_DST_SEL(0)));
 	amdgpu_ring_write(ring, mmHDP_DEBUG0);
 	amdgpu_ring_write(ring, 0);
 	amdgpu_ring_write(ring, 0x1);
 }
 
+static void gfx_v6_0_ring_emit_hack(struct amdgpu_ring *ring, unsigned vm_id, uint64_t addr)
+{
+	bool write64bit = false;
+	bool int_sel = false;
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
+	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(0) |
+				 (1 << 20) | /* write confirm */
+				 WRITE_DATA_DST_SEL(1)));
+	amdgpu_ring_write(ring, addr & 0xfffffffc);
+	amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff));
+	amdgpu_ring_write(ring, 0xdeadbeef);
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
+// 	amdgpu_ring_write(ring, EVENT_TYPE(BOTTOM_OF_PIPE_TS) | EVENT_INDEX(5));
+	amdgpu_ring_write(ring, EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5));
+	amdgpu_ring_write(ring, addr & 0xfffffffc);
+	amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) |
+				((write64bit ? 2 : 1) << CP_EOP_DONE_DATA_CNTL__DATA_SEL__SHIFT) |
+				((int_sel ? 2 : 0) << CP_EOP_DONE_DATA_CNTL__INT_SEL__SHIFT));
+	amdgpu_ring_write(ring, 0xcafecafe);
+	amdgpu_ring_write(ring, 0); /* unused */
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
+	amdgpu_ring_write(ring, (WAIT_REG_MEM_MEM_SPACE(1) | /* memory */
+				 WAIT_REG_MEM_FUNCTION(3) | /* equal */
+				 WAIT_REG_MEM_ENGINE(0)));   /* use me */
+	amdgpu_ring_write(ring, addr & 0xfffffffc);
+	amdgpu_ring_write(ring, upper_32_bits(addr) & 0xffffffff);
+	amdgpu_ring_write(ring, 0xcafecafe);
+	amdgpu_ring_write(ring, 0xffffffff);
+	amdgpu_ring_write(ring, 4); /* poll interval */
+}
+
 static void gfx_v6_0_ring_emit_fence(struct amdgpu_ring *ring, unsigned vmid, u64 addr,
 				     u64 seq, unsigned flags)
 {
 	bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
 	bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
+
+	if (write64bit)
+		gfx_v6_0_ring_emit_hack(ring, 0, addr + 8);
+
 	/* flush read cache over gart */
+	if (false && vmid != 0) {
+		amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+		amdgpu_ring_write(ring, (mmCP_COHER_CNTL2 - PACKET3_SET_CONFIG_REG_START));
+		amdgpu_ring_write(ring, vmid);
+		amdgpu_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
+		amdgpu_ring_write(ring, PACKET3_TCL1_ACTION_ENA |
+				  PACKET3_TC_ACTION_ENA |
+				  PACKET3_SH_KCACHE_ACTION_ENA |
+				  PACKET3_SH_ICACHE_ACTION_ENA);
+		amdgpu_ring_write(ring, 0xFFFFFFFF);
+		amdgpu_ring_write(ring, 0);
+		amdgpu_ring_write(ring, 10); /* poll interval */
+	}
 	amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
 	amdgpu_ring_write(ring, (mmCP_COHER_CNTL2 - PACKET3_SET_CONFIG_REG_START));
 	amdgpu_ring_write(ring, 0);
 	amdgpu_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
 	amdgpu_ring_write(ring, PACKET3_TCL1_ACTION_ENA |
 			  PACKET3_TC_ACTION_ENA |
 			  PACKET3_SH_KCACHE_ACTION_ENA |
 			  PACKET3_SH_ICACHE_ACTION_ENA);
 	amdgpu_ring_write(ring, 0xFFFFFFFF);
 	amdgpu_ring_write(ring, 0);
 	amdgpu_ring_write(ring, 10); /* poll interval */
+
 	/* EVENT_WRITE_EOP - flush caches, send int */
 	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
 	amdgpu_ring_write(ring, EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5));
 	amdgpu_ring_write(ring, addr & 0xfffffffc);
 	amdgpu_ring_write(ring, (upper_32_bits(addr) & 0xffff) |
 				((write64bit ? 2 : 1) << CP_EOP_DONE_DATA_CNTL__DATA_SEL__SHIFT) |
 				((int_sel ? 2 : 0) << CP_EOP_DONE_DATA_CNTL__INT_SEL__SHIFT));
 	amdgpu_ring_write(ring, lower_32_bits(seq));
 	amdgpu_ring_write(ring, upper_32_bits(seq));
 }
@@ -3640,28 +3692,30 @@ static const struct amd_ip_funcs gfx_v6_0_ip_funcs = {
 
 static const struct amdgpu_ring_funcs gfx_v6_0_ring_funcs_gfx = {
 	.type = AMDGPU_RING_TYPE_GFX,
 	.align_mask = 0xff,
 	.nop = 0x80000000,
 	.support_64bit_ptrs = false,
 	.get_rptr = gfx_v6_0_ring_get_rptr,
 	.get_wptr = gfx_v6_0_ring_get_wptr,
 	.set_wptr = gfx_v6_0_ring_set_wptr_gfx,
 	.emit_frame_size =
+		32 + /* slack for hack */
 		5 + /* gfx_v6_0_ring_emit_hdp_flush */
 		5 + /* gfx_v6_0_ring_emit_hdp_invalidate */
 		14 + 14 + 14 + /* gfx_v6_0_ring_emit_fence x3 for user fence, vm fence */
 		7 + 4 + /* gfx_v6_0_ring_emit_pipeline_sync */
 		17 + 6 + /* gfx_v6_0_ring_emit_vm_flush */
 		3 + 2, /* gfx_v6_ring_emit_cntxcntl including vgt flush */
 	.emit_ib_size = 6 + 8, /* gfx_v6_0_ring_emit_ib */
 	.emit_ib = gfx_v6_0_ring_emit_ib,
+	.emit_hack = gfx_v6_0_ring_emit_hack,
 	.emit_fence = gfx_v6_0_ring_emit_fence,
 	.emit_pipeline_sync = gfx_v6_0_ring_emit_pipeline_sync,
 	.emit_vm_flush = gfx_v6_0_ring_emit_vm_flush,
 	.emit_hdp_flush = gfx_v6_0_ring_emit_hdp_flush,
 	.emit_hdp_invalidate = gfx_v6_0_ring_emit_hdp_invalidate,
 	.test_ring = gfx_v6_0_ring_test_ring,
 	.test_ib = gfx_v6_0_ring_test_ib,
 	.insert_nop = amdgpu_ring_insert_nop,
 	.emit_cntxcntl = gfx_v6_ring_emit_cntxcntl,
 };


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: GART write flush error on SI w/ amdgpu
@ 2017-05-26 15:47 Marek Olšák
From: Marek Olšák @ 2017-05-26 15:47 UTC (permalink / raw)
  To: Nicolai Hähnle; +Cc: amd-gfx mailing list



On Tue, May 9, 2017 at 2:13 PM, Nicolai Hähnle <nhaehnle-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> [...]

Have you tried this?

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
b/src/gallium/drivers/radeonsi/si_hw_context.c
index 92c09cb..e6ac0ba 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -133,7 +133,8 @@ void si_context_gfx_flush(void *context, unsigned flags,
                        SI_CONTEXT_PS_PARTIAL_FLUSH;

        /* DRM 3.1.0 doesn't flush TC for VI correctly. */
-       if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1)
+       if ((ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1) ||
+           (ctx->b.chip_class == SI && ctx->b.screen->info.drm_major == 3))
                ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 |
                                SI_CONTEXT_INV_VMEM_L1;

One more cache flush there shouldn't hurt.

Also, Mesa uses PFP_SYNC_ME. It shouldn't be necessary, but it's worth a
try.

Marek


* Re: GART write flush error on SI w/ amdgpu
@ 2017-06-20 10:34 Marek Olšák
From: Marek Olšák @ 2017-06-20 10:34 UTC (permalink / raw)
  To: Nicolai Hähnle; +Cc: amd-gfx mailing list

BTW, I noticed the flush sequence in the kernel is wrong. The correct
flush sequence should be:

1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
but no fence/interrupt.
2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
3) SURFACE_SYNC (TC, K$, I$)
4) Write CP_COHER_CNTL2.
5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the interrupt.

WAIT_REG_MEM wouldn't be needed if we were able to merge
CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
event.

The main issue with the current flush sequence in radeon and amdgpu is
that it doesn't wait for idle before writing CP_COHER_CNTL2 and
SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
idle in userspace IBs.

Marek


On Fri, May 26, 2017 at 5:47 PM, Marek Olšák <maraeo@gmail.com> wrote:
> [...]

* Re: GART write flush error on SI w/ amdgpu
@ 2017-06-20 11:46 Christian König
From: Christian König @ 2017-06-20 11:46 UTC (permalink / raw)
  To: Marek Olšák, Nicolai Hähnle; +Cc: amd-gfx mailing list

Am 20.06.2017 um 12:34 schrieb Marek Olšák:
> BTW, I noticed the flush sequence in the kernel is wrong. The correct
> flush sequence should be:
>
> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
> but no fence/interrupt.
> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
> 3) SURFACE_SYNC (TC, K$, I$)
> 4) Write CP_COHER_CNTL2.
> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the interrupt.
>
> WAIT_REG_MEM wouldn't be needed if we were able to merge
> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
> event.
>
> The main issue with the current flush sequence in radeon and amdgpu is
> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
> idle in userspace IBs.

Well, not waiting for idle between IBs is an explicit requirement, 
because waiting would be rather bad for performance.

David Zhou, Monk and I worked quite a lot on this to avoid both the 
possible hazards and the performance drop.

Christian.


* Re: GART write flush error on SI w/ amdgpu
@ 2017-06-20 11:49 Nicolai Hähnle
From: Nicolai Hähnle @ 2017-06-20 11:49 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 20.06.2017 12:34, Marek Olšák wrote:
> BTW, I noticed the flush sequence in the kernel is wrong. The correct
> flush sequence should be:
> 
> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
> but no fence/interrupt.
> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
> 3) SURFACE_SYNC (TC, K$, I$)
> 4) Write CP_COHER_CNTL2.
> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the interrupt.
> 
> WAIT_REG_MEM wouldn't be needed if we were able to merge
> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
> event.
> 
> The main issue with the current flush sequence in radeon and amdgpu is
> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
> idle in userspace IBs.

This is gfx9-only though, right?

Cheers,
Nicolai


> 
> Marek
> 
> 
> On Fri, May 26, 2017 at 5:47 PM, Marek Olšák <maraeo@gmail.com> wrote:
>> On Tue, May 9, 2017 at 2:13 PM, Nicolai Hähnle <nhaehnle@gmail.com> wrote:
>>> Hi all,
>>>
>>> I'm seeing some very strange errors on Verde with CPU readback from GART,
>>> and am pretty much out of ideas. Some help would be very much appreciated.
>>>
>>> The error manifests with the
>>> GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu,
>>> but
>>> *not* on radeon. Here's what the test does:
>>>
>>> 1. Upload a texture.
>>> 2. Read the texture back via a shader that uses shader buffer writes to
>>> write data to a buffer that is allocated in GART.
>>> 3. The CPU then reads from the buffer -- and sometimes gets stale data.
>>>
>>> This sequence is repeated for many sub-tests. There are some sub-tests
>>> where
>>> the CPU reads stale data from the buffer, i.e. the shader writes simply
>>> don't make it to the CPU. The tests vary superficially, e.g. the first
>>> failing test is (almost?) always one where data is written in 16-bit words
>>> (but there are succeeding sub-tests with 16-bit writes as well).
>>>
>>> The bug is *not* a timing issue. Adding even a 1sec delay (sleep(1);)
>>> between the fence wait and the return of glMapBuffer does not fix the
>>> problem. The data must be stuck in a cache somewhere.
>>>
>>> Since the test runs okay with the radeon module, I tried some changes
>>> based
>>> on comparing the IB submit between radeon and amdgpu, and based on
>>> comparing
>>> register settings via scans obtained from umr. Some of the things I've
>>> tried:
>>>
>>> - Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and
>>> amdgpu/gfx9
>>> set this)
>>> - Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the vmid
>>> (radeon does this)
>>> - Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of
>>> PFP
>>> (which seems more logical, and is done by gfx7+), or remove the
>>> corresponding WRITE_DATA entirely
>>>
>>> None of these changes helped.
>>>
>>> What *does* help is adding an artificial wait. Specifically, I'm adding a
>>> sequence of
>>>
>>> - WRITE_DATA
>>> - CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
>>> - WAIT_REG_MEM
>>>
>>> as can be seen in the attached patch. This works around the problem, but
>>> it
>>> makes no sense:
>>>
>>> Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence
>>> works
>>> around the problem. However(!) it does not actually cause the UMD to wait
>>> any longer than before. Without this change, the UMD immediately sees a
>>> signaled user fence (and never uses an ioctl to wait), and with this
>>> change,
>>> it *still* sees a signaled user fence.
>>>
>>> Also, note that the way I've hacked the change, the wait sequence is only
>>> added for the user fence emit (and I'm using a modified UMD to ensure that
>>> there is enough memory to be used by the added wait sequence).
>>>
>>> Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around
>>> the
>>> problem.
>>>
>>> So for whatever reason, the added wait sequence *before* the SURFACE_SYNC
>>> encourages some part of the GPU to flush the data from wherever it's
>>> stuck,
>>> and that's just really bizarre. There must be something really simple I'm
>>> missing, and any pointers would be appreciated.
>>
>> Have you tried this?
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
>> b/src/gallium/drivers/radeonsi/si_hw_context.c
>> index 92c09cb..e6ac0ba 100644
>> --- a/src/gallium/drivers/radeonsi/si_hw_context.c
>> +++ b/src/gallium/drivers/radeonsi/si_hw_context.c
>> @@ -133,7 +133,8 @@ void si_context_gfx_flush(void *context, unsigned flags,
>>                          SI_CONTEXT_PS_PARTIAL_FLUSH;
>>
>>          /* DRM 3.1.0 doesn't flush TC for VI correctly. */
>> -       if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1)
>> +       if ((ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1) ||
>> +           (ctx->b.chip_class == SI && ctx->b.screen->info.drm_major == 3))
>>                  ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 |
>>                                  SI_CONTEXT_INV_VMEM_L1;
>>
>> One more cache flush there shouldn't hurt.
>>
>> Also, Mesa uses PFP_SYNC_ME. It shouldn't be necessary, but it's worth a
>> try.
>>
>> Marek


-- 
Learn to see the world as it really is,
But never forget how it ought to be.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: GART write flush error on SI w/ amdgpu
       [not found]             ` <ed0120b2-f34e-6f95-379f-d038574ec346-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-06-20 13:56               ` Marek Olšák
  0 siblings, 0 replies; 9+ messages in thread
From: Marek Olšák @ 2017-06-20 13:56 UTC (permalink / raw)
  To: Nicolai Hähnle; +Cc: amd-gfx mailing list

On Tue, Jun 20, 2017 at 1:49 PM, Nicolai Hähnle <nhaehnle@gmail.com> wrote:
> On 20.06.2017 12:34, Marek Olšák wrote:
>>
>> BTW, I noticed the flush sequence in the kernel is wrong. The correct
>> flush sequence should be:
>>
>> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
>> but no fence/interrupt.
>> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
>> 3) SURFACE_SYNC (TC, K$, I$)
>> 4) Write CP_COHER_CNTL2.
>> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the
>> interrupt.
>>
>> WAIT_REG_MEM wouldn't be needed if we were able to merge
>> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
>> event.
>>
>> The main issue with the current flush sequence in radeon and amdgpu is
>> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
>> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
>> idle in userspace IBs.
>
>
> This is gfx9-only though, right?

No, I'm only talking about SI.

Marek
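[Editor's note: to make the five-step sequence above easier to follow, here is a toy sketch of the intended packet ordering. The PACKET3 opcode values are taken from the SI PM4 definitions to the best of my knowledge, but the ring abstraction and helper names are invented for illustration -- this is not the real gfx_v6_0 emit code.]

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SI PM4 opcodes (see sid.h in the kernel tree). */
#define PKT3_EVENT_WRITE_EOP 0x47
#define PKT3_WAIT_REG_MEM    0x3C
#define PKT3_SURFACE_SYNC    0x43
#define PKT3_SET_CONFIG_REG  0x68  /* stands in here for the CP_COHER_CNTL2 write */

struct ring { uint32_t buf[64]; int wptr; };

static void emit(struct ring *r, uint32_t opcode)
{
    r->buf[r->wptr++] = opcode;  /* real packets carry extra payload dwords */
}

/* Emit only the packet ordering of the proposed fence/flush sequence. */
static void si_emit_fence_flush(struct ring *r)
{
    emit(r, PKT3_EVENT_WRITE_EOP); /* 1) CACHE_FLUSH_AND_INV_TS, write a dword, no irq */
    emit(r, PKT3_WAIT_REG_MEM);    /* 2) poll that dword => pipeline is idle */
    emit(r, PKT3_SURFACE_SYNC);    /* 3) invalidate TC, K$, I$ */
    emit(r, PKT3_SET_CONFIG_REG);  /* 4) write CP_COHER_CNTL2 */
    emit(r, PKT3_EVENT_WRITE_EOP); /* 5) BOTTOM_OF_PIPE_TS fence with the irq */
}
```

The point of the sketch is that step 2 is what orders step 3 after idle; the current radeon/amdgpu code emits 3) and 4) without 1) and 2) in front of them.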


* Re: GART write flush error on SI w/ amdgpu
       [not found]             ` <aeec0bc1-4814-ee82-8e59-d460e74a8ba3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-06-20 14:02               ` Marek Olšák
       [not found]                 ` <CAAxE2A5xuq59iWP=b+8=tCHNm81CGz4piY3EheSfLnVV6tXZLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Olšák @ 2017-06-20 14:02 UTC (permalink / raw)
  To: Christian König; +Cc: Nicolai Hähnle, amd-gfx mailing list

On Tue, Jun 20, 2017 at 1:46 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 20.06.2017 um 12:34 schrieb Marek Olšák:
>>
>> BTW, I noticed the flush sequence in the kernel is wrong. The correct
>> flush sequence should be:
>>
>> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
>> but no fence/interrupt.
>> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
>> 3) SURFACE_SYNC (TC, K$, I$)
>> 4) Write CP_COHER_CNTL2.
>> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the
>> interrupt.
>>
>> WAIT_REG_MEM wouldn't be needed if we were able to merge
>> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
>> event.
>>
>> The main issue with the current flush sequence in radeon and amdgpu is
>> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
>> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
>> idle in userspace IBs.
>
>
> Well, not waiting for idle between IBs is an explicit requirement, because
> it is rather bad for performance to do so.
>
> David Zhou, Monk and I worked quite a lot on this to avoid both possible
> hazard and performance drop.

I guess the requirement was ignored for SI. If you don't do the TC
flush as part of the EOP event, you have to wait for idle before
SURFACE_SYNC, because SURFACE_SYNC doesn't wait for idle. It's kinda
useless to flush TC when shaders are still in flight.

Marek
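[Editor's note: the hazard Marek describes can be made concrete with a toy model. This is an invented state machine, not real hardware behavior: it only illustrates why invalidating the texture cache while a shader write is still in flight leaves the CPU reading stale memory.]

```c
#include <stdbool.h>

struct gpu {
    int  mem;         /* GART memory as the CPU sees it */
    int  tc;          /* value held in the texture cache (TC) */
    bool tc_dirty;    /* TC holds data not yet written back to memory */
    bool shader_busy; /* a shader store is still in flight */
};

/* The in-flight shader store lands in the TC. */
static void shader_retire(struct gpu *g)
{
    if (g->shader_busy) {
        g->tc = 42;
        g->tc_dirty = true;
        g->shader_busy = false;
    }
}

/* SURFACE_SYNC: flush+invalidate the TC, but it does NOT wait for idle. */
static void surface_sync(struct gpu *g)
{
    if (g->tc_dirty) {
        g->mem = g->tc;
        g->tc_dirty = false;
    }
}

/* Steps 1+2 of the proposed sequence: EOP write + WAIT_REG_MEM. */
static void wait_for_idle(struct gpu *g) { shader_retire(g); }

static int buggy_read(void)
{
    struct gpu g = { .mem = 0, .shader_busy = true };
    surface_sync(&g);  /* flushes before the store reaches the TC */
    shader_retire(&g); /* store lands after the flush... */
    return g.mem;      /* ...so the CPU reads stale 0 */
}

static int fixed_read(void)
{
    struct gpu g = { .mem = 0, .shader_busy = true };
    wait_for_idle(&g); /* the store has landed in the TC */
    surface_sync(&g);  /* now the flush actually sees the data */
    return g.mem;      /* the CPU reads 42 */
}
```

In the model, `buggy_read()` returns the stale value and `fixed_read()` the shader's; the only difference is whether the flush is ordered after idle.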


* Re: GART write flush error on SI w/ amdgpu
@ 2019-11-28  5:01                     ` Dave Airlie
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Airlie @ 2019-11-28  5:01 UTC (permalink / raw)
  To: Marek Olšák
  Cc: Christian König, Nicolai Hähnle, amd-gfx mailing list

On Wed, 21 Jun 2017 at 00:03, Marek Olšák <maraeo@gmail.com> wrote:
>
> On Tue, Jun 20, 2017 at 1:46 PM, Christian König
> <deathsimple@vodafone.de> wrote:
> > Am 20.06.2017 um 12:34 schrieb Marek Olšák:
> >>
> >> BTW, I noticed the flush sequence in the kernel is wrong. The correct
> >> flush sequence should be:
> >>
> >> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
> >> but no fence/interrupt.
> >> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
> >> 3) SURFACE_SYNC (TC, K$, I$)
> >> 4) Write CP_COHER_CNTL2.
> >> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the
> >> interrupt.
> >>
> >> WAIT_REG_MEM wouldn't be needed if we were able to merge
> >> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
> >> event.
> >>
> >> The main issue with the current flush sequence in radeon and amdgpu is
> >> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
> >> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
> >> idle in userspace IBs.
> >
> >
> > Well not waiting for idle between IBs is an explicit requirement, because it
> > is rather bad for performance to do so.
> >
> > David Zhou, Monk and I worked quite a lot on this to avoid both possible
> > hazard and performance drop.
>
> I guess the requirement was ignored for SI. If you don't do the TC
> flush as part of the EOP event, you have to wait for idle before
> SURFACE_SYNC, because SURFACE_SYNC doesn't wait for idle. It's kinda
> useless to flush TC when shaders are still in flight.

I've been chasing Verde weirdness, did this issue ever get resolved?

Dave.



end of thread, other threads:[~2019-11-28  5:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-09 12:13 GART write flush error on SI w/ amdgpu Nicolai Hähnle
     [not found] ` <c53a99e5-eea0-17d1-05e6-b6e66746d774-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-05-26 15:47   ` Marek Olšák
     [not found]     ` <CAAxE2A7Q=qdxtm6HXF3Py9dYpARvA6+Z4ENkWTCEA572SdNEqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-20 10:34       ` Marek Olšák
     [not found]         ` <CAAxE2A7ZApX2aO1nJKwbqMWWjX4ZL6zTdtsCbEA946J2CmtjrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-20 11:46           ` Christian König
     [not found]             ` <aeec0bc1-4814-ee82-8e59-d460e74a8ba3-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-06-20 14:02               ` Marek Olšák
     [not found]                 ` <CAAxE2A5xuq59iWP=b+8=tCHNm81CGz4piY3EheSfLnVV6tXZLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-11-28  5:01                   ` Dave Airlie
2017-06-20 11:49           ` Nicolai Hähnle
     [not found]             ` <ed0120b2-f34e-6f95-379f-d038574ec346-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-06-20 13:56               ` Marek Olšák
