* [PATCH 0/2] drm/i915/bxt: work around HW coherency issue @ 2015-06-08 16:28 Imre Deak 2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-06-08 16:28 ` [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 0 siblings, 2 replies; 29+ messages in thread From: Imre Deak @ 2015-06-08 16:28 UTC (permalink / raw) To: intel-gfx Running some basic igt tests on BXT A1 stepping uncovered a coherency problem where a cached CPU mapping won't be updated by a GPU store via a snooped mapping. While I opened an internal ticket to investigate this further, this patchset is an attempt to work around the problem until a better solution is found or the problem is fixed in a later stepping. The problem can be easily triggered with workloads where we submit multiple small requests to the GPU. Here the CPU's view of the seqno will eventually get stale wrt. the value stored by the GPU at the end of the request and __wait_for_request will stall (and trigger the lost interrupt/GPU reset mechanism). I could reproduce this with igt/gem_store_dword_loop_render. I also added a new igt/gem_store_dword_batches_loop for cached mappings which shows the same problem in case of MI_STORE_DATA_IMM to a cached buffer. The difference between this and the above case is that here the buffer is mapped through PPGTT, while above the status page is mapped through GGTT.
Imre Deak (2): drm/i915/bxt: work around HW coherency issue when accessing GPU seqno drm/i915/bxt: work around HW coherency issue for cached GEM mappings drivers/gpu/drm/i915/i915_gem.c | 6 +++++- drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ 3 files changed, 29 insertions(+), 1 deletion(-) -- 2.1.4 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 16:28 [PATCH 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak @ 2015-06-08 16:28 ` Imre Deak 2015-06-08 17:08 ` Dave Gordon ` (2 more replies) 2015-06-08 16:28 ` [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak 1 sibling, 3 replies; 29+ messages in thread From: Imre Deak @ 2015-06-08 16:28 UTC (permalink / raw) To: intel-gfx By running igt/store_dword_loop_render on BXT we can hit a coherency problem where the seqno written at GPU command completion time is not seen by the CPU. This results in __i915_wait_request seeing the stale seqno and not completing the request (not considering the lost interrupt/GPU reset mechanism). I also verified that this isn't a case of a lost interrupt, or that the command didn't complete somehow: when the coherency issue occurred I read the seqno via an uncached GTT mapping too. While the cached version of the seqno still showed the stale value the one read via the uncached mapping was the correct one. Work around this issue by clflushing the corresponding CPU cacheline following any store of the seqno and preceding any reading of it. When reading it do this only when the caller expects a coherent view.
Testcase: igt/store_dword_loop_render Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ 2 files changed, 24 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 9f5485d..88bc5525 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) { + /* + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM + * storing the completed request's seqno occasionally doesn't + * invalidate the CPU cache. Work around this by clflushing the + * corresponding cacheline whenever the caller wants the coherency to + * be guaranteed. Note that this cacheline is known to be + * clean at this point, since we only write it in gen8_set_seqno(), + * where we also do a clflush after the write. So this clflush in + * practice becomes an invalidate operation. + */ + if (IS_BROXTON(ring->dev) && !lazy_coherency) + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); + return intel_read_status_page(ring, I915_GEM_HWS_INDEX); } static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) { intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); + + /* See gen8_get_seqno() explaining the reason for the clflush.
*/ + if (IS_BROXTON(ring->dev)) + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); } static int gen8_emit_request(struct intel_ringbuffer *ringbuf, diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 39f6dfc..224a25b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, return idx; } +static inline void +intel_flush_status_page(struct intel_engine_cs *ring, int reg) +{ + drm_clflush_virt_range(&ring->status_page.page_addr[reg], + sizeof(uint32_t)); +} + static inline u32 intel_read_status_page(struct intel_engine_cs *ring, int reg) -- 2.1.4
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak @ 2015-06-08 17:08 ` Dave Gordon 2015-06-08 17:12 ` Chris Wilson 2015-06-08 17:14 ` Imre Deak 2015-06-09 8:21 ` Jani Nikula 2015-07-01 13:40 ` Mika Kuoppala 2 siblings, 2 replies; 29+ messages in thread From: Dave Gordon @ 2015-06-08 17:08 UTC (permalink / raw) To: Imre Deak, intel-gfx On 08/06/15 17:28, Imre Deak wrote: > By running igt/store_dword_loop_render on BXT we can hit a coherency > problem where the seqno written at GPU command completion time is not > seen by the CPU. This results in __i915_wait_request seeing the stale > seqno and not completing the request (not considering the lost > interrupt/GPU reset mechanism). I also verified that this isn't a case > of a lost interrupt, or that the command didn't complete somehow: when > the coherency issue occured I read the seqno via an uncached GTT mapping > too. While the cached version of the seqno still showed the stale value > the one read via the uncached mapping was the correct one. > > Work around this issue by clflushing the corresponding CPU cacheline > following any store of the seqno and preceding any reading of it. When > reading it do this only when the caller expects a coherent view. > > Testcase: igt/store_dword_loop_render > Signed-off-by: Imre Deak <imre.deak@intel.com> Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM (and MI_STORE_DATA_INDEX): This command simply initiates the write operation with command execution proceeding normally. Although the write operation is guaranteed to complete "eventually", there is no mechanism to synchronize command execution with the completion (or even initiation) of these operations. So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in the HWSP instead? .Dave. 
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 17:08 ` Dave Gordon @ 2015-06-08 17:12 ` Chris Wilson 2015-06-08 17:34 ` Ville Syrjälä 2015-06-08 17:14 ` Imre Deak 1 sibling, 1 reply; 29+ messages in thread From: Chris Wilson @ 2015-06-08 17:12 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: > On 08/06/15 17:28, Imre Deak wrote: > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > problem where the seqno written at GPU command completion time is not > > seen by the CPU. This results in __i915_wait_request seeing the stale > > seqno and not completing the request (not considering the lost > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > of a lost interrupt, or that the command didn't complete somehow: when > > the coherency issue occured I read the seqno via an uncached GTT mapping > > too. While the cached version of the seqno still showed the stale value > > the one read via the uncached mapping was the correct one. > > > > Work around this issue by clflushing the corresponding CPU cacheline > > following any store of the seqno and preceding any reading of it. When > > reading it do this only when the caller expects a coherent view. > > > > Testcase: igt/store_dword_loop_render > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > (and MI_STORE_DATA_INDEX): > > This command simply initiates the write operation with > command execution proceeding normally. Although the write > operation is guaranteed to complete "eventually", there is > no mechanism to synchronize command execution with the > completion (or even initiation) of these operations. > > So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > the HWSP instead? 
iirc there is also no guarantee for when the post-sync write op is completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be corrected! -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 17:12 ` Chris Wilson @ 2015-06-08 17:34 ` Ville Syrjälä 2015-06-08 18:00 ` Chris Wilson 0 siblings, 1 reply; 29+ messages in thread From: Ville Syrjälä @ 2015-06-08 17:34 UTC (permalink / raw) To: Chris Wilson, Dave Gordon, Imre Deak, intel-gfx On Mon, Jun 08, 2015 at 06:12:47PM +0100, Chris Wilson wrote: > On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: > > On 08/06/15 17:28, Imre Deak wrote: > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > problem where the seqno written at GPU command completion time is not > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > seqno and not completing the request (not considering the lost > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > of a lost interrupt, or that the command didn't complete somehow: when > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > too. While the cached version of the seqno still showed the stale value > > > the one read via the uncached mapping was the correct one. > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > following any store of the seqno and preceding any reading of it. When > > > reading it do this only when the caller expects a coherent view. > > > > > > Testcase: igt/store_dword_loop_render > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > > (and MI_STORE_DATA_INDEX): > > > > This command simply initiates the write operation with > > command execution proceeding normally. Although the write > > operation is guaranteed to complete "eventually", there is > > no mechanism to synchronize command execution with the > > completion (or even initiation) of these operations. 
> > > > So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > > the HWSP instead? > > iirc there is also no guarantee for when the post-sync write op is > completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be > corrected! My reading of the spec suggests that something like this could work: PIPE_CONTROL w/ PIPE_CONTROL_QW_WRITE PIPE_CONTROL w/ PIPE_CONTROL_NOTIFY | PIPE_CONTROL_FLUSH_ENABLE -- Ville Syrjälä Intel OTC
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 17:34 ` Ville Syrjälä @ 2015-06-08 18:00 ` Chris Wilson 2015-06-08 18:40 ` Ville Syrjälä 0 siblings, 1 reply; 29+ messages in thread From: Chris Wilson @ 2015-06-08 18:00 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On Mon, Jun 08, 2015 at 08:34:51PM +0300, Ville Syrjälä wrote: > On Mon, Jun 08, 2015 at 06:12:47PM +0100, Chris Wilson wrote: > > On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: > > > On 08/06/15 17:28, Imre Deak wrote: > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > problem where the seqno written at GPU command completion time is not > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > seqno and not completing the request (not considering the lost > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > too. While the cached version of the seqno still showed the stale value > > > > the one read via the uncached mapping was the correct one. > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > following any store of the seqno and preceding any reading of it. When > > > > reading it do this only when the caller expects a coherent view. > > > > > > > > Testcase: igt/store_dword_loop_render > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > > > Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > > > (and MI_STORE_DATA_INDEX): > > > > > > This command simply initiates the write operation with > > > command execution proceeding normally. 
Although the write > > > operation is guaranteed to complete "eventually", there is > > > no mechanism to synchronize command execution with the > > > completion (or even initiation) of these operations. > > > > > > So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > > > the HWSP instead? > > > > iirc there is also no guarantee for when the post-sync write op is > > completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be > > corrected! > > My reading of the spec suggests that something like this could work: > PIPE_CONTROL w/ PIPE_CONTROL_QW_WRITE > PIPE_CONTROL w/ PIPE_CONTROL_NOTIFY | PIPE_CONTROL_FLUSH_ENABLE Absolutely sure? The issue is not the completion of the PIPE_CONTROL, but of ensuring that the value has been written to memory and the CPU cache snooped. I don't remember there being anything as clear as the gen2-5 statement that all writes are coherent before the interrupt is raised. We can hit the issue that the seqno writes aren't coherent before the interrupt with our current method - I have seen it with hsw, so this is definitely something worth improving. -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 18:00 ` Chris Wilson @ 2015-06-08 18:40 ` Ville Syrjälä 2015-06-08 19:33 ` Dave Gordon 0 siblings, 1 reply; 29+ messages in thread From: Ville Syrjälä @ 2015-06-08 18:40 UTC (permalink / raw) To: Chris Wilson, Dave Gordon, Imre Deak, intel-gfx On Mon, Jun 08, 2015 at 07:00:49PM +0100, Chris Wilson wrote: > On Mon, Jun 08, 2015 at 08:34:51PM +0300, Ville Syrjälä wrote: > > On Mon, Jun 08, 2015 at 06:12:47PM +0100, Chris Wilson wrote: > > > On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: > > > > On 08/06/15 17:28, Imre Deak wrote: > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > > problem where the seqno written at GPU command completion time is not > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > > seqno and not completing the request (not considering the lost > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > > too. While the cached version of the seqno still showed the stale value > > > > > the one read via the uncached mapping was the correct one. > > > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > > following any store of the seqno and preceding any reading of it. When > > > > > reading it do this only when the caller expects a coherent view. > > > > > > > > > > Testcase: igt/store_dword_loop_render > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > > > > > Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > > > > (and MI_STORE_DATA_INDEX): > > > > > > > > This command simply initiates the write operation with > > > > command execution proceeding normally. 
Although the write > > > > operation is guaranteed to complete "eventually", there is > > > > no mechanism to synchronize command execution with the > > > > completion (or even initiation) of these operations. > > > > > > > > So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > > > > the HWSP instead? > > > > > > iirc there is also no guarantee for when the post-sync write op is > > > completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be > > > corrected! > > > > My reading of the spec suggests that something like this could work: > > PIPE_CONTROL w/ PIPE_CONTROL_QW_WRITE > > PIPE_CONTROL w/ PIPE_CONTROL_NOTIFY | PIPE_CONTROL_FLUSH_ENABLE > > Absolutely sure? The issue is not the completion of the PIPE_CONTROL, > but of ensuring that the value has been written to memory and the CPU > cache snooped. I don't remember there being anything as clear as the > gen2-5 statement that all writes are coherent before the interrupt is > raised. > > We can hit the issue that the seqno writes aren't coherent before the > interrupt with our current method - I have seen it with hsw, so this is > definitely something worth improving. What I get from the spec is: - The post-sync operation is started after previous and current flushes have completed - The flush enable bit causes the CS to wait until all previous post-sync operations have completed, which hopefully means the store is visible to everyone - The notify interrupt is signalled after the current sync operation has completed, which I hope means the flush enable stall has also finished (and if not a three PIPE_CONTROL sequence could be used instead) So no, I'm not absolutely sure by any means. -- Ville Syrjälä Intel OTC
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 18:40 ` Ville Syrjälä @ 2015-06-08 19:33 ` Dave Gordon 2015-06-10 10:59 ` Imre Deak 0 siblings, 1 reply; 29+ messages in thread From: Dave Gordon @ 2015-06-08 19:33 UTC (permalink / raw) To: Ville Syrjälä, Chris Wilson, Imre Deak, intel-gfx On 08/06/15 19:40, Ville Syrjälä wrote: > On Mon, Jun 08, 2015 at 07:00:49PM +0100, Chris Wilson wrote: >> On Mon, Jun 08, 2015 at 08:34:51PM +0300, Ville Syrjälä wrote: >>> On Mon, Jun 08, 2015 at 06:12:47PM +0100, Chris Wilson wrote: >>>> On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: >>>>> On 08/06/15 17:28, Imre Deak wrote: >>>>>> By running igt/store_dword_loop_render on BXT we can hit a coherency >>>>>> problem where the seqno written at GPU command completion time is not >>>>>> seen by the CPU. This results in __i915_wait_request seeing the stale >>>>>> seqno and not completing the request (not considering the lost >>>>>> interrupt/GPU reset mechanism). I also verified that this isn't a case >>>>>> of a lost interrupt, or that the command didn't complete somehow: when >>>>>> the coherency issue occured I read the seqno via an uncached GTT mapping >>>>>> too. While the cached version of the seqno still showed the stale value >>>>>> the one read via the uncached mapping was the correct one. >>>>>> >>>>>> Work around this issue by clflushing the corresponding CPU cacheline >>>>>> following any store of the seqno and preceding any reading of it. When >>>>>> reading it do this only when the caller expects a coherent view. >>>>>> >>>>>> Testcase: igt/store_dword_loop_render >>>>>> Signed-off-by: Imre Deak <imre.deak@intel.com> >>>>> >>>>> Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM >>>>> (and MI_STORE_DATA_INDEX): >>>>> >>>>> This command simply initiates the write operation with >>>>> command execution proceeding normally. 
Although the write >>>>> operation is guaranteed to complete "eventually", there is >>>>> no mechanism to synchronize command execution with the >>>>> completion (or even initiation) of these operations. >>>>> >>>>> So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in >>>>> the HWSP instead? >>>> >>>> iirc there is also no guarrantee for when the post-sync write op is >>>> completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be >>>> corrected! >>> >>> My reading of the spec suggests that something like this could work: >>> PIPE_CONTROL w/ PIPE_CONTROL_QW_WRITE >>> PIPE_CONTROL w/ PIPE_CONTROL_NOTIFY | PIPE_CONTROL_FLUSH_ENABLE >> >> Absolutely sure? The issue is not the completion of the PIPE_CONTROL, >> but of ensure that the value has been written to memory and the CPU >> cache snooped. I don't remember there being anything as clear as the >> gen2-5 statement that all writes are coherent before the interrupt is >> raised. >> >> We can hit the issue that the seqno writes aren't coherent before the >> interrupt with our current method - I have seen it with hsw, so this is >> definitely something worth improving. > > What I get from the spec is: > - The post-sync operation is started after previous and current flushes > have completed > - The flush enable bit causes the CS to wait until all previous > post-sync operations have completed, which hopefully means the > store is visible to everyone > - The notify interrupt is signalled after the current sync operation > has completed, which I hope means the flush enable stall has also > finished (and if not a three PIPE_CONTROL seqence could be used > instead) > > So no, I'm not absolutely sure by any means. Here's the commentary for MI_FLUSH_DW: The MI_FLUSH_DW command is used to perform an internal "flush" operation. The parser pauses on an internal flush until all drawing engines have completed any pending operations. 
In addition, this command can also be used to: Flush any dirty data to memory. Invalidate the TLB cache inside the hardware Usage note: After this command is completed with a Store DWord enabled, CPU access to graphics memory will be coherent (assuming the Render Cache flush is not inhibited). ... NOTIFY: If ENABLED, a Sync Completion Interrupt will be generated (if enabled by the MI Interrupt Control registers) once the sync operation is complete. See Interrupt Control Registers in Memory Interface Registers for details. So that's likewise unclear about whether the interrupt comes after just the "sync" or after the "post sync operation" as well. Then there's the DC_FLUSH bit in the PIPE_CONTROL instruction: DC Flush (L3 Flush) by default doesn’t result in flushing/invalidating the IA Coherent lines from L3$, however this can be achieved by setting control bit “Pipe line flush Coherent lines” in “L3SQCREG4” register. And is the GPU's MI_CLFLUSH instruction of any use? .Dave.
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 19:33 ` Dave Gordon @ 2015-06-10 10:59 ` Imre Deak 2015-06-10 15:10 ` Jesse Barnes 0 siblings, 1 reply; 29+ messages in thread From: Imre Deak @ 2015-06-10 10:59 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On ma, 2015-06-08 at 20:33 +0100, Dave Gordon wrote: > On 08/06/15 19:40, Ville Syrjälä wrote: > > On Mon, Jun 08, 2015 at 07:00:49PM +0100, Chris Wilson wrote: > >> On Mon, Jun 08, 2015 at 08:34:51PM +0300, Ville Syrjälä wrote: > >>> On Mon, Jun 08, 2015 at 06:12:47PM +0100, Chris Wilson wrote: > >>>> On Mon, Jun 08, 2015 at 06:08:00PM +0100, Dave Gordon wrote: > >>>>> On 08/06/15 17:28, Imre Deak wrote: > >>>>>> By running igt/store_dword_loop_render on BXT we can hit a coherency > >>>>>> problem where the seqno written at GPU command completion time is not > >>>>>> seen by the CPU. This results in __i915_wait_request seeing the stale > >>>>>> seqno and not completing the request (not considering the lost > >>>>>> interrupt/GPU reset mechanism). I also verified that this isn't a case > >>>>>> of a lost interrupt, or that the command didn't complete somehow: when > >>>>>> the coherency issue occured I read the seqno via an uncached GTT mapping > >>>>>> too. While the cached version of the seqno still showed the stale value > >>>>>> the one read via the uncached mapping was the correct one. > >>>>>> > >>>>>> Work around this issue by clflushing the corresponding CPU cacheline > >>>>>> following any store of the seqno and preceding any reading of it. When > >>>>>> reading it do this only when the caller expects a coherent view. 
> >>>>>> > >>>>>> Testcase: igt/store_dword_loop_render > >>>>>> Signed-off-by: Imre Deak <imre.deak@intel.com> > >>>>> > >>>>> Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > >>>>> (and MI_STORE_DATA_INDEX): > >>>>> > >>>>> This command simply initiates the write operation with > >>>>> command execution proceeding normally. Although the write > >>>>> operation is guaranteed to complete "eventually", there is > >>>>> no mechanism to synchronize command execution with the > >>>>> completion (or even initiation) of these operations. > >>>>> > >>>>> So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > >>>>> the HWSP instead? > >>>> > >>>> iirc there is also no guarrantee for when the post-sync write op is > >>>> completed for a FLUSH_DW/PIPE_CONTROL either. I'd be happy to be > >>>> corrected! > >>> > >>> My reading of the spec suggests that something like this could work: > >>> PIPE_CONTROL w/ PIPE_CONTROL_QW_WRITE > >>> PIPE_CONTROL w/ PIPE_CONTROL_NOTIFY | PIPE_CONTROL_FLUSH_ENABLE > >> > >> Absolutely sure? The issue is not the completion of the PIPE_CONTROL, > >> but of ensure that the value has been written to memory and the CPU > >> cache snooped. I don't remember there being anything as clear as the > >> gen2-5 statement that all writes are coherent before the interrupt is > >> raised. > >> > >> We can hit the issue that the seqno writes aren't coherent before the > >> interrupt with our current method - I have seen it with hsw, so this is > >> definitely something worth improving. 
> > > > What I get from the spec is: > > - The post-sync operation is started after previous and current flushes > > have completed > > - The flush enable bit causes the CS to wait until all previous > > post-sync operations have completed, which hopefully means the > > store is visible to everyone > > - The notify interrupt is signalled after the current sync operation > > has completed, which I hope means the flush enable stall has also > > finished (and if not a three PIPE_CONTROL seqence could be used > > instead) > > > > So no, I'm not absolutely sure by any means. > > Here's the commentary for MI_FLUSH_DW: > > The MI_FLUSH_DW command is used to perform an internal "flush" > operation. The parser pauses on an internal flush until all > drawing engines have completed any pending operations. In > addition, this command can also be used to: Flush any dirty > data to memory. Invalidate the TLB cache inside the hardware > > Usage note: After this command is completed with a Store DWord > enabled, CPU access to graphics memory will be coherent > (assuming the Render Cache flush is not inhibited). > > ... > > NOTIFY: If ENABLED, a Sync Completion Interrupt will be > generated (if enabled by the MI Interrupt Control registers) > once the sync operation is complete. See Interrupt Control > Registers in Memory Interface Registers for details. > > So that's likewise unclear about whether the interrupt comes after just > the "sync" or after the "post sync operation" as well. > > Then there's the DC_FLUSH bit in the PIPE_CONTROL instruction: > > DC Flush (L3 Flush) by default doesn’t result in > flushing/invalidating the IA Coherent lines from L3$, however > this can be achieved by setting control bit “Pipe line flush > Coherent lines” in “L3SQCREG4” register. I think the discussion here is about two separate things: 1. Possible ordering issue between the seqno store and the completion interrupt 2. 
Coherency issue that leaves the CPU with a stale view of the seqno indefinitely, which this patch works around I'm confident that in my case the problem is not due to ordering. If it was "only" ordering then the value would show up eventually. This is not the case though, __wait_for_request will see the stale value indefinitely even though it gets woken up periodically afterwards by the lost IRQ logic (with hangcheck disabled). I have tried to replace the MI_STORE_DATA_IMM+MI_USER_INTERRUPT on the render ring with a single PIPE_CONTROL to store the seqno and send a notify interrupt, setting the CS_STALL, PIPE_CONTROL_FLUSH_ENABLE, DC_FLUSH_ENABLE bits. This didn't get rid of the problem; I had the same behavior as before. I also tried setting L3SQCREG4 bit 21 as you suggested, but it didn't have an effect. Similarly I tried now to replace the MI_STORE_DATA_IMM+MI_USER_INTERRUPT on the BSD/BLT/VEBOX rings with a MI_FLUSH_DW storing the seqno and sending a sync interrupt. As above, this didn't solve the problem. I wonder if stores by MI_STORE_DATA_IMM even go through the L3$. According to Ville they do not, and so I'm not sure how the above flush operations would solve this problem. > And is the GPU's MI_CLFLUSH instruction of any use? According to the spec this command is RCS specific, so it wouldn't help in case of the other rings (where I also see this problem without this patch). --Imre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 10:59 ` Imre Deak @ 2015-06-10 15:10 ` Jesse Barnes 2015-06-10 15:26 ` Imre Deak 0 siblings, 1 reply; 29+ messages in thread From: Jesse Barnes @ 2015-06-10 15:10 UTC (permalink / raw) To: imre.deak, Dave Gordon; +Cc: intel-gfx On 06/10/2015 03:59 AM, Imre Deak wrote: > I think the discussion here is about two separate things: > 1. Possible ordering issue between the seqno store and the completion > interrupt > 2. Coherency issue that leaves the CPU with a stale view of the seqno > indefinitely, which this patch works around > > I'm confident that in my case the problem is not due to ordering. If it > was "only" ordering then the value would show up eventually. This is not > the case though, __wait_for_request will see the stale value > indefinitely even though it gets woken up periodically afterwards by the > lost IRQ logic (with hangcheck disabled). Yeah, based on your workaround it sounds like the write from the CS is landing in memory but failing to invalidate the associated CPU cacheline. I assume mapping the HWSP as uncached also works around this issue? I wonder if this is just an issue with GGTT mappings on BXT. If we had per context HWSPs using PPGTT (and maybe even 48 bit PPGTT) mappings, the snoop may be performed correctly... Looks like we don't have a store_dword variant for explicit coherent or incoherent buffer writes (though it does test PPGTT writes at least); that would make this easy to test.
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:10 ` Jesse Barnes @ 2015-06-10 15:26 ` Imre Deak 2015-06-10 15:33 ` Jesse Barnes 2015-06-10 15:52 ` Chris Wilson 0 siblings, 2 replies; 29+ messages in thread From: Imre Deak @ 2015-06-10 15:26 UTC (permalink / raw) To: Jesse Barnes; +Cc: intel-gfx On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: > On 06/10/2015 03:59 AM, Imre Deak wrote: > > I think the discussion here is about two separate things: > > 1. Possible ordering issue between the seqno store and the completion > > interrupt > > 2. Coherency issue that leaves the CPU with a stale view of the seqno > > indefinitely, which this patch works around > > > > I'm confident that in my case the problem is not due to ordering. If it > > was "only" ordering then the value would show up eventually. This is not > > the case though, __wait_for_request will see the stale value > > indefinitely even though it gets woken up periodically afterwards by the > > lost IRQ logic (with hangcheck disabled). > > Yeah, based on your workaround it sounds like the write from the CS is > landing in memory but failing to invalidate the associated CPU > cacheline. I assume mapping the HWSP as uncached also works around this > issue? I assume it would, but it would of course have a bigger overhead. Based on my testing the coherency problem happens only occasionally, so for the rest of the times we still would want to benefit from cached reads. See especially __i915_spin_request(). > I wonder if this is just an issue with GGTT mappings on BXT. If we had > per context HSWPs using PPGTT (and maybe even 48 bit PPGTT) mappings, > the snoop may be performed correctly... Looks like we don't have a > store_dword variant for explicit coherent or incoherent buffer writes > (though it does test PPGTT writes at least); that would make this easy > to test. 
Well, I tried to play around with the GGTT PTE/PAT flags to see if we're missing some setting, or whether there is a bit that is significant despite the specification. The spec says it's only the snoop flag that matters and everything maps to PAT index 0. That looks correct in our code. In any case I would expect MI_STORE_DATA_IMM to be coherent, since this is explicitly stated in the spec. I also added a new testcase to gem_storedw_batches_loop that tests the same thing via PPGTT mappings; it fails the same way. --Imre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:26 ` Imre Deak @ 2015-06-10 15:33 ` Jesse Barnes 2015-06-10 15:55 ` Imre Deak 2015-06-10 15:52 ` Chris Wilson 1 sibling, 1 reply; 29+ messages in thread From: Jesse Barnes @ 2015-06-10 15:33 UTC (permalink / raw) To: imre.deak; +Cc: intel-gfx On 06/10/2015 08:26 AM, Imre Deak wrote: > On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: >> On 06/10/2015 03:59 AM, Imre Deak wrote: >>> I think the discussion here is about two separate things: >>> 1. Possible ordering issue between the seqno store and the completion >>> interrupt >>> 2. Coherency issue that leaves the CPU with a stale view of the seqno >>> indefinitely, which this patch works around >>> >>> I'm confident that in my case the problem is not due to ordering. If it >>> was "only" ordering then the value would show up eventually. This is not >>> the case though, __wait_for_request will see the stale value >>> indefinitely even though it gets woken up periodically afterwards by the >>> lost IRQ logic (with hangcheck disabled). >> >> Yeah, based on your workaround it sounds like the write from the CS is >> landing in memory but failing to invalidate the associated CPU >> cacheline. I assume mapping the HWSP as uncached also works around this >> issue? > > I assume it would, but it would of course have a bigger overhead. Based > on my testing the coherency problem happens only occasionally, so for > the rest of the times we still would want to benefit from cached reads. > See especially __i915_spin_request(). Yeah, pretty sure we want it cached given how often we read from it. I was just curious if the UC mapping would address this just to narrow things down even further. >> I wonder if this is just an issue with GGTT mappings on BXT. If we had >> per context HSWPs using PPGTT (and maybe even 48 bit PPGTT) mappings, >> the snoop may be performed correctly... 
Looks like we don't have a >> store_dword variant for explicit coherent or incoherent buffer writes >> (though it does test PPGTT writes at least); that would make this easy >> to test. > > Well, I tried to play around with the GGTT PTE/PAT flags to see if we're > not setting something or there is a bit that is significant despite the > specification. The spec says it's only the snoop flag that matters and > everything maps to PAT index 0. That looks correct in our code. In any > case I would expect that the MI_STORE_DATA_IMM would be coherent since > this is explicitly stated in the spec. > > I also added a new testcase to gem_storedw_batches_loop that tests the > same thing via PPGTT mappings, it fails the same way. OK, interesting. So it sounds like a more general problem than just the GGTT. Thanks, Jesse
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:33 ` Jesse Barnes @ 2015-06-10 15:55 ` Imre Deak 0 siblings, 0 replies; 29+ messages in thread From: Imre Deak @ 2015-06-10 15:55 UTC (permalink / raw) To: Jesse Barnes; +Cc: intel-gfx On ke, 2015-06-10 at 08:33 -0700, Jesse Barnes wrote: > On 06/10/2015 08:26 AM, Imre Deak wrote: > > On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: > >> On 06/10/2015 03:59 AM, Imre Deak wrote: > >>> I think the discussion here is about two separate things: > >>> 1. Possible ordering issue between the seqno store and the completion > >>> interrupt > >>> 2. Coherency issue that leaves the CPU with a stale view of the seqno > >>> indefinitely, which this patch works around > >>> > >>> I'm confident that in my case the problem is not due to ordering. If it > >>> was "only" ordering then the value would show up eventually. This is not > >>> the case though, __wait_for_request will see the stale value > >>> indefinitely even though it gets woken up periodically afterwards by the > >>> lost IRQ logic (with hangcheck disabled). > >> > >> Yeah, based on your workaround it sounds like the write from the CS is > >> landing in memory but failing to invalidate the associated CPU > >> cacheline. I assume mapping the HWSP as uncached also works around this > >> issue? > > > > I assume it would, but it would of course have a bigger overhead. Based > > on my testing the coherency problem happens only occasionally, so for > > the rest of the times we still would want to benefit from cached reads. > > See especially __i915_spin_request(). > > Yeah, pretty sure we want it cached given how often we read from it. I > was just curious if the UC mapping would address this just to narrow > things down even further. I must admit that I also remembered our A0 power-on event where the Android guys had a similar issue and came up with a solution to map the status page uncached. 
At that time we didn't see this problem with the upstream kernel. I'm not sure what was different then or why the problem is back now; maybe we just didn't run the tests long enough. So I'm pretty sure it's the same issue. --Imre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:26 ` Imre Deak 2015-06-10 15:33 ` Jesse Barnes @ 2015-06-10 15:52 ` Chris Wilson 2015-06-11 8:02 ` Dave Gordon 1 sibling, 1 reply; 29+ messages in thread From: Chris Wilson @ 2015-06-10 15:52 UTC (permalink / raw) To: Imre Deak; +Cc: intel-gfx On Wed, Jun 10, 2015 at 06:26:58PM +0300, Imre Deak wrote: > On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: > > On 06/10/2015 03:59 AM, Imre Deak wrote: > > > I think the discussion here is about two separate things: > > > 1. Possible ordering issue between the seqno store and the completion > > > interrupt > > > 2. Coherency issue that leaves the CPU with a stale view of the seqno > > > indefinitely, which this patch works around > > > > > > I'm confident that in my case the problem is not due to ordering. If it > > > was "only" ordering then the value would show up eventually. This is not > > > the case though, __wait_for_request will see the stale value > > > indefinitely even though it gets woken up periodically afterwards by the > > > lost IRQ logic (with hangcheck disabled). > > > > Yeah, based on your workaround it sounds like the write from the CS is > > landing in memory but failing to invalidate the associated CPU > > cacheline. I assume mapping the HWSP as uncached also works around this > > issue? > > I assume it would, but it would of course have a bigger overhead. Based > on my testing the coherency problem happens only occasionally, so for > the rest of the times we still would want to benefit from cached reads. > See especially __i915_spin_request(). Yeah, I am not quite sure about that myself. The reason we don't do forced coherent reads there is because of the impact that has with fw and the snb/ivb/hsw workaround. 
If we have a viable strategy that gives us an accurate seqno at marginal cost, then it will be beneficial to do so from within the busy-spin, in order to minimise the number of cycles we spin. We always spend a jiffie of CPU time, so we might as well spend it getting accurate results (provided the coherent read is quick enough that it doesn't slow down the normal case). I seem to recall that we tried clflushing for gen6+, but we didn't record any details, so it may be worth retesting ivb with that w/a. -Chris -- Chris Wilson, Intel Open Source Technology Centre
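The strategy Chris outlines, one cheap lazy-coherency pre-check for the common already-completed case, then fully coherent reads inside the spin budget, could look roughly like this (helper and type names are made up for illustration; this is not the actual __i915_spin_request):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative callback: reads the current seqno, optionally coherently. */
typedef uint32_t (*seqno_read_fn)(void *ctx, bool lazy_coherency);

/*
 * Sketch of a busy-spin that spends its budget on coherent reads.
 * The (int32_t) subtraction is the wraparound-safe "seqno passed"
 * comparison style used by the driver.
 */
static bool spin_for_seqno(seqno_read_fn read, void *ctx,
			   uint32_t wanted, unsigned long budget)
{
	/* Common case: request already completed, a lazy read suffices. */
	if ((int32_t)(read(ctx, true) - wanted) >= 0)
		return true;

	while (budget--) {
		/* Coherent read: pays the clflush/forcewake-style cost. */
		if ((int32_t)(read(ctx, false) - wanted) >= 0)
			return true;
	}
	return false;	/* give up and go sleep on the interrupt */
}

/* Minimal fake for exercising the helper: "memory" is ahead of "cache". */
struct fake_seqno { uint32_t mem, cache; };

static uint32_t fake_read(void *ctx, bool lazy_coherency)
{
	struct fake_seqno *f = ctx;
	if (!lazy_coherency)
		f->cache = f->mem;	/* coherent read refreshes the view */
	return f->cache;
}
```

The point of the shape is that the coherent-read cost is only paid while actually spinning, not on the fast path.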
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:52 ` Chris Wilson @ 2015-06-11 8:02 ` Dave Gordon 2015-06-11 8:20 ` Chris Wilson 2015-06-11 19:14 ` Imre Deak 0 siblings, 2 replies; 29+ messages in thread From: Dave Gordon @ 2015-06-11 8:02 UTC (permalink / raw) To: Chris Wilson, Imre Deak, Jesse Barnes, intel-gfx On 10/06/15 16:52, Chris Wilson wrote: > On Wed, Jun 10, 2015 at 06:26:58PM +0300, Imre Deak wrote: >> On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: >>> On 06/10/2015 03:59 AM, Imre Deak wrote: >>>> I think the discussion here is about two separate things: >>>> 1. Possible ordering issue between the seqno store and the completion >>>> interrupt >>>> 2. Coherency issue that leaves the CPU with a stale view of the seqno >>>> indefinitely, which this patch works around >>>> >>>> I'm confident that in my case the problem is not due to ordering. If it >>>> was "only" ordering then the value would show up eventually. This is not >>>> the case though, __wait_for_request will see the stale value >>>> indefinitely even though it gets woken up periodically afterwards by the >>>> lost IRQ logic (with hangcheck disabled). >>> >>> Yeah, based on your workaround it sounds like the write from the CS is >>> landing in memory but failing to invalidate the associated CPU >>> cacheline. I assume mapping the HWSP as uncached also works around this >>> issue? >> >> I assume it would, but it would of course have a bigger overhead. Would it though? We shouldn't be repeatedly reading from the HWSP unless we're waiting for something to change, in which case we absolutely want the latest value without any caching whatsoever. 
Since the only thing we want to read from the HWSP is the seqno (via engine->get_seqno()), and that has a 'lazy_coherency' flag, we could cache the last value read explicitly in the intel_engine_cs and return that if lazy_coherency is set, and get it directly from the uncached HWSP only when lazy_coherency is false (i.e. it would become very-lazy-coherency). >> Based >> on my testing the coherency problem happens only occasionally, so for >> the rest of the times we still would want to benefit from cached reads. >> See especially __i915_spin_request(). I would have expected __i915_spin_request() NOT to allow lazy coherency, at least inside the loop. It could do one pre-check with lazy coherency on, for the presumably-common case where the request has already completed, but then use fully uncached reads while looping? > Yeah, I am not quite sure about that myself. The reason we don't do > forced coherent reads there is because of the impact that has with fw > and the snb/ivb/hsw workaround. If we have a viable strategy that gives us > accurate seqno at marginal cost, then it will be beneficial to do so from within > the busy-spin - in order to minimise the amount of cycles we spin. If we mapped the HWSP uncached, and used a GPU instruction that guaranteed making the seqno update globally-visible ASAP, we might not need to read a register at all, and then we wouldn't have the complications with forcewake :) > We always spend a jiffie of CPU time, we might as spend it getting > accurate results (so the coherent read is quick enough and doesn't slow > down the normal case). > > I seem to recall that we tried clflushing for gen6+, but we didn't > record any details, so it may be worth retesting ivb with that w/a. > -Chris Is that clflush on the CPU, or MI_CLFLUSH on the GPU? Does the latter help at all? .Dave. 
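Dave's "very-lazy-coherency" suggestion, an uncached HWSP mapping plus a software-cached copy of the last value read, could be sketched like this (hypothetical field and function names; this is not the real intel_engine_cs layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: map the status page uncached, but keep the last seqno read
 * in the engine struct so lazy_coherency callers never touch the
 * uncached mapping at all ("very-lazy-coherency").
 */
struct engine_model {
	uint32_t hwsp_uncached;	/* stands in for the UC status page slot */
	uint32_t last_seqno;	/* software-cached copy of the last read */
};

static uint32_t get_seqno_model(struct engine_model *e, bool lazy_coherency)
{
	if (!lazy_coherency)
		e->last_seqno = e->hwsp_uncached; /* slow, always current */
	return e->last_seqno;			  /* fast, possibly stale */
}
```

The trade-off Imre raises later in the thread applies here too: the lazy path becomes staler than a snooped cached read, since the software copy only refreshes on non-lazy calls.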
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-11 8:02 ` Dave Gordon @ 2015-06-11 8:20 ` Chris Wilson 2015-06-11 19:14 ` Imre Deak 0 siblings, 0 replies; 29+ messages in thread From: Chris Wilson @ 2015-06-11 8:20 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Thu, Jun 11, 2015 at 09:02:44AM +0100, Dave Gordon wrote: > On 10/06/15 16:52, Chris Wilson wrote: > > I seem to recall that we tried clflushing for gen6+, but we didn't > > record any details, so it may be worth retesting ivb with that w/a. > > Is that clflush on the CPU, or MI_CLFLUSH on the GPU? Does the latter > help at all? Only CPU. The w/a is required for all rings, and MI_CLFLUSH is RCS only. Anyway, so far using clflush for gen6/ivb/hsw I haven't yet triggered the failures, but there is a measurable improvement over the I915_READ(). -Chris -- Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-11 8:02 ` Dave Gordon 2015-06-11 8:20 ` Chris Wilson @ 2015-06-11 19:14 ` Imre Deak 1 sibling, 0 replies; 29+ messages in thread From: Imre Deak @ 2015-06-11 19:14 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On Thu, 2015-06-11 at 09:02 +0100, Dave Gordon wrote: > On 10/06/15 16:52, Chris Wilson wrote: > > On Wed, Jun 10, 2015 at 06:26:58PM +0300, Imre Deak wrote: > >> On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote: > >>> On 06/10/2015 03:59 AM, Imre Deak wrote: > >>>> I think the discussion here is about two separate things: > >>>> 1. Possible ordering issue between the seqno store and the completion > >>>> interrupt > >>>> 2. Coherency issue that leaves the CPU with a stale view of the seqno > >>>> indefinitely, which this patch works around > >>>> > >>>> I'm confident that in my case the problem is not due to ordering. If it > >>>> was "only" ordering then the value would show up eventually. This is not > >>>> the case though, __wait_for_request will see the stale value > >>>> indefinitely even though it gets woken up periodically afterwards by the > >>>> lost IRQ logic (with hangcheck disabled). > >>> > >>> Yeah, based on your workaround it sounds like the write from the CS is > >>> landing in memory but failing to invalidate the associated CPU > >>> cacheline. I assume mapping the HWSP as uncached also works around this > >>> issue? > >> > >> I assume it would, but it would of course have a bigger overhead. > > Would it though? We shouldn't be repeatedly reading from the HWSP unless > we're waiting for something to change, in which case we absolutely want > the latest value without any caching whatsoever. Normally on GEN6+ reading the cached value always gives you the very latest value, that's the whole point of coherency between GPU and CPU and this also holds for BXT. 
We use the lazy_coherency flag to work around the relatively rare occasions when this coherency is violated for a reason that is yet to be found (for example some missing GT workaround on BXT). I still prefer this form of the workaround where we do an uncached read only when it's absolutely necessary, to one where we do uncached reads all the time. I imagine that doing such uncached reads in __i915_spin_request() in a loop is definitely worse than cached reads (and again these cached reads will give you the same response time as uncached reads in general). > Since the only thing we want to read from the HWSP is the seqno (via > engine->get_seqno()), and that has a 'lazy_coherency' flag, we could > cache the last value read explicitly in the intel_engine_cs and return > that if lazy_coherency is set, and get it directly from the uncached > HWSP only when lazy_coherency is false (i.e. it would become > very-lazy-coherency). engine->get_seqno() will give you the very latest value even with lazy_coherency=true in general. The exception is when the coherency problem is hit, which should be relatively rare. > >> Based > >> on my testing the coherency problem happens only occasionally, so for > >> the rest of the times we still would want to benefit from cached reads. > >> See especially __i915_spin_request(). > > I would have expected __i915_spin_request() NOT to allow lazy coherency, > at least inside the loop. It could do one pre-check with lazy coherency > on, for the presumably-common case where the request has already > completed, but then use fully uncached reads while looping? Based on the above, I think we benefit the most from coherent cached reads inside such loops. --Imre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 17:08 ` Dave Gordon 2015-06-08 17:12 ` Chris Wilson @ 2015-06-08 17:14 ` Imre Deak 1 sibling, 0 replies; 29+ messages in thread From: Imre Deak @ 2015-06-08 17:14 UTC (permalink / raw) To: Dave Gordon; +Cc: intel-gfx On ma, 2015-06-08 at 18:08 +0100, Dave Gordon wrote: > On 08/06/15 17:28, Imre Deak wrote: > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > problem where the seqno written at GPU command completion time is not > > seen by the CPU. This results in __i915_wait_request seeing the stale > > seqno and not completing the request (not considering the lost > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > of a lost interrupt, or that the command didn't complete somehow: when > > the coherency issue occurred I read the seqno via an uncached GTT mapping > > too. While the cached version of the seqno still showed the stale value > > the one read via the uncached mapping was the correct one. > > > > Work around this issue by clflushing the corresponding CPU cacheline > > following any store of the seqno and preceding any reading of it. When > > reading it do this only when the caller expects a coherent view. > > > > Testcase: igt/store_dword_loop_render > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > Not necessarily a cure for this, but BSpec says of MI_STORE_DATA_IMM > (and MI_STORE_DATA_INDEX): > > This command simply initiates the write operation with > command execution proceeding normally. Although the write > operation is guaranteed to complete "eventually", there is > no mechanism to synchronize command execution with the > completion (or even initiation) of these operations. > > So shouldn't we use MI_FLUSH_DW or PIPE_CONTROL to update the seqno in > the HWSP instead? I have tried to use PIPE_CONTROL instead of MI_STORE_DATA_IMM, but that didn't solve this issue.
With the new igt/store_dword_batches_loop/cached-mapping testcase you can also verify that this isn't some ordering issue: even after waiting long enough after the command completed, the CPU's view is still stale, whereas reading it after a clflush yields the correct value. --Imre
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-06-08 17:08 ` Dave Gordon @ 2015-06-09 8:21 ` Jani Nikula 2015-06-10 14:07 ` Imre Deak 2015-07-01 13:40 ` Mika Kuoppala 2 siblings, 1 reply; 29+ messages in thread From: Jani Nikula @ 2015-06-09 8:21 UTC (permalink / raw) To: Imre Deak, intel-gfx On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > By running igt/store_dword_loop_render on BXT we can hit a coherency > problem where the seqno written at GPU command completion time is not > seen by the CPU. This results in __i915_wait_request seeing the stale > seqno and not completing the request (not considering the lost > interrupt/GPU reset mechanism). I also verified that this isn't a case > of a lost interrupt, or that the command didn't complete somehow: when > the coherency issue occurred I read the seqno via an uncached GTT mapping > too. While the cached version of the seqno still showed the stale value > the one read via the uncached mapping was the correct one. > > Work around this issue by clflushing the corresponding CPU cacheline > following any store of the seqno and preceding any reading of it. When > reading it do this only when the caller expects a coherent view.
> > Testcase: igt/store_dword_loop_render > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > 2 files changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 9f5485d..88bc5525 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > { > + /* > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > + * storing the completed request's seqno occasionally doesn't > + * invalidate the CPU cache. Work around this by clflushing the > + * corresponding cacheline whenever the caller wants the coherency to > + * be guaranteed. Note that this cacheline is known to be > + * clean at this point, since we only write it in gen8_set_seqno(), > + * where we also do a clflush after the write. So this clflush in > + * practice becomes an invalidate operation. > + */ > + if (IS_BROXTON(ring->dev) & !lazy_coherency) Should be &&. BR, Jani. > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > + > return intel_read_status_page(ring, I915_GEM_HWS_INDEX); > } > > static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) > { > intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); > + > + /* See gen8_get_seqno() explaining the reason for the clflush. 
*/ > + if (IS_BROXTON(ring->dev)) > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > } > > static int gen8_emit_request(struct intel_ringbuffer *ringbuf, > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > index 39f6dfc..224a25b 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, > return idx; > } > > +static inline void > +intel_flush_status_page(struct intel_engine_cs *ring, int reg) > +{ > + drm_clflush_virt_range(&ring->status_page.page_addr[reg], > + sizeof(uint32_t)); > +} > + > static inline u32 > intel_read_status_page(struct intel_engine_cs *ring, > int reg) > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Jani Nikula, Intel Open Source Technology Center _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
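The "&" vs "&&" catch Jani makes above is worth spelling out: bitwise AND against a truthy-but-not-1 operand can silently evaluate to 0, so the clflush would be skipped. A minimal demonstration (the flag value below is hypothetical; in the driver IS_BROXTON() may well evaluate to 0 or 1, in which case "&" would only work by accident):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform check returning a truthy flag word, not 0/1. */
static int is_broxton_flagword(void) { return 0x4; }

/* Correct form: logical AND, as the review asks for. */
static bool should_flush(int is_bxt, bool lazy_coherency)
{
	return is_bxt && !lazy_coherency;
}
```

With `is_bxt == 0x4` and `lazy_coherency == false`, the bitwise expression `is_bxt & !lazy_coherency` computes `0x4 & 1 == 0`, i.e. both operands are true yet the flush is skipped; the logical form behaves as intended.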
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-09 8:21 ` Jani Nikula @ 2015-06-10 14:07 ` Imre Deak 2015-06-10 14:21 ` Chris Wilson 0 siblings, 1 reply; 29+ messages in thread From: Imre Deak @ 2015-06-10 14:07 UTC (permalink / raw) To: Jani Nikula; +Cc: intel-gfx On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > problem where the seqno written at GPU command completion time is not > > seen by the CPU. This results in __i915_wait_request seeing the stale > > seqno and not completing the request (not considering the lost > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > of a lost interrupt, or that the command didn't complete somehow: when > > the coherency issue occured I read the seqno via an uncached GTT mapping > > too. While the cached version of the seqno still showed the stale value > > the one read via the uncached mapping was the correct one. > > > > Work around this issue by clflushing the corresponding CPU cacheline > > following any store of the seqno and preceding any reading of it. When > > reading it do this only when the caller expects a coherent view. 
> > > > Testcase: igt/store_dword_loop_render > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > --- > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > 2 files changed, 24 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > index 9f5485d..88bc5525 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > { > > + /* > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > + * storing the completed request's seqno occasionally doesn't > > + * invalidate the CPU cache. Work around this by clflushing the > > + * corresponding cacheline whenever the caller wants the coherency to > > + * be guaranteed. Note that this cacheline is known to be > > + * clean at this point, since we only write it in gen8_set_seqno(), > > + * where we also do a clflush after the write. So this clflush in > > + * practice becomes an invalidate operation. > > + */ > > + if (IS_BROXTON(ring->dev) & !lazy_coherency) > > Should be &&. Thanks for catching it, I'll send a v2 with this fixed if there is no more feedback. > > BR, > Jani. > > > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > > + > > return intel_read_status_page(ring, I915_GEM_HWS_INDEX); > > } > > > > static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) > > { > > intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); > > + > > + /* See gen8_get_seqno() explaining the reason for the clflush. 
*/ > > + if (IS_BROXTON(ring->dev)) > > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > > } > > > > static int gen8_emit_request(struct intel_ringbuffer *ringbuf, > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > > index 39f6dfc..224a25b 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > > @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, > > return idx; > > } > > > > +static inline void > > +intel_flush_status_page(struct intel_engine_cs *ring, int reg) > > +{ > > + drm_clflush_virt_range(&ring->status_page.page_addr[reg], > > + sizeof(uint32_t)); > > +} > > + > > static inline u32 > > intel_read_status_page(struct intel_engine_cs *ring, > > int reg) > > -- > > 2.1.4 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 14:07 ` Imre Deak @ 2015-06-10 14:21 ` Chris Wilson 2015-06-10 14:55 ` Imre Deak 0 siblings, 1 reply; 29+ messages in thread From: Chris Wilson @ 2015-06-10 14:21 UTC (permalink / raw) To: Imre Deak; +Cc: intel-gfx On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote: > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > problem where the seqno written at GPU command completion time is not > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > seqno and not completing the request (not considering the lost > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > of a lost interrupt, or that the command didn't complete somehow: when > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > too. While the cached version of the seqno still showed the stale value > > > the one read via the uncached mapping was the correct one. > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > following any store of the seqno and preceding any reading of it. When > > > reading it do this only when the caller expects a coherent view. 
> > > > > > Testcase: igt/store_dword_loop_render > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > > 2 files changed, 24 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > > index 9f5485d..88bc5525 100644 > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > > { > > > + /* > > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > > + * storing the completed request's seqno occasionally doesn't > > > + * invalidate the CPU cache. Work around this by clflushing the > > > + * corresponding cacheline whenever the caller wants the coherency to > > > + * be guaranteed. Note that this cacheline is known to be > > > + * clean at this point, since we only write it in gen8_set_seqno(), > > > + * where we also do a clflush after the write. So this clflush in > > > + * practice becomes an invalidate operation. Did you compare and contrast with the gen6+ w/a? A clflush may just work out quicker considering that the posting read would involve a spinlock and fw dance. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 14:21 ` Chris Wilson @ 2015-06-10 14:55 ` Imre Deak 2015-06-10 15:00 ` Ville Syrjälä 0 siblings, 1 reply; 29+ messages in thread From: Imre Deak @ 2015-06-10 14:55 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote: > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote: > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > problem where the seqno written at GPU command completion time is not > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > seqno and not completing the request (not considering the lost > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > too. While the cached version of the seqno still showed the stale value > > > > the one read via the uncached mapping was the correct one. > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > following any store of the seqno and preceding any reading of it. When > > > > reading it do this only when the caller expects a coherent view. 
> > > > > > > > Testcase: igt/store_dword_loop_render > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > --- > > > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > > > 2 files changed, 24 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > > > index 9f5485d..88bc5525 100644 > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > > > { > > > > + /* > > > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > > > + * storing the completed request's seqno occasionally doesn't > > > > + * invalidate the CPU cache. Work around this by clflushing the > > > > + * corresponding cacheline whenever the caller wants the coherency to > > > > + * be guaranteed. Note that this cacheline is known to be > > > > + * clean at this point, since we only write it in gen8_set_seqno(), > > > > + * where we also do a clflush after the write. So this clflush in > > > > + * practice becomes an invalidate operation. > > Did you compare and contrast with the gen6+ w/a? A clflush may just work > out quicker considering that the posting read would involve a spinlock > and fw dance. Actually, I did, but only saw that it only works, didn't benchmark it. I'd also think that clflush would be faster, since it's only a cache invalidate at this point. But I will compare the two things now. > -Chris > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 14:55 ` Imre Deak @ 2015-06-10 15:00 ` Ville Syrjälä 2015-06-10 15:16 ` Imre Deak 0 siblings, 1 reply; 29+ messages in thread From: Ville Syrjälä @ 2015-06-10 15:00 UTC (permalink / raw) To: Imre Deak; +Cc: intel-gfx On Wed, Jun 10, 2015 at 05:55:24PM +0300, Imre Deak wrote: > On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote: > > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote: > > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > > problem where the seqno written at GPU command completion time is not > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > > seqno and not completing the request (not considering the lost > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > > too. While the cached version of the seqno still showed the stale value > > > > > the one read via the uncached mapping was the correct one. > > > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > > following any store of the seqno and preceding any reading of it. When > > > > > reading it do this only when the caller expects a coherent view. 
> > > > > > > > > > Testcase: igt/store_dword_loop_render > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > > --- > > > > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > > > > 2 files changed, 24 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > > > > index 9f5485d..88bc5525 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > > > > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > > > > { > > > > > + /* > > > > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > > > > + * storing the completed request's seqno occasionally doesn't > > > > > + * invalidate the CPU cache. Work around this by clflushing the > > > > > + * corresponding cacheline whenever the caller wants the coherency to > > > > > + * be guaranteed. Note that this cacheline is known to be > > > > > + * clean at this point, since we only write it in gen8_set_seqno(), > > > > > + * where we also do a clflush after the write. So this clflush in > > > > > + * practice becomes an invalidate operation. > > > > Did you compare and contrast with the gen6+ w/a? A clflush may just work > > out quicker considering that the posting read would involve a spinlock > > and fw dance. > > Actually, I did, but only saw that it only works, didn't benchmark it. > I'd also think that clflush would be faster, since it's only a cache > invalidate at this point. But I will compare the two things now. If an mmio read fixes it then it doesn't feel like a snoop problem after all. 
-- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:00 ` Ville Syrjälä @ 2015-06-10 15:16 ` Imre Deak 2015-06-10 15:35 ` Chris Wilson 0 siblings, 1 reply; 29+ messages in thread From: Imre Deak @ 2015-06-10 15:16 UTC (permalink / raw) To: Ville Syrjälä; +Cc: intel-gfx On ke, 2015-06-10 at 18:00 +0300, Ville Syrjälä wrote: > On Wed, Jun 10, 2015 at 05:55:24PM +0300, Imre Deak wrote: > > On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote: > > > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote: > > > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > > > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > > > problem where the seqno written at GPU command completion time is not > > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > > > seqno and not completing the request (not considering the lost > > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > > > too. While the cached version of the seqno still showed the stale value > > > > > > the one read via the uncached mapping was the correct one. > > > > > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > > > following any store of the seqno and preceding any reading of it. When > > > > > > reading it do this only when the caller expects a coherent view. 
> > > > > > > > > > > > Testcase: igt/store_dword_loop_render > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > > > --- > > > > > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > > > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > > > > > 2 files changed, 24 insertions(+) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > > > > > index 9f5485d..88bc5525 100644 > > > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > > > > > > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > > > > > { > > > > > > + /* > > > > > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > > > > > + * storing the completed request's seqno occasionally doesn't > > > > > > + * invalidate the CPU cache. Work around this by clflushing the > > > > > > + * corresponding cacheline whenever the caller wants the coherency to > > > > > > + * be guaranteed. Note that this cacheline is known to be > > > > > > + * clean at this point, since we only write it in gen8_set_seqno(), > > > > > > + * where we also do a clflush after the write. So this clflush in > > > > > > + * practice becomes an invalidate operation. > > > > > > Did you compare and contrast with the gen6+ w/a? A clflush may just work > > > out quicker considering that the posting read would involve a spinlock > > > and fw dance. > > > > Actually, I did, but only saw that it only works, didn't benchmark it. > > I'd also think that clflush would be faster, since it's only a cache > > invalidate at this point. But I will compare the two things now. > > If an mmio read fixes it then it doesn't feel like a snoop problem after > all. Ok, I retract what I just said. I tried now and with the patch below and still see the problem. 
I must have remembered the testcase where I
created a separate GTT mapping for the status page and read the seqno
for that. Sorry for the confusion.

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f5485d..36e5fd6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1288,6 +1288,21 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
 
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
+	if (!lazy_coherency) {
+		struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+		POSTING_READ(RING_ACTHD(ring->mmio_base));
+	}
+
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-10 15:16 ` Imre Deak @ 2015-06-10 15:35 ` Chris Wilson 0 siblings, 0 replies; 29+ messages in thread From: Chris Wilson @ 2015-06-10 15:35 UTC (permalink / raw) To: Imre Deak; +Cc: intel-gfx On Wed, Jun 10, 2015 at 06:16:20PM +0300, Imre Deak wrote: > On ke, 2015-06-10 at 18:00 +0300, Ville Syrjälä wrote: > > On Wed, Jun 10, 2015 at 05:55:24PM +0300, Imre Deak wrote: > > > On ke, 2015-06-10 at 15:21 +0100, Chris Wilson wrote: > > > > On Wed, Jun 10, 2015 at 05:07:46PM +0300, Imre Deak wrote: > > > > > On ti, 2015-06-09 at 11:21 +0300, Jani Nikula wrote: > > > > > > On Mon, 08 Jun 2015, Imre Deak <imre.deak@intel.com> wrote: > > > > > > > By running igt/store_dword_loop_render on BXT we can hit a coherency > > > > > > > problem where the seqno written at GPU command completion time is not > > > > > > > seen by the CPU. This results in __i915_wait_request seeing the stale > > > > > > > seqno and not completing the request (not considering the lost > > > > > > > interrupt/GPU reset mechanism). I also verified that this isn't a case > > > > > > > of a lost interrupt, or that the command didn't complete somehow: when > > > > > > > the coherency issue occured I read the seqno via an uncached GTT mapping > > > > > > > too. While the cached version of the seqno still showed the stale value > > > > > > > the one read via the uncached mapping was the correct one. > > > > > > > > > > > > > > Work around this issue by clflushing the corresponding CPU cacheline > > > > > > > following any store of the seqno and preceding any reading of it. When > > > > > > > reading it do this only when the caller expects a coherent view. 
> > > > > > > > > > > > > > Testcase: igt/store_dword_loop_render > > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > > > > --- > > > > > > > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > > > > > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > > > > > > > 2 files changed, 24 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > > > > > > index 9f5485d..88bc5525 100644 > > > > > > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > > > > > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > > > > > > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > > > > > > > > > > > > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > > > > > > > { > > > > > > > + /* > > > > > > > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > > > > > > > + * storing the completed request's seqno occasionally doesn't > > > > > > > + * invalidate the CPU cache. Work around this by clflushing the > > > > > > > + * corresponding cacheline whenever the caller wants the coherency to > > > > > > > + * be guaranteed. Note that this cacheline is known to be > > > > > > > + * clean at this point, since we only write it in gen8_set_seqno(), > > > > > > > + * where we also do a clflush after the write. So this clflush in > > > > > > > + * practice becomes an invalidate operation. > > > > > > > > Did you compare and contrast with the gen6+ w/a? A clflush may just work > > > > out quicker considering that the posting read would involve a spinlock > > > > and fw dance. > > > > > > Actually, I did, but only saw that it only works, didn't benchmark it. > > > I'd also think that clflush would be faster, since it's only a cache > > > invalidate at this point. But I will compare the two things now. > > > > If an mmio read fixes it then it doesn't feel like a snoop problem after > > all. 
> 
> Ok, I retract what I just said. I tried now and with the patch below and
> still see the problem. I must have remembered the testcase where I
> created a separate GTT mapping for the status page and read the seqno
> for that. Sorry for the confusion.

Useful to know. Also something else to try is to set_pages_array_wc (or
set_memory_wc) for our internal mapping of the hws. Though clflush is
likely to be less of a maintenance issue.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak 2015-06-08 17:08 ` Dave Gordon 2015-06-09 8:21 ` Jani Nikula @ 2015-07-01 13:40 ` Mika Kuoppala 2015-07-01 13:53 ` Mika Kuoppala 2 siblings, 1 reply; 29+ messages in thread From: Mika Kuoppala @ 2015-07-01 13:40 UTC (permalink / raw) To: Imre Deak, intel-gfx Imre Deak <imre.deak@intel.com> writes: > By running igt/store_dword_loop_render on BXT we can hit a coherency > problem where the seqno written at GPU command completion time is not > seen by the CPU. This results in __i915_wait_request seeing the stale > seqno and not completing the request (not considering the lost > interrupt/GPU reset mechanism). I also verified that this isn't a case > of a lost interrupt, or that the command didn't complete somehow: when > the coherency issue occured I read the seqno via an uncached GTT mapping > too. While the cached version of the seqno still showed the stale value > the one read via the uncached mapping was the correct one. > > Work around this issue by clflushing the corresponding CPU cacheline > following any store of the seqno and preceding any reading of it. When > reading it do this only when the caller expects a coherent view. 
> > Testcase: igt/store_dword_loop_render > Signed-off-by: Imre Deak <imre.deak@intel.com> > --- > drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ > 2 files changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 9f5485d..88bc5525 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, > > static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) > { > + /* > + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM > + * storing the completed request's seqno occasionally doesn't > + * invalidate the CPU cache. Work around this by clflushing the > + * corresponding cacheline whenever the caller wants the coherency to > + * be guaranteed. Note that this cacheline is known to be > + * clean at this point, since we only write it in gen8_set_seqno(), > + * where we also do a clflush after the write. So this clflush in > + * practice becomes an invalidate operation. > + */ > + if (IS_BROXTON(ring->dev) & !lazy_coherency) s/&/&& ? -Mika > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > + > return intel_read_status_page(ring, I915_GEM_HWS_INDEX); > } > > static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) > { > intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); > + > + /* See gen8_get_seqno() explaining the reason for the clflush. 
*/ > + if (IS_BROXTON(ring->dev)) > + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); > } > > static int gen8_emit_request(struct intel_ringbuffer *ringbuf, > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > index 39f6dfc..224a25b 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, > return idx; > } > > +static inline void > +intel_flush_status_page(struct intel_engine_cs *ring, int reg) > +{ > + drm_clflush_virt_range(&ring->status_page.page_addr[reg], > + sizeof(uint32_t)); > +} > + > static inline u32 > intel_read_status_page(struct intel_engine_cs *ring, > int reg) > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno 2015-07-01 13:40 ` Mika Kuoppala @ 2015-07-01 13:53 ` Mika Kuoppala 0 siblings, 0 replies; 29+ messages in thread From: Mika Kuoppala @ 2015-07-01 13:53 UTC (permalink / raw) To: Imre Deak, intel-gfx Mika Kuoppala <mika.kuoppala@linux.intel.com> writes: > Imre Deak <imre.deak@intel.com> writes: > >> By running igt/store_dword_loop_render on BXT we can hit a coherency >> problem where the seqno written at GPU command completion time is not >> seen by the CPU. This results in __i915_wait_request seeing the stale >> seqno and not completing the request (not considering the lost >> interrupt/GPU reset mechanism). I also verified that this isn't a case >> of a lost interrupt, or that the command didn't complete somehow: when >> the coherency issue occured I read the seqno via an uncached GTT mapping >> too. While the cached version of the seqno still showed the stale value >> the one read via the uncached mapping was the correct one. >> >> Work around this issue by clflushing the corresponding CPU cacheline >> following any store of the seqno and preceding any reading of it. When >> reading it do this only when the caller expects a coherent view. 
>> >> Testcase: igt/store_dword_loop_render >> Signed-off-by: Imre Deak <imre.deak@intel.com> >> --- >> drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++ >> drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++++++ >> 2 files changed, 24 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c >> index 9f5485d..88bc5525 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf, >> >> static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) >> { >> + /* >> + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM >> + * storing the completed request's seqno occasionally doesn't >> + * invalidate the CPU cache. Work around this by clflushing the >> + * corresponding cacheline whenever the caller wants the coherency to >> + * be guaranteed. Note that this cacheline is known to be >> + * clean at this point, since we only write it in gen8_set_seqno(), >> + * where we also do a clflush after the write. So this clflush in >> + * practice becomes an invalidate operation. >> + */ >> + if (IS_BROXTON(ring->dev) & !lazy_coherency) > > s/&/&& ? s//Read The Whole Thread Before Replying -Mika > -Mika > >> + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); >> + >> return intel_read_status_page(ring, I915_GEM_HWS_INDEX); >> } >> >> static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno) >> { >> intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno); >> + >> + /* See gen8_get_seqno() explaining the reason for the clflush. 
*/ >> + if (IS_BROXTON(ring->dev)) >> + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); >> } >> >> static int gen8_emit_request(struct intel_ringbuffer *ringbuf, >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h >> index 39f6dfc..224a25b 100644 >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h >> @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring, >> return idx; >> } >> >> +static inline void >> +intel_flush_status_page(struct intel_engine_cs *ring, int reg) >> +{ >> + drm_clflush_virt_range(&ring->status_page.page_addr[reg], >> + sizeof(uint32_t)); >> +} >> + >> static inline u32 >> intel_read_status_page(struct intel_engine_cs *ring, >> int reg) >> -- >> 2.1.4 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
  2015-06-08 16:28 [PATCH 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
  2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
@ 2015-06-08 16:28 ` Imre Deak
  2015-06-13 18:04   ` shuang.he
  1 sibling, 1 reply; 29+ messages in thread
From: Imre Deak @ 2015-06-08 16:28 UTC (permalink / raw)
To: intel-gfx

Due to a coherency issue on BXT-A1 we can't guarantee a coherent view of
cached GPU mappings, so fall back to uncached mappings. Note that this
still won't fix cases where userspace expects a coherent view without
synchronizing (via a set domain call). It still makes sense to limit the
kernel's notion of the mapping to be uncached, for example for
relocations to work properly during execbuffer time. Also in case user
space does synchronize the buffer, this will still guarantee that we'll
do the proper clflushing for the buffer.

Testcase: igt/gem_store_dword_batches_loop/cached-mapping
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be35f04..212b51c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4062,7 +4062,11 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 		level = I915_CACHE_NONE;
 		break;
 	case I915_CACHING_CACHED:
-		level = I915_CACHE_LLC;
+		/*
+		 * On BXT-A1 we can't guarantee a coherent view for cached
+		 * mappings, so fall back to uncached mappings.
+		 */
+		level = IS_BROXTON(dev) ? I915_CACHE_NONE : I915_CACHE_LLC;
 		break;
 	case I915_CACHING_DISPLAY:
 		level = HAS_WT(dev) ? I915_CACHE_WT : I915_CACHE_NONE;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings
  2015-06-08 16:28 ` [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
@ 2015-06-13 18:04   ` shuang.he
  0 siblings, 0 replies; 29+ messages in thread
From: shuang.he @ 2015-06-13 18:04 UTC (permalink / raw)
To: shuang.he, lei.a.liu, intel-gfx, imre.deak

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6556
-------------------------------------Summary-------------------------------------
Platform    Delta    drm-intel-nightly    Series Applied
PNV                      276/276          276/276
ILK                      303/303          303/303
SNB                      312/312          312/312
IVB                      343/343          343/343
BYT                      287/287          287/287
BDW                      321/321          321/321
-------------------------------------Detailed-------------------------------------
Platform    Test    drm-intel-nightly    Series Applied
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
end of thread, other threads:[~2015-07-01 13:54 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-08 16:28 [PATCH 0/2] drm/i915/bxt: work around HW coherency issue Imre Deak
2015-06-08 16:28 ` [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno Imre Deak
2015-06-08 17:08 ` Dave Gordon
2015-06-08 17:12 ` Chris Wilson
2015-06-08 17:34 ` Ville Syrjälä
2015-06-08 18:00 ` Chris Wilson
2015-06-08 18:40 ` Ville Syrjälä
2015-06-08 19:33 ` Dave Gordon
2015-06-10 10:59 ` Imre Deak
2015-06-10 15:10 ` Jesse Barnes
2015-06-10 15:26 ` Imre Deak
2015-06-10 15:33 ` Jesse Barnes
2015-06-10 15:55 ` Imre Deak
2015-06-10 15:52 ` Chris Wilson
2015-06-11  8:02 ` Dave Gordon
2015-06-11  8:20 ` Chris Wilson
2015-06-11 19:14 ` Imre Deak
2015-06-08 17:14 ` Imre Deak
2015-06-09  8:21 ` Jani Nikula
2015-06-10 14:07 ` Imre Deak
2015-06-10 14:21 ` Chris Wilson
2015-06-10 14:55 ` Imre Deak
2015-06-10 15:00 ` Ville Syrjälä
2015-06-10 15:16 ` Imre Deak
2015-06-10 15:35 ` Chris Wilson
2015-07-01 13:40 ` Mika Kuoppala
2015-07-01 13:53 ` Mika Kuoppala
2015-06-08 16:28 ` [PATCH 2/2] drm/i915/bxt: work around HW coherency issue for cached GEM mappings Imre Deak
2015-06-13 18:04 ` shuang.he