All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/omap: fix memory barrier bug in DMM driver
@ 2018-03-23 12:57 Peter Ujfalusi
  2018-05-23 10:05 ` Tomi Valkeinen
  0 siblings, 1 reply; 2+ messages in thread
From: Peter Ujfalusi @ 2018-03-23 12:57 UTC (permalink / raw)
  To: tomi.valkeinen, laurent.pinchart; +Cc: jsarha, dri-devel

From: Tomi Valkeinen <tomi.valkeinen@ti.com>

A DMM timeout "timed out waiting for done" has been observed on DRA7
devices. The timeout happens rarely, and only when the system is under
heavy load.

Debugging showed that the timeout can be made to happen much more
frequently by optimizing the DMM driver, so that there's almost no code
between writing the last DMM descriptors to RAM, and writing to DMM
register which starts the DMM transaction.

The current theory is that a wmb() does not properly ensure that the
data written to RAM is observable by all the components in the system.

This DMM timeout has caused interesting (and rare) bugs as the error
handling was not functioning properly (the error handling has been fixed
in previous commits):

 * If a DMM timeout happened when a GEM buffer was being pinned for
   display on the screen, a timeout error would be shown, but the driver
   would continue programming DSS HW with broken buffer, leading to
   SYNCLOST floods and possible crashes.

 * If a DMM timeout happened when other user (say, video decoder) was
   pinning a GEM buffer, a timeout would be shown but if the user
   handled the error properly, no other issues followed.

 * If a DMM timeout happened when a GEM buffer was being released, the
   driver does not even notice the error, leading to crashes or hang
   later.

This patch adds wmb() and readl() calls after the last bit is written to
RAM, which should ensure that the execution proceeds only after the data
is actually in RAM, and thus observable by DMM.

The read-back should not be needed. Further study is required to understand
if DMM is somehow special case and read-back is ok, or if DRA7's memory
barriers do not work correctly.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
index c40f90d2db82..27c67bc36203 100644
--- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
+++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
@@ -410,6 +410,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait)
 	}
 
 	txn->last_pat->next_pa = 0;
+	/* ensure that the written descriptors are visible to DMM */
+	wmb();
+
+	/*
+	 * NOTE: the wmb() above should be enough, but there seems to be a bug
+	 * in OMAP's memory barrier implementation, which in some rare cases may
+	 * cause the writes not to be observable after wmb().
+	 */
+
+	/* read back to ensure the data is in RAM */
+	readl(&txn->last_pat->next_pa);
 
 	/* write to PAT_DESCR to clear out any pending transaction */
 	dmm_write(dmm, 0x0, reg[PAT_DESCR][engine->id]);
-- 
Peter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] drm/omap: fix memory barrier bug in DMM driver
  2018-03-23 12:57 [PATCH] drm/omap: fix memory barrier bug in DMM driver Peter Ujfalusi
@ 2018-05-23 10:05 ` Tomi Valkeinen
  0 siblings, 0 replies; 2+ messages in thread
From: Tomi Valkeinen @ 2018-05-23 10:05 UTC (permalink / raw)
  To: Peter Ujfalusi, laurent.pinchart; +Cc: jsarha, dri-devel

On 23/03/18 14:57, Peter Ujfalusi wrote:
> From: Tomi Valkeinen <tomi.valkeinen@ti.com>
> 
> A DMM timeout "timed out waiting for done" has been observed on DRA7
> devices. The timeout happens rarely, and only when the system is under
> heavy load.
> 
> Debugging showed that the timeout can be made to happen much more
> frequently by optimizing the DMM driver, so that there's almost no code
> between writing the last DMM descriptors to RAM, and writing to DMM
> register which starts the DMM transaction.
> 
> The current theory is that a wmb() does not properly ensure that the
> data written to RAM is observable by all the components in the system.
> 
> This DMM timeout has caused interesting (and rare) bugs as the error
> handling was not functioning properly (the error handling has been fixed
> in previous commits):
> 
>  * If a DMM timeout happened when a GEM buffer was being pinned for
>    display on the screen, a timeout error would be shown, but the driver
>    would continue programming DSS HW with broken buffer, leading to
>    SYNCLOST floods and possible crashes.
> 
>  * If a DMM timeout happened when other user (say, video decoder) was
>    pinning a GEM buffer, a timeout would be shown but if the user
>    handled the error properly, no other issues followed.
> 
>  * If a DMM timeout happened when a GEM buffer was being released, the
>    driver does not even notice the error, leading to crashes or hang
>    later.
> 
> This patch adds wmb() and readl() calls after the last bit is written to
> RAM, which should ensure that the execution proceeds only after the data
> is actually in RAM, and thus observable by DMM.
> 
> The read-back should not be needed. Further study is required to understand
> if DMM is somehow special case and read-back is ok, or if DRA7's memory
> barriers do not work correctly.
> 
> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> ---
>  drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> index c40f90d2db82..27c67bc36203 100644
> --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> @@ -410,6 +410,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait)
>  	}
>  
>  	txn->last_pat->next_pa = 0;
> +	/* ensure that the written descriptors are visible to DMM */
> +	wmb();
> +
> +	/*
> +	 * NOTE: the wmb() above should be enough, but there seems to be a bug
> +	 * in OMAP's memory barrier implementation, which in some rare cases may
> +	 * cause the writes not to be observable after wmb().
> +	 */
> +
> +	/* read back to ensure the data is in RAM */
> +	readl(&txn->last_pat->next_pa);
>  
>  	/* write to PAT_DESCR to clear out any pending transaction */
>  	dmm_write(dmm, 0x0, reg[PAT_DESCR][engine->id]);
> 

Thanks, I have picked this up.

 Tomi

-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-05-23 10:05 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-23 12:57 [PATCH] drm/omap: fix memory barrier bug in DMM driver Peter Ujfalusi
2018-05-23 10:05 ` Tomi Valkeinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.