All of lore.kernel.org
 help / color / mirror / Atom feed
* drm vblank regression fixes for Linux 4.4+
@ 2016-02-08  1:13 Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                   ` (5 more replies)
  0 siblings, 6 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel; +Cc: linux

Here is the series of patches with fixes for regressions in vblank
counting/timestamping caused by the rewrite of drm_update_vblank_count
in Linux 4.4. These are all meant for stable 4.4 and later.

I have tested them on radeon-kms and nouveau-kms by unplugging/replugging
displays, manual dpms off/on, dpms off/on due to screen blanking, system
suspend/resume, and mode setting to different resolutions and refresh rates,
checking the drm.debug logs to confirm that the large vblank counter jumps
no longer happen and the behavior of the vblank counter/ts around dpms and
modesetting is somewhat reasonable.

-mario

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off()
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13   ` Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

Otherwise if a kms driver calls into drm_vblank_off() more than once
before calling drm_vblank_on() again, the redundant calls to
vblank_disable_and_save() will call drm_update_vblank_count()
while hw vblank counters and vblank timestamping are in a undefined
state during modesets, dpms off etc.

At least with the legacy drm helpers it is not unusual to
get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
were observed on radeon-kms during dpms-off -> dpms-on transition.

Fixes large jumps of the software maintained vblank counter due to
the hardware vblank counter resetting to zero during dpms off or
modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
instead of drm_vblank_pre/post_modeset().

This fixes a regression caused by the changes made to
drm_update_vblank_count() in Linux 4.4.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 607f493..bcb8528 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
 	spin_lock_irqsave(&dev->event_lock, irqflags);
 
 	spin_lock(&dev->vbl_lock);
-	vblank_disable_and_save(dev, pipe);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
+	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
+	if (!vblank->inmodeset)
+		vblank_disable_and_save(dev, pipe);
+
 	wake_up(&vblank->queue);
 
 	/*
@@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 		return;
 
 	spin_lock_irqsave(&dev->vbl_lock, irqflags);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
 	/* Drop our private "prevent drm_vblank_get" refcount */
 	if (vblank->inmodeset) {
 		atomic_dec(&vblank->refcount);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off()
@ 2016-02-08  1:13   ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: daniel.vetter, michel, linux, stable, alexander.deucher,
	christian.koenig, vbabka

Otherwise if a kms driver calls into drm_vblank_off() more than once
before calling drm_vblank_on() again, the redundant calls to
vblank_disable_and_save() will call drm_update_vblank_count()
while hw vblank counters and vblank timestamping are in a undefined
state during modesets, dpms off etc.

At least with the legacy drm helpers it is not unusual to
get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
were observed on radeon-kms during dpms-off -> dpms-on transition.

Fixes large jumps of the software maintained vblank counter due to
the hardware vblank counter resetting to zero during dpms off or
modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
instead of drm_vblank_pre/post_modeset().

This fixes a regression caused by the changes made to
drm_update_vblank_count() in Linux 4.4.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 607f493..bcb8528 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
 	spin_lock_irqsave(&dev->event_lock, irqflags);
 
 	spin_lock(&dev->vbl_lock);
-	vblank_disable_and_save(dev, pipe);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
+	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
+	if (!vblank->inmodeset)
+		vblank_disable_and_save(dev, pipe);
+
 	wake_up(&vblank->queue);
 
 	/*
@@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 		return;
 
 	spin_lock_irqsave(&dev->vbl_lock, irqflags);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
 	/* Drop our private "prevent drm_vblank_get" refcount */
 	if (vblank->inmodeset) {
 		atomic_dec(&vblank->refcount);
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13   ` Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

This fixes a regression introduced by the new drm_update_vblank_count()
implementation in Linux 4.4:

Restrict the bump of the software vblank counter in drm_update_vblank_count()
to a safe maximum value of +1 whenever there is the possibility that
concurrent readers of vblank timestamps could be active at the moment,
as the current implementation of the timestamp caching and updating is
not safe against concurrent readers for calls to store_vblank() with a
bump of anything but +1. A bump != 1 would very likely return corrupted
timestamps to userspace, because the same slot in the cache could
be concurrently written by store_vblank() and read by one of those
readers in a non-atomic fashion and without the read-retry logic
detecting this collision.

Concurrent readers can exist while drm_update_vblank_count() is called
from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
irq callers. However, all those calls are happening with the vbl_lock
locked thereby preventing a drm_vblank_get(), so the vblank refcount
can't increase while drm_update_vblank_count() is executing. Therefore
a zero vblank refcount during execution of that function signals that
is safe for arbitrary counter bumps if called from outside vblank irq,
whereas a non-zero count is not safe.

Whenever the function is called from vblank irq, we have to assume concurrent
readers could show up any time during its execution, even if the refcount
is currently zero, as vblank irqs are usually only enabled due to the
presence of readers, and because when it is called from vblank irq it
can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
Therefore also restrict bumps to +1 when the function is called from vblank
irq.

Such bumps of more than +1 can happen at other times than reenabling
vblank irqs, e.g., when regular vblank interrupts get delayed by more
than 1 frame due to long held locks, long irq off periods, realtime
preemption on RT kernels, or system management interrupts.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index bcb8528..aa2c74b 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
 	}
 
+	/*
+	 * Restrict the bump of the software vblank counter to a safe maximum
+	 * value of +1 whenever there is the possibility that concurrent readers
+	 * of vblank timestamps could be active at the moment, as the current
+	 * implementation of the timestamp caching and updating is not safe
+	 * against concurrent readers for calls to store_vblank() with a bump
+	 * of anything but +1. A bump != 1 would very likely return corrupted
+	 * timestamps to userspace, because the same slot in the cache could
+	 * be concurrently written by store_vblank() and read by one of those
+	 * readers without the read-retry logic detecting the collision.
+	 *
+	 * Concurrent readers can exist when we are called from the
+	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
+	 * irq callers. However, all those calls to us are happening with the
+	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
+	 * can't increase while we are executing. Therefore a zero refcount at
+	 * this point is safe for arbitrary counter bumps if we are called
+	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
+	 * we must also accept a refcount of 1, as whenever we are called from
+	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
+	 * we must let that one pass through in order to not lose vblank counts
+	 * during vblank irq off - which would completely defeat the whole
+	 * point of this routine.
+	 *
+	 * Whenever we are called from vblank irq, we have to assume concurrent
+	 * readers exist or can show up any time during our execution, even if
+	 * the refcount is currently zero, as vblank irqs are usually only
+	 * enabled due to the presence of readers, and because when we are called
+	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
+	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
+	 * called from vblank irq.
+	 */
+	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
+	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
+			      "refcount %u, vblirq %u\n", pipe, diff,
+			      atomic_read(&vblank->refcount),
+			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
+		diff = 1;
+	}
+
 	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
 		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
 		      pipe, vblank->count, diff, cur_vblank, vblank->last);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-08  1:13   ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: daniel.vetter, michel, linux, stable, alexander.deucher,
	christian.koenig, vbabka

This fixes a regression introduced by the new drm_update_vblank_count()
implementation in Linux 4.4:

Restrict the bump of the software vblank counter in drm_update_vblank_count()
to a safe maximum value of +1 whenever there is the possibility that
concurrent readers of vblank timestamps could be active at the moment,
as the current implementation of the timestamp caching and updating is
not safe against concurrent readers for calls to store_vblank() with a
bump of anything but +1. A bump != 1 would very likely return corrupted
timestamps to userspace, because the same slot in the cache could
be concurrently written by store_vblank() and read by one of those
readers in a non-atomic fashion and without the read-retry logic
detecting this collision.

Concurrent readers can exist while drm_update_vblank_count() is called
from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
irq callers. However, all those calls are happening with the vbl_lock
locked thereby preventing a drm_vblank_get(), so the vblank refcount
can't increase while drm_update_vblank_count() is executing. Therefore
a zero vblank refcount during execution of that function signals that
is safe for arbitrary counter bumps if called from outside vblank irq,
whereas a non-zero count is not safe.

Whenever the function is called from vblank irq, we have to assume concurrent
readers could show up any time during its execution, even if the refcount
is currently zero, as vblank irqs are usually only enabled due to the
presence of readers, and because when it is called from vblank irq it
can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
Therefore also restrict bumps to +1 when the function is called from vblank
irq.

Such bumps of more than +1 can happen at other times than reenabling
vblank irqs, e.g., when regular vblank interrupts get delayed by more
than 1 frame due to long held locks, long irq off periods, realtime
preemption on RT kernels, or system management interrupts.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index bcb8528..aa2c74b 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
 	}
 
+	/*
+	 * Restrict the bump of the software vblank counter to a safe maximum
+	 * value of +1 whenever there is the possibility that concurrent readers
+	 * of vblank timestamps could be active at the moment, as the current
+	 * implementation of the timestamp caching and updating is not safe
+	 * against concurrent readers for calls to store_vblank() with a bump
+	 * of anything but +1. A bump != 1 would very likely return corrupted
+	 * timestamps to userspace, because the same slot in the cache could
+	 * be concurrently written by store_vblank() and read by one of those
+	 * readers without the read-retry logic detecting the collision.
+	 *
+	 * Concurrent readers can exist when we are called from the
+	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
+	 * irq callers. However, all those calls to us are happening with the
+	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
+	 * can't increase while we are executing. Therefore a zero refcount at
+	 * this point is safe for arbitrary counter bumps if we are called
+	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
+	 * we must also accept a refcount of 1, as whenever we are called from
+	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
+	 * we must let that one pass through in order to not lose vblank counts
+	 * during vblank irq off - which would completely defeat the whole
+	 * point of this routine.
+	 *
+	 * Whenever we are called from vblank irq, we have to assume concurrent
+	 * readers exist or can show up any time during our execution, even if
+	 * the refcount is currently zero, as vblank irqs are usually only
+	 * enabled due to the presence of readers, and because when we are called
+	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
+	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
+	 * called from vblank irq.
+	 */
+	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
+	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
+			      "refcount %u, vblirq %u\n", pipe, diff,
+			      atomic_read(&vblank->refcount),
+			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
+		diff = 1;
+	}
+
 	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
 		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
 		      pipe, vblank->count, diff, cur_vblank, vblank->last);
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13   ` Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

Changes to drm_update_vblank_count() in Linux 4.4 broke the
behaviour of the pre/post modeset functions as the new update
code doesn't deal with hw vblank counter resets inbetween calls
to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
should.

This causes mistreatment of such hw counter resets as counter
wraparound, and thereby large forward jumps of the software
vblank counter which in turn cause vblank event dispatching
and vblank waits to fail/hang --> userspace clients hang.

This symptom was reported on radeon-kms to cause a infinite
hang of KDE Plasma 5 shell's login procedure, preventing users
from logging in.

Fix this by detecting when drm_update_vblank_count() is called
inside a pre->post modeset interval. If so, clamp valid vblank
increments to the safe values 0 and 1, pretty much restoring
the update behavior of the old update code of Linux 4.3 and
earlier. Also reset the last recorded hw vblank count at call
to drm_vblank_post_modeset() to be safe against hw that after
modesetting, dpms on etc. only fires its first vblank irq after
drm_vblank_post_modeset() was already called.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index aa2c74b..5c27ad3 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 	}
 
 	/*
+	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
+	 * interval? If so then vblank irqs keep running and it will likely
+	 * happen that the hardware vblank counter is not trustworthy as it
+	 * might reset at some point in that interval and vblank timestamps
+	 * are not trustworthy either in that interval. Iow. this can result
+	 * in a bogus diff >> 1 which must be avoided as it would cause
+	 * random large forward jumps of the software vblank counter.
+	 */
+	if (diff > 1 && (vblank->inmodeset & 0x2)) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
+			      " due to pre-modeset.\n", pipe, diff);
+		diff = 1;
+	}
+
+	/*
 	 * Restrict the bump of the software vblank counter to a safe maximum
 	 * value of +1 whenever there is the possibility that concurrent readers
 	 * of vblank timestamps could be active at the moment, as the current
@@ -1573,6 +1588,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
 	if (vblank->inmodeset) {
 		spin_lock_irqsave(&dev->vbl_lock, irqflags);
 		dev->vblank_disable_allowed = true;
+		drm_reset_vblank_timestamp(dev, pipe);
 		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 
 		if (vblank->inmodeset & 0x2)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
@ 2016-02-08  1:13   ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: daniel.vetter, michel, linux, stable, alexander.deucher,
	christian.koenig, vbabka

Changes to drm_update_vblank_count() in Linux 4.4 broke the
behaviour of the pre/post modeset functions as the new update
code doesn't deal with hw vblank counter resets inbetween calls
to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
should.

This causes mistreatment of such hw counter resets as counter
wraparound, and thereby large forward jumps of the software
vblank counter which in turn cause vblank event dispatching
and vblank waits to fail/hang --> userspace clients hang.

This symptom was reported on radeon-kms to cause a infinite
hang of KDE Plasma 5 shell's login procedure, preventing users
from logging in.

Fix this by detecting when drm_update_vblank_count() is called
inside a pre->post modeset interval. If so, clamp valid vblank
increments to the safe values 0 and 1, pretty much restoring
the update behavior of the old update code of Linux 4.3 and
earlier. Also reset the last recorded hw vblank count at call
to drm_vblank_post_modeset() to be safe against hw that after
modesetting, dpms on etc. only fires its first vblank irq after
drm_vblank_post_modeset() was already called.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index aa2c74b..5c27ad3 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 	}
 
 	/*
+	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
+	 * interval? If so then vblank irqs keep running and it will likely
+	 * happen that the hardware vblank counter is not trustworthy as it
+	 * might reset at some point in that interval and vblank timestamps
+	 * are not trustworthy either in that interval. Iow. this can result
+	 * in a bogus diff >> 1 which must be avoided as it would cause
+	 * random large forward jumps of the software vblank counter.
+	 */
+	if (diff > 1 && (vblank->inmodeset & 0x2)) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
+			      " due to pre-modeset.\n", pipe, diff);
+		diff = 1;
+	}
+
+	/*
 	 * Restrict the bump of the software vblank counter to a safe maximum
 	 * value of +1 whenever there is the possibility that concurrent readers
 	 * of vblank timestamps could be active at the moment, as the current
@@ -1573,6 +1588,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
 	if (vblank->inmodeset) {
 		spin_lock_irqsave(&dev->vbl_lock, irqflags);
 		dev->vblank_disable_allowed = true;
+		drm_reset_vblank_timestamp(dev, pipe);
 		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 
 		if (vblank->inmodeset & 0x2)
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13   ` Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

drm_vblank_offdelay can have three different types of values:

< 0 is to be always treated the same as dev->vblank_disable_immediate
= 0 is to be treated as "never disable vblanks"
> 0 is to be treated as disable immediate if kms driver wants it
    that way via dev->vblank_disable_immediate. Otherwise it is
    a disable timeout in msecs.

This got broken in Linux 3.18+ for the implementation of
drm_vblank_on. If the user specified a value of zero which should
always reenable vblank irqs in this function, a kms driver could
override the users choice by setting vblank_disable_immediate
to true. This patch fixes the regression and keeps the user in
control.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 3.18+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 5c27ad3..fb17c45 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 	 * re-enable interrupts if there are users left, or the
 	 * user wishes vblank interrupts to be enabled all the time.
 	 */
-	if (atomic_read(&vblank->refcount) != 0 ||
-	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
+	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
+	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
 		WARN_ON(drm_vblank_enable(dev, pipe));
 	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-08  1:13   ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

drm_vblank_offdelay can have three different types of values:

< 0 is to be always treated the same as dev->vblank_disable_immediate
= 0 is to be treated as "never disable vblanks"
> 0 is to be treated as disable immediate if kms driver wants it
    that way via dev->vblank_disable_immediate. Otherwise it is
    a disable timeout in msecs.

This got broken in Linux 3.18+ for the implementation of
drm_vblank_on. If the user specified a value of zero which should
always reenable vblank irqs in this function, a kms driver could
override the users choice by setting vblank_disable_immediate
to true. This patch fixes the regression and keeps the user in
control.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 3.18+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 5c27ad3..fb17c45 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 	 * re-enable interrupts if there are users left, or the
 	 * user wishes vblank interrupts to be enabled all the time.
 	 */
-	if (atomic_read(&vblank->refcount) != 0 ||
-	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
+	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
+	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
 		WARN_ON(drm_vblank_enable(dev, pipe));
 	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13   ` Mario Kleiner
  2016-02-08  1:13   ` Mario Kleiner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

The changes to drm_update_vblank_count() in Linux 4.4 added a
method to emulate a hardware vblank counter by use of high
precision vblank timestamps if a kms driver supports those,
but doesn't suppport hw vblank counters.

That method assumes that the old timestamp from a previous
invocation is valid, but that is not always the case. E.g.,
if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
or drm_update_vblank_count() gets called outside vblank irq and
the high precision timestamping can't deliver a precise timestamp,
ie. drm_get_last_vbltimestamp() delivers a return value of false,
then those functions will initialize the old timestamp to zero
to mark it as invalid.

A following call to drm_update_vblank_count() would then calculate
elapsed time with vblank irqs off as current vblank timestamp minus
the zero old timestamp and compute a software vblank counter increment
that corresponds to system uptime, causing a large forward jump of the
software vblank counter. That jump in turn can cause too long waits
in drmWaitVblank and very long delays in delivery of vblank events,
resulting in hangs of userspace clients.

This problem can be observed on nouveau-kms during machine
suspend->resume cycles, where drm_vblank_off is called during
suspend, drm_vblank_on is called during resume and the first
queries to drm_get_last_vbltimestamp() don't deliver high
precision timestamps, resulting in a large harmful counter jump.

Fix this by checking if the old timestamp used for this calculations
is zero == invalid. If so, perform a counter increment of +1 to
prevent large counter jumps and reinitialize the timestamps to
sane values.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index fb17c45..88bdf19 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
 				      " diff_ns = %lld, framedur_ns = %d)\n",
 				      pipe, (long long) diff_ns, framedur_ns);
+
+		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
+		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
+			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
+				      pipe);
+			diff = 1;
+		}
 	} else {
 		/* some kind of default for drivers w/o accurate vbl timestamping */
 		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
@ 2016-02-08  1:13   ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: daniel.vetter, michel, linux, stable, alexander.deucher,
	christian.koenig, vbabka

The changes to drm_update_vblank_count() in Linux 4.4 added a
method to emulate a hardware vblank counter by use of high
precision vblank timestamps if a kms driver supports those,
but doesn't suppport hw vblank counters.

That method assumes that the old timestamp from a previous
invocation is valid, but that is not always the case. E.g.,
if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
or drm_update_vblank_count() gets called outside vblank irq and
the high precision timestamping can't deliver a precise timestamp,
ie. drm_get_last_vbltimestamp() delivers a return value of false,
then those functions will initialize the old timestamp to zero
to mark it as invalid.

A following call to drm_update_vblank_count() would then calculate
elapsed time with vblank irqs off as current vblank timestamp minus
the zero old timestamp and compute a software vblank counter increment
that corresponds to system uptime, causing a large forward jump of the
software vblank counter. That jump in turn can cause too long waits
in drmWaitVblank and very long delays in delivery of vblank events,
resulting in hangs of userspace clients.

This problem can be observed on nouveau-kms during machine
suspend->resume cycles, where drm_vblank_off is called during
suspend, drm_vblank_on is called during resume and the first
queries to drm_get_last_vbltimestamp() don't deliver high
precision timestamps, resulting in a large harmful counter jump.

Fix this by checking if the old timestamp used for this calculations
is zero == invalid. If so, perform a counter increment of +1 to
prevent large counter jumps and reinitialize the timestamps to
sane values.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index fb17c45..88bdf19 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
 				      " diff_ns = %lld, framedur_ns = %d)\n",
 				      pipe, (long long) diff_ns, framedur_ns);
+
+		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
+		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
+			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
+				      pipe);
+			diff = 1;
+		}
 	} else {
 		/* some kind of default for drivers w/o accurate vbl timestamping */
 		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 6/6] drm/radeon/pm: Handle failure of drm_vblank_get.
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
                   ` (4 preceding siblings ...)
  2016-02-08  1:13   ` Mario Kleiner
@ 2016-02-08  1:13 ` Mario Kleiner
  2016-02-09 10:10   ` Daniel Vetter
  5 siblings, 1 reply; 52+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel; +Cc: alexander.deucher, linux, michel, christian.koenig

Make sure that drm_vblank_get/put() stay balanced in
case drm_vblank_get fails, by skipping the corresponding
put.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: michel@daenzer.net
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/radeon/radeon_pm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index 59abebd..339a6c5 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -276,8 +276,12 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
 	if (rdev->irq.installed) {
 		for (i = 0; i < rdev->num_crtc; i++) {
 			if (rdev->pm.active_crtcs & (1 << i)) {
-				rdev->pm.req_vblank |= (1 << i);
-				drm_vblank_get(rdev->ddev, i);
+				/* This can fail if a modeset is in progress */
+				if (0 == drm_vblank_get(rdev->ddev, i))
+					rdev->pm.req_vblank |= (1 << i);
+				else
+					DRM_DEBUG_DRIVER("crtc %d no vblank, can glitch\n",
+							 i);
 			}
 		}
 	}
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off()
  2016-02-08  1:13   ` Mario Kleiner
@ 2016-02-09  9:54     ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09  9:54 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:24AM +0100, Mario Kleiner wrote:
> Otherwise if a kms driver calls into drm_vblank_off() more than once
> before calling drm_vblank_on() again, the redundant calls to
> vblank_disable_and_save() will call drm_update_vblank_count()
> while hw vblank counters and vblank timestamping are in a undefined
> state during modesets, dpms off etc.
> 
> At least with the legacy drm helpers it is not unusual to
> get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
> half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
> were observed on radeon-kms during dpms-off -> dpms-on transition.
> 
> Fixes large jumps of the software maintained vblank counter due to
> the hardware vblank counter resetting to zero during dpms off or
> modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
> instead of drm_vblank_pre/post_modeset().
> 
> This fixes a regression caused by the changes made to
> drm_update_vblank_count() in Linux 4.4.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com
> ---
>  drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 607f493..bcb8528 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
>  	spin_lock_irqsave(&dev->event_lock, irqflags);
>  
>  	spin_lock(&dev->vbl_lock);
> -	vblank_disable_and_save(dev, pipe);
> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
> +		      pipe, vblank->enabled, vblank->inmodeset);
> +
> +	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
> +	if (!vblank->inmodeset)

Since atomic drivers shouldn't suck so badly at this, maybe

	if (DRIVER_ATOMIC || !vblank->inmodeset)

instead? That way the flexibility would be constraint to old drivers that
need it because legacy crtc helpers suck. With that change:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +		vblank_disable_and_save(dev, pipe);
> +
>  	wake_up(&vblank->queue);
>  
>  	/*
> @@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>  		return;
>  
>  	spin_lock_irqsave(&dev->vbl_lock, irqflags);
> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
> +		      pipe, vblank->enabled, vblank->inmodeset);
> +
>  	/* Drop our private "prevent drm_vblank_get" refcount */
>  	if (vblank->inmodeset) {
>  		atomic_dec(&vblank->refcount);
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off()
@ 2016-02-09  9:54     ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09  9:54 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, daniel.vetter, michel, linux, stable,
	alexander.deucher, christian.koenig, vbabka

On Mon, Feb 08, 2016 at 02:13:24AM +0100, Mario Kleiner wrote:
> Otherwise if a kms driver calls into drm_vblank_off() more than once
> before calling drm_vblank_on() again, the redundant calls to
> vblank_disable_and_save() will call drm_update_vblank_count()
> while hw vblank counters and vblank timestamping are in a undefined
> state during modesets, dpms off etc.
> 
> At least with the legacy drm helpers it is not unusual to
> get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
> half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
> were observed on radeon-kms during dpms-off -> dpms-on transition.
> 
> Fixes large jumps of the software maintained vblank counter due to
> the hardware vblank counter resetting to zero during dpms off or
> modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
> instead of drm_vblank_pre/post_modeset().
> 
> This fixes a regression caused by the changes made to
> drm_update_vblank_count() in Linux 4.4.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com
> ---
>  drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 607f493..bcb8528 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
>  	spin_lock_irqsave(&dev->event_lock, irqflags);
>  
>  	spin_lock(&dev->vbl_lock);
> -	vblank_disable_and_save(dev, pipe);
> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
> +		      pipe, vblank->enabled, vblank->inmodeset);
> +
> +	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
> +	if (!vblank->inmodeset)

Since atomic drivers shouldn't suck so badly at this, maybe

	if (DRIVER_ATOMIC || !vblank->inmodeset)

instead? That way the flexibility would be constraint to old drivers that
need it because legacy crtc helpers suck. With that change:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +		vblank_disable_and_save(dev, pipe);
> +
>  	wake_up(&vblank->queue);
>  
>  	/*
> @@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>  		return;
>  
>  	spin_lock_irqsave(&dev->vbl_lock, irqflags);
> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
> +		      pipe, vblank->enabled, vblank->inmodeset);
> +
>  	/* Drop our private "prevent drm_vblank_get" refcount */
>  	if (vblank->inmodeset) {
>  		atomic_dec(&vblank->refcount);
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-08  1:13   ` Mario Kleiner
@ 2016-02-09  9:56     ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09  9:56 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> This fixes a regression introduced by the new drm_update_vblank_count()
> implementation in Linux 4.4:
> 
> Restrict the bump of the software vblank counter in drm_update_vblank_count()
> to a safe maximum value of +1 whenever there is the possibility that
> concurrent readers of vblank timestamps could be active at the moment,
> as the current implementation of the timestamp caching and updating is
> not safe against concurrent readers for calls to store_vblank() with a
> bump of anything but +1. A bump != 1 would very likely return corrupted
> timestamps to userspace, because the same slot in the cache could
> be concurrently written by store_vblank() and read by one of those
> readers in a non-atomic fashion and without the read-retry logic
> detecting this collision.
> 
> Concurrent readers can exist while drm_update_vblank_count() is called
> from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> irq callers. However, all those calls are happening with the vbl_lock
> locked thereby preventing a drm_vblank_get(), so the vblank refcount
> can't increase while drm_update_vblank_count() is executing. Therefore
> a zero vblank refcount during execution of that function signals that
> is safe for arbitrary counter bumps if called from outside vblank irq,
> whereas a non-zero count is not safe.
> 
> Whenever the function is called from vblank irq, we have to assume concurrent
> readers could show up any time during its execution, even if the refcount
> is currently zero, as vblank irqs are usually only enabled due to the
> presence of readers, and because when it is called from vblank irq it
> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> Therefore also restrict bumps to +1 when the function is called from vblank
> irq.
> 
> Such bumps of more than +1 can happen at other times than reenabling
> vblank irqs, e.g., when regular vblank interrupts get delayed by more
> than 1 frame due to long held locks, long irq off periods, realtime
> preemption on RT kernels, or system management interrupts.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com

Imo this is duct-tape. If we want to fix this up properly I think we
should just use a full-blown seqlock instead of our hand-rolled one. And
that could handle any increment at all.
-Daniel

> ---
>  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index bcb8528..aa2c74b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>  	}
>  
> +	/*
> +	 * Restrict the bump of the software vblank counter to a safe maximum
> +	 * value of +1 whenever there is the possibility that concurrent readers
> +	 * of vblank timestamps could be active at the moment, as the current
> +	 * implementation of the timestamp caching and updating is not safe
> +	 * against concurrent readers for calls to store_vblank() with a bump
> +	 * of anything but +1. A bump != 1 would very likely return corrupted
> +	 * timestamps to userspace, because the same slot in the cache could
> +	 * be concurrently written by store_vblank() and read by one of those
> +	 * readers without the read-retry logic detecting the collision.
> +	 *
> +	 * Concurrent readers can exist when we are called from the
> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> +	 * irq callers. However, all those calls to us are happening with the
> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> +	 * can't increase while we are executing. Therefore a zero refcount at
> +	 * this point is safe for arbitrary counter bumps if we are called
> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> +	 * we must also accept a refcount of 1, as whenever we are called from
> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> +	 * we must let that one pass through in order to not lose vblank counts
> +	 * during vblank irq off - which would completely defeat the whole
> +	 * point of this routine.
> +	 *
> +	 * Whenever we are called from vblank irq, we have to assume concurrent
> +	 * readers exist or can show up any time during our execution, even if
> +	 * the refcount is currently zero, as vblank irqs are usually only
> +	 * enabled due to the presence of readers, and because when we are called
> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> +	 * called from vblank irq.
> +	 */
> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> +			      "refcount %u, vblirq %u\n", pipe, diff,
> +			      atomic_read(&vblank->refcount),
> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> +		diff = 1;
> +	}
> +
>  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09  9:56     ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09  9:56 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, daniel.vetter, michel, linux, stable,
	alexander.deucher, christian.koenig, vbabka

On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> This fixes a regression introduced by the new drm_update_vblank_count()
> implementation in Linux 4.4:
> 
> Restrict the bump of the software vblank counter in drm_update_vblank_count()
> to a safe maximum value of +1 whenever there is the possibility that
> concurrent readers of vblank timestamps could be active at the moment,
> as the current implementation of the timestamp caching and updating is
> not safe against concurrent readers for calls to store_vblank() with a
> bump of anything but +1. A bump != 1 would very likely return corrupted
> timestamps to userspace, because the same slot in the cache could
> be concurrently written by store_vblank() and read by one of those
> readers in a non-atomic fashion and without the read-retry logic
> detecting this collision.
> 
> Concurrent readers can exist while drm_update_vblank_count() is called
> from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> irq callers. However, all those calls are happening with the vbl_lock
> locked thereby preventing a drm_vblank_get(), so the vblank refcount
> can't increase while drm_update_vblank_count() is executing. Therefore
> a zero vblank refcount during execution of that function signals that
> is safe for arbitrary counter bumps if called from outside vblank irq,
> whereas a non-zero count is not safe.
> 
> Whenever the function is called from vblank irq, we have to assume concurrent
> readers could show up any time during its execution, even if the refcount
> is currently zero, as vblank irqs are usually only enabled due to the
> presence of readers, and because when it is called from vblank irq it
> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> Therefore also restrict bumps to +1 when the function is called from vblank
> irq.
> 
> Such bumps of more than +1 can happen at other times than reenabling
> vblank irqs, e.g., when regular vblank interrupts get delayed by more
> than 1 frame due to long held locks, long irq off periods, realtime
> preemption on RT kernels, or system management interrupts.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com

Imo this is duct-tape. If we want to fix this up properly I think we
should just use a full-blown seqlock instead of our hand-rolled one. And
that could handle any increment at all.
-Daniel

> ---
>  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index bcb8528..aa2c74b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>  	}
>  
> +	/*
> +	 * Restrict the bump of the software vblank counter to a safe maximum
> +	 * value of +1 whenever there is the possibility that concurrent readers
> +	 * of vblank timestamps could be active at the moment, as the current
> +	 * implementation of the timestamp caching and updating is not safe
> +	 * against concurrent readers for calls to store_vblank() with a bump
> +	 * of anything but +1. A bump != 1 would very likely return corrupted
> +	 * timestamps to userspace, because the same slot in the cache could
> +	 * be concurrently written by store_vblank() and read by one of those
> +	 * readers without the read-retry logic detecting the collision.
> +	 *
> +	 * Concurrent readers can exist when we are called from the
> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> +	 * irq callers. However, all those calls to us are happening with the
> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> +	 * can't increase while we are executing. Therefore a zero refcount at
> +	 * this point is safe for arbitrary counter bumps if we are called
> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> +	 * we must also accept a refcount of 1, as whenever we are called from
> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> +	 * we must let that one pass through in order to not lose vblank counts
> +	 * during vblank irq off - which would completely defeat the whole
> +	 * point of this routine.
> +	 *
> +	 * Whenever we are called from vblank irq, we have to assume concurrent
> +	 * readers exist or can show up any time during our execution, even if
> +	 * the refcount is currently zero, as vblank irqs are usually only
> +	 * enabled due to the presence of readers, and because when we are called
> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> +	 * called from vblank irq.
> +	 */
> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> +			      "refcount %u, vblirq %u\n", pipe, diff,
> +			      atomic_read(&vblank->refcount),
> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> +		diff = 1;
> +	}
> +
>  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13   ` Mario Kleiner
  (?)
@ 2016-02-09 10:00   ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:00 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:26AM +0100, Mario Kleiner wrote:
> Changes to drm_update_vblank_count() in Linux 4.4 broke the
> behaviour of the pre/post modeset functions as the new update
> code doesn't deal with hw vblank counter resets inbetween calls
> to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
> should.
> 
> This causes mistreatment of such hw counter resets as counter
> wraparound, and thereby large forward jumps of the software
> vblank counter which in turn cause vblank event dispatching
> and vblank waits to fail/hang --> userspace clients hang.
> 
> This symptom was reported on radeon-kms to cause a infinite
> hang of KDE Plasma 5 shell's login procedure, preventing users
> from logging in.
> 
> Fix this by detecting when drm_update_vblank_count() is called
> inside a pre->post modeset interval. If so, clamp valid vblank
> increments to the safe values 0 and 1, pretty much restoring
> the update behavior of the old update code of Linux 4.3 and
> earlier. Also reset the last recorded hw vblank count at call
> to drm_vblank_post_modeset() to be safe against hw that after
> modesetting, dpms on etc. only fires its first vblank irq after
> drm_vblank_post_modeset() was already called.
> 
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com

We need to untangle the new vblank stuff that assumes solide (atomic)
drivers and all the old stuff much more. But that can be done later on.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index aa2c74b..5c27ad3 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  	}
>  
>  	/*
> +	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
> +	 * interval? If so then vblank irqs keep running and it will likely
> +	 * happen that the hardware vblank counter is not trustworthy as it
> +	 * might reset at some point in that interval and vblank timestamps
> +	 * are not trustworthy either in that interval. Iow. this can result
> +	 * in a bogus diff >> 1 which must be avoided as it would cause
> +	 * random large forward jumps of the software vblank counter.
> +	 */
> +	if (diff > 1 && (vblank->inmodeset & 0x2)) {
> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
> +			      " due to pre-modeset.\n", pipe, diff);
> +		diff = 1;
> +	}
> +
> +	/*
>  	 * Restrict the bump of the software vblank counter to a safe maximum
>  	 * value of +1 whenever there is the possibility that concurrent readers
>  	 * of vblank timestamps could be active at the moment, as the current
> @@ -1573,6 +1588,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
>  	if (vblank->inmodeset) {
>  		spin_lock_irqsave(&dev->vbl_lock, irqflags);
>  		dev->vblank_disable_allowed = true;
> +		drm_reset_vblank_timestamp(dev, pipe);
>  		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>  
>  		if (vblank->inmodeset & 0x2)
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-08  1:13   ` Mario Kleiner
@ 2016-02-09 10:06     ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:06 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> drm_vblank_offdelay can have three different types of values:
> 
> < 0 is to be always treated the same as dev->vblank_disable_immediate
> = 0 is to be treated as "never disable vblanks"
> > 0 is to be treated as disable immediate if kms driver wants it
>     that way via dev->vblank_disable_immediate. Otherwise it is
>     a disable timeout in msecs.
> 
> This got broken in Linux 3.18+ for the implementation of
> drm_vblank_on. If the user specified a value of zero which should
> always reenable vblank irqs in this function, a kms driver could
> override the users choice by setting vblank_disable_immediate
> to true. This patch fixes the regression and keeps the user in
> control.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 3.18+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com
> ---
>  drivers/gpu/drm/drm_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 5c27ad3..fb17c45 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>  	 * re-enable interrupts if there are users left, or the
>  	 * user wishes vblank interrupts to be enabled all the time.
>  	 */
> -	if (atomic_read(&vblank->refcount) != 0 ||
> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))

Hm, shouldn't we change this to only enable the vblank irq if we need it,
i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
it superflously after a modeset, if userspace didn't yet ask for vblank
timestamps. But then is was specifically added by Ville in cd19e52aee922,
so I guess someone really wants this.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  		WARN_ON(drm_vblank_enable(dev, pipe));
>  	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>  }
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-09 10:06     ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:06 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, daniel.vetter, michel, linux, stable,
	alexander.deucher, christian.koenig, vbabka

On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> drm_vblank_offdelay can have three different types of values:
> 
> < 0 is to be always treated the same as dev->vblank_disable_immediate
> = 0 is to be treated as "never disable vblanks"
> > 0 is to be treated as disable immediate if kms driver wants it
>     that way via dev->vblank_disable_immediate. Otherwise it is
>     a disable timeout in msecs.
> 
> This got broken in Linux 3.18+ for the implementation of
> drm_vblank_on. If the user specified a value of zero which should
> always reenable vblank irqs in this function, a kms driver could
> override the users choice by setting vblank_disable_immediate
> to true. This patch fixes the regression and keeps the user in
> control.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 3.18+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com
> ---
>  drivers/gpu/drm/drm_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 5c27ad3..fb17c45 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>  	 * re-enable interrupts if there are users left, or the
>  	 * user wishes vblank interrupts to be enabled all the time.
>  	 */
> -	if (atomic_read(&vblank->refcount) != 0 ||
> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))

Hm, shouldn't we change this to only enable the vblank irq if we need it,
i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
it superflously after a modeset, if userspace didn't yet ask for vblank
timestamps. But then is was specifically added by Ville in cd19e52aee922,
so I guess someone really wants this.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  		WARN_ON(drm_vblank_enable(dev, pipe));
>  	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>  }
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-09  9:56     ` Daniel Vetter
@ 2016-02-09 10:07       ` Ville Syrjälä
  -1 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 10:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mario Kleiner, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> > This fixes a regression introduced by the new drm_update_vblank_count()
> > implementation in Linux 4.4:
> > 
> > Restrict the bump of the software vblank counter in drm_update_vblank_count()
> > to a safe maximum value of +1 whenever there is the possibility that
> > concurrent readers of vblank timestamps could be active at the moment,
> > as the current implementation of the timestamp caching and updating is
> > not safe against concurrent readers for calls to store_vblank() with a
> > bump of anything but +1. A bump != 1 would very likely return corrupted
> > timestamps to userspace, because the same slot in the cache could
> > be concurrently written by store_vblank() and read by one of those
> > readers in a non-atomic fashion and without the read-retry logic
> > detecting this collision.
> > 
> > Concurrent readers can exist while drm_update_vblank_count() is called
> > from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> > irq callers. However, all those calls are happening with the vbl_lock
> > locked thereby preventing a drm_vblank_get(), so the vblank refcount
> > can't increase while drm_update_vblank_count() is executing. Therefore
> > a zero vblank refcount during execution of that function signals that
> > is safe for arbitrary counter bumps if called from outside vblank irq,
> > whereas a non-zero count is not safe.
> > 
> > Whenever the function is called from vblank irq, we have to assume concurrent
> > readers could show up any time during its execution, even if the refcount
> > is currently zero, as vblank irqs are usually only enabled due to the
> > presence of readers, and because when it is called from vblank irq it
> > can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> > Therefore also restrict bumps to +1 when the function is called from vblank
> > irq.
> > 
> > Such bumps of more than +1 can happen at other times than reenabling
> > vblank irqs, e.g., when regular vblank interrupts get delayed by more
> > than 1 frame due to long held locks, long irq off periods, realtime
> > preemption on RT kernels, or system management interrupts.
> > 
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org> # 4.4+
> > Cc: michel@daenzer.net
> > Cc: vbabka@suse.cz
> > Cc: ville.syrjala@linux.intel.com
> > Cc: daniel.vetter@ffwll.ch
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: alexander.deucher@amd.com
> > Cc: christian.koenig@amd.com
> 
> Imo this is duct-tape. If we want to fix this up properly I think we
> should just use a full-blown seqlock instead of our hand-rolled one. And
> that could handle any increment at all.

And I even fixed this [1] almost a half a year ago when I sent the
original series, but that part got held hostage to the same seqlock
argument. Perfect is the enemy of good.

[1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index bcb8528..aa2c74b 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >  	}
> >  
> > +	/*
> > +	 * Restrict the bump of the software vblank counter to a safe maximum
> > +	 * value of +1 whenever there is the possibility that concurrent readers
> > +	 * of vblank timestamps could be active at the moment, as the current
> > +	 * implementation of the timestamp caching and updating is not safe
> > +	 * against concurrent readers for calls to store_vblank() with a bump
> > +	 * of anything but +1. A bump != 1 would very likely return corrupted
> > +	 * timestamps to userspace, because the same slot in the cache could
> > +	 * be concurrently written by store_vblank() and read by one of those
> > +	 * readers without the read-retry logic detecting the collision.
> > +	 *
> > +	 * Concurrent readers can exist when we are called from the
> > +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> > +	 * irq callers. However, all those calls to us are happening with the
> > +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> > +	 * can't increase while we are executing. Therefore a zero refcount at
> > +	 * this point is safe for arbitrary counter bumps if we are called
> > +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> > +	 * we must also accept a refcount of 1, as whenever we are called from
> > +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> > +	 * we must let that one pass through in order to not lose vblank counts
> > +	 * during vblank irq off - which would completely defeat the whole
> > +	 * point of this routine.
> > +	 *
> > +	 * Whenever we are called from vblank irq, we have to assume concurrent
> > +	 * readers exist or can show up any time during our execution, even if
> > +	 * the refcount is currently zero, as vblank irqs are usually only
> > +	 * enabled due to the presence of readers, and because when we are called
> > +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> > +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> > +	 * called from vblank irq.
> > +	 */
> > +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> > +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> > +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> > +			      "refcount %u, vblirq %u\n", pipe, diff,
> > +			      atomic_read(&vblank->refcount),
> > +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> > +		diff = 1;
> > +	}
> > +
> >  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> >  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> >  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrj�l�
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09 10:07       ` Ville Syrjälä
  0 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 10:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka

On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> > This fixes a regression introduced by the new drm_update_vblank_count()
> > implementation in Linux 4.4:
> > 
> > Restrict the bump of the software vblank counter in drm_update_vblank_count()
> > to a safe maximum value of +1 whenever there is the possibility that
> > concurrent readers of vblank timestamps could be active at the moment,
> > as the current implementation of the timestamp caching and updating is
> > not safe against concurrent readers for calls to store_vblank() with a
> > bump of anything but +1. A bump != 1 would very likely return corrupted
> > timestamps to userspace, because the same slot in the cache could
> > be concurrently written by store_vblank() and read by one of those
> > readers in a non-atomic fashion and without the read-retry logic
> > detecting this collision.
> > 
> > Concurrent readers can exist while drm_update_vblank_count() is called
> > from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> > irq callers. However, all those calls are happening with the vbl_lock
> > locked thereby preventing a drm_vblank_get(), so the vblank refcount
> > can't increase while drm_update_vblank_count() is executing. Therefore
> > a zero vblank refcount during execution of that function signals that
> > is safe for arbitrary counter bumps if called from outside vblank irq,
> > whereas a non-zero count is not safe.
> > 
> > Whenever the function is called from vblank irq, we have to assume concurrent
> > readers could show up any time during its execution, even if the refcount
> > is currently zero, as vblank irqs are usually only enabled due to the
> > presence of readers, and because when it is called from vblank irq it
> > can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> > Therefore also restrict bumps to +1 when the function is called from vblank
> > irq.
> > 
> > Such bumps of more than +1 can happen at other times than reenabling
> > vblank irqs, e.g., when regular vblank interrupts get delayed by more
> > than 1 frame due to long held locks, long irq off periods, realtime
> > preemption on RT kernels, or system management interrupts.
> > 
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org> # 4.4+
> > Cc: michel@daenzer.net
> > Cc: vbabka@suse.cz
> > Cc: ville.syrjala@linux.intel.com
> > Cc: daniel.vetter@ffwll.ch
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: alexander.deucher@amd.com
> > Cc: christian.koenig@amd.com
> 
> Imo this is duct-tape. If we want to fix this up properly I think we
> should just use a full-blown seqlock instead of our hand-rolled one. And
> that could handle any increment at all.

And I even fixed this [1] almost a half a year ago when I sent the
original series, but that part got held hostage to the same seqlock
argument. Perfect is the enemy of good.

[1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 41 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index bcb8528..aa2c74b 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >  	}
> >  
> > +	/*
> > +	 * Restrict the bump of the software vblank counter to a safe maximum
> > +	 * value of +1 whenever there is the possibility that concurrent readers
> > +	 * of vblank timestamps could be active at the moment, as the current
> > +	 * implementation of the timestamp caching and updating is not safe
> > +	 * against concurrent readers for calls to store_vblank() with a bump
> > +	 * of anything but +1. A bump != 1 would very likely return corrupted
> > +	 * timestamps to userspace, because the same slot in the cache could
> > +	 * be concurrently written by store_vblank() and read by one of those
> > +	 * readers without the read-retry logic detecting the collision.
> > +	 *
> > +	 * Concurrent readers can exist when we are called from the
> > +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> > +	 * irq callers. However, all those calls to us are happening with the
> > +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> > +	 * can't increase while we are executing. Therefore a zero refcount at
> > +	 * this point is safe for arbitrary counter bumps if we are called
> > +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> > +	 * we must also accept a refcount of 1, as whenever we are called from
> > +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> > +	 * we must let that one pass through in order to not lose vblank counts
> > +	 * during vblank irq off - which would completely defeat the whole
> > +	 * point of this routine.
> > +	 *
> > +	 * Whenever we are called from vblank irq, we have to assume concurrent
> > +	 * readers exist or can show up any time during our execution, even if
> > +	 * the refcount is currently zero, as vblank irqs are usually only
> > +	 * enabled due to the presence of readers, and because when we are called
> > +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> > +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> > +	 * called from vblank irq.
> > +	 */
> > +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> > +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> > +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> > +			      "refcount %u, vblirq %u\n", pipe, diff,
> > +			      atomic_read(&vblank->refcount),
> > +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> > +		diff = 1;
> > +	}
> > +
> >  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> >  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> >  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-08  1:13   ` Mario Kleiner
  (?)
@ 2016-02-09 10:09   ` Daniel Vetter
  2016-02-09 13:53     ` Mario Kleiner
  -1 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:09 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
> The changes to drm_update_vblank_count() in Linux 4.4 added a
> method to emulate a hardware vblank counter by use of high
> precision vblank timestamps if a kms driver supports those,
> but doesn't suppport hw vblank counters.
> 
> That method assumes that the old timestamp from a previous
> invocation is valid, but that is not always the case. E.g.,
> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
> or drm_update_vblank_count() gets called outside vblank irq and
> the high precision timestamping can't deliver a precise timestamp,
> ie. drm_get_last_vbltimestamp() delivers a return value of false,
> then those functions will initialize the old timestamp to zero
> to mark it as invalid.
> 
> A following call to drm_update_vblank_count() would then calculate
> elapsed time with vblank irqs off as current vblank timestamp minus
> the zero old timestamp and compute a software vblank counter increment
> that corresponds to system uptime, causing a large forward jump of the
> software vblank counter. That jump in turn can cause too long waits
> in drmWaitVblank and very long delays in delivery of vblank events,
> resulting in hangs of userspace clients.
> 
> This problem can be observed on nouveau-kms during machine
> suspend->resume cycles, where drm_vblank_off is called during
> suspend, drm_vblank_on is called during resume and the first
> queries to drm_get_last_vbltimestamp() don't deliver high
> precision timestamps, resulting in a large harmful counter jump.

Why does nouveau enable vblank interrupts before it can get valid
timestamps? That sounds like the core bug here, and papering over that in
the vblank code feels very wrong to me. Maybe we should instead just not
sample the vblank at all if the timestamp fails and splat a big WARN_ON
about this, to give driver writers a chance to fix up their mess?
-Daniel

> 
> Fix this by checking if the old timestamp used for this calculations
> is zero == invalid. If so, perform a counter increment of +1 to
> prevent large counter jumps and reinitialize the timestamps to
> sane values.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com
> ---
>  drivers/gpu/drm/drm_irq.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index fb17c45..88bdf19 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>  				      " diff_ns = %lld, framedur_ns = %d)\n",
>  				      pipe, (long long) diff_ns, framedur_ns);
> +
> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
> +				      pipe);
> +			diff = 1;
> +		}
>  	} else {
>  		/* some kind of default for drivers w/o accurate vbl timestamping */
>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 6/6] drm/radeon/pm: Handle failure of drm_vblank_get.
  2016-02-08  1:13 ` [PATCH 6/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
@ 2016-02-09 10:10   ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:10 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: alexander.deucher, michel, linux, christian.koenig, dri-devel

On Mon, Feb 08, 2016 at 02:13:29AM +0100, Mario Kleiner wrote:
> Make sure that drm_vblank_get/put() stay balanced in
> case drm_vblank_get fails, by skipping the corresponding
> put.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: michel@daenzer.net
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/radeon/radeon_pm.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
> index 59abebd..339a6c5 100644
> --- a/drivers/gpu/drm/radeon/radeon_pm.c
> +++ b/drivers/gpu/drm/radeon/radeon_pm.c
> @@ -276,8 +276,12 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
>  	if (rdev->irq.installed) {
>  		for (i = 0; i < rdev->num_crtc; i++) {
>  			if (rdev->pm.active_crtcs & (1 << i)) {
> -				rdev->pm.req_vblank |= (1 << i);
> -				drm_vblank_get(rdev->ddev, i);
> +				/* This can fail if a modeset is in progress */
> +				if (0 == drm_vblank_get(rdev->ddev, i))
> +					rdev->pm.req_vblank |= (1 << i);
> +				else
> +					DRM_DEBUG_DRIVER("crtc %d no vblank, can glitch\n",
> +							 i);
>  			}
>  		}
>  	}
> -- 
> 1.9.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-09 10:07       ` Ville Syrjälä
@ 2016-02-09 10:23         ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:23 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Daniel Vetter, Mario Kleiner, dri-devel, linux, stable, michel,
	vbabka, daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrj�l� wrote:
> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> > On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> > > This fixes a regression introduced by the new drm_update_vblank_count()
> > > implementation in Linux 4.4:
> > > 
> > > Restrict the bump of the software vblank counter in drm_update_vblank_count()
> > > to a safe maximum value of +1 whenever there is the possibility that
> > > concurrent readers of vblank timestamps could be active at the moment,
> > > as the current implementation of the timestamp caching and updating is
> > > not safe against concurrent readers for calls to store_vblank() with a
> > > bump of anything but +1. A bump != 1 would very likely return corrupted
> > > timestamps to userspace, because the same slot in the cache could
> > > be concurrently written by store_vblank() and read by one of those
> > > readers in a non-atomic fashion and without the read-retry logic
> > > detecting this collision.
> > > 
> > > Concurrent readers can exist while drm_update_vblank_count() is called
> > > from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> > > irq callers. However, all those calls are happening with the vbl_lock
> > > locked thereby preventing a drm_vblank_get(), so the vblank refcount
> > > can't increase while drm_update_vblank_count() is executing. Therefore
> > > a zero vblank refcount during execution of that function signals that
> > > is safe for arbitrary counter bumps if called from outside vblank irq,
> > > whereas a non-zero count is not safe.
> > > 
> > > Whenever the function is called from vblank irq, we have to assume concurrent
> > > readers could show up any time during its execution, even if the refcount
> > > is currently zero, as vblank irqs are usually only enabled due to the
> > > presence of readers, and because when it is called from vblank irq it
> > > can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> > > Therefore also restrict bumps to +1 when the function is called from vblank
> > > irq.
> > > 
> > > Such bumps of more than +1 can happen at other times than reenabling
> > > vblank irqs, e.g., when regular vblank interrupts get delayed by more
> > > than 1 frame due to long held locks, long irq off periods, realtime
> > > preemption on RT kernels, or system management interrupts.
> > > 
> > > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > > Cc: <stable@vger.kernel.org> # 4.4+
> > > Cc: michel@daenzer.net
> > > Cc: vbabka@suse.cz
> > > Cc: ville.syrjala@linux.intel.com
> > > Cc: daniel.vetter@ffwll.ch
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: alexander.deucher@amd.com
> > > Cc: christian.koenig@amd.com
> > 
> > Imo this is duct-tape. If we want to fix this up properly I think we
> > should just use a full-blown seqlock instead of our hand-rolled one. And
> > that could handle any increment at all.
> 
> And I even fixed this [1] almost a half a year ago when I sent the
> original series, but that part got held hostage to the same seqlock
> argument. Perfect is the enemy of good.
> 
> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html

Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
patch over Mario's hack here tbh. Your patch with seqlock would be even
more awesome.
-Daniel

> 
> > -Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 41 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > > index bcb8528..aa2c74b 100644
> > > --- a/drivers/gpu/drm/drm_irq.c
> > > +++ b/drivers/gpu/drm/drm_irq.c
> > > @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> > >  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Restrict the bump of the software vblank counter to a safe maximum
> > > +	 * value of +1 whenever there is the possibility that concurrent readers
> > > +	 * of vblank timestamps could be active at the moment, as the current
> > > +	 * implementation of the timestamp caching and updating is not safe
> > > +	 * against concurrent readers for calls to store_vblank() with a bump
> > > +	 * of anything but +1. A bump != 1 would very likely return corrupted
> > > +	 * timestamps to userspace, because the same slot in the cache could
> > > +	 * be concurrently written by store_vblank() and read by one of those
> > > +	 * readers without the read-retry logic detecting the collision.
> > > +	 *
> > > +	 * Concurrent readers can exist when we are called from the
> > > +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> > > +	 * irq callers. However, all those calls to us are happening with the
> > > +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> > > +	 * can't increase while we are executing. Therefore a zero refcount at
> > > +	 * this point is safe for arbitrary counter bumps if we are called
> > > +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> > > +	 * we must also accept a refcount of 1, as whenever we are called from
> > > +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> > > +	 * we must let that one pass through in order to not lose vblank counts
> > > +	 * during vblank irq off - which would completely defeat the whole
> > > +	 * point of this routine.
> > > +	 *
> > > +	 * Whenever we are called from vblank irq, we have to assume concurrent
> > > +	 * readers exist or can show up any time during our execution, even if
> > > +	 * the refcount is currently zero, as vblank irqs are usually only
> > > +	 * enabled due to the presence of readers, and because when we are called
> > > +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> > > +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> > > +	 * called from vblank irq.
> > > +	 */
> > > +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> > > +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> > > +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> > > +			      "refcount %u, vblirq %u\n", pipe, diff,
> > > +			      atomic_read(&vblank->refcount),
> > > +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> > > +		diff = 1;
> > > +	}
> > > +
> > >  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> > >  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> > >  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> > > -- 
> > > 1.9.1
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> -- 
> Ville Syrj�l�
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09 10:23         ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:23 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka

On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> > On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> > > This fixes a regression introduced by the new drm_update_vblank_count()
> > > implementation in Linux 4.4:
> > > 
> > > Restrict the bump of the software vblank counter in drm_update_vblank_count()
> > > to a safe maximum value of +1 whenever there is the possibility that
> > > concurrent readers of vblank timestamps could be active at the moment,
> > > as the current implementation of the timestamp caching and updating is
> > > not safe against concurrent readers for calls to store_vblank() with a
> > > bump of anything but +1. A bump != 1 would very likely return corrupted
> > > timestamps to userspace, because the same slot in the cache could
> > > be concurrently written by store_vblank() and read by one of those
> > > readers in a non-atomic fashion and without the read-retry logic
> > > detecting this collision.
> > > 
> > > Concurrent readers can exist while drm_update_vblank_count() is called
> > > from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> > > irq callers. However, all those calls are happening with the vbl_lock
> > > locked thereby preventing a drm_vblank_get(), so the vblank refcount
> > > can't increase while drm_update_vblank_count() is executing. Therefore
> > > a zero vblank refcount during execution of that function signals that
> > > is safe for arbitrary counter bumps if called from outside vblank irq,
> > > whereas a non-zero count is not safe.
> > > 
> > > Whenever the function is called from vblank irq, we have to assume concurrent
> > > readers could show up any time during its execution, even if the refcount
> > > is currently zero, as vblank irqs are usually only enabled due to the
> > > presence of readers, and because when it is called from vblank irq it
> > > can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> > > Therefore also restrict bumps to +1 when the function is called from vblank
> > > irq.
> > > 
> > > Such bumps of more than +1 can happen at other times than reenabling
> > > vblank irqs, e.g., when regular vblank interrupts get delayed by more
> > > than 1 frame due to long held locks, long irq off periods, realtime
> > > preemption on RT kernels, or system management interrupts.
> > > 
> > > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > > Cc: <stable@vger.kernel.org> # 4.4+
> > > Cc: michel@daenzer.net
> > > Cc: vbabka@suse.cz
> > > Cc: ville.syrjala@linux.intel.com
> > > Cc: daniel.vetter@ffwll.ch
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: alexander.deucher@amd.com
> > > Cc: christian.koenig@amd.com
> > 
> > Imo this is duct-tape. If we want to fix this up properly I think we
> > should just use a full-blown seqlock instead of our hand-rolled one. And
> > that could handle any increment at all.
> 
> And I even fixed this [1] almost a half a year ago when I sent the
> original series, but that part got held hostage to the same seqlock
> argument. Perfect is the enemy of good.
> 
> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html

Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
patch over Mario's hack here tbh. Your patch with seqlock would be even
more awesome.
-Daniel

> 
> > -Daniel
> > 
> > > ---
> > >  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 41 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > > index bcb8528..aa2c74b 100644
> > > --- a/drivers/gpu/drm/drm_irq.c
> > > +++ b/drivers/gpu/drm/drm_irq.c
> > > @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> > >  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Restrict the bump of the software vblank counter to a safe maximum
> > > +	 * value of +1 whenever there is the possibility that concurrent readers
> > > +	 * of vblank timestamps could be active at the moment, as the current
> > > +	 * implementation of the timestamp caching and updating is not safe
> > > +	 * against concurrent readers for calls to store_vblank() with a bump
> > > +	 * of anything but +1. A bump != 1 would very likely return corrupted
> > > +	 * timestamps to userspace, because the same slot in the cache could
> > > +	 * be concurrently written by store_vblank() and read by one of those
> > > +	 * readers without the read-retry logic detecting the collision.
> > > +	 *
> > > +	 * Concurrent readers can exist when we are called from the
> > > +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> > > +	 * irq callers. However, all those calls to us are happening with the
> > > +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> > > +	 * can't increase while we are executing. Therefore a zero refcount at
> > > +	 * this point is safe for arbitrary counter bumps if we are called
> > > +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> > > +	 * we must also accept a refcount of 1, as whenever we are called from
> > > +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> > > +	 * we must let that one pass through in order to not lose vblank counts
> > > +	 * during vblank irq off - which would completely defeat the whole
> > > +	 * point of this routine.
> > > +	 *
> > > +	 * Whenever we are called from vblank irq, we have to assume concurrent
> > > +	 * readers exist or can show up any time during our execution, even if
> > > +	 * the refcount is currently zero, as vblank irqs are usually only
> > > +	 * enabled due to the presence of readers, and because when we are called
> > > +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> > > +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> > > +	 * called from vblank irq.
> > > +	 */
> > > +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> > > +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> > > +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> > > +			      "refcount %u, vblirq %u\n", pipe, diff,
> > > +			      atomic_read(&vblank->refcount),
> > > +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> > > +		diff = 1;
> > > +	}
> > > +
> > >  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> > >  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> > >  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> > > -- 
> > > 1.9.1
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-09 10:06     ` Daniel Vetter
@ 2016-02-09 11:10       ` Ville Syrjälä
  -1 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 11:10 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mario Kleiner, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> > drm_vblank_offdelay can have three different types of values:
> > 
> > < 0 is to be always treated the same as dev->vblank_disable_immediate
> > = 0 is to be treated as "never disable vblanks"
> > > 0 is to be treated as disable immediate if kms driver wants it
> >     that way via dev->vblank_disable_immediate. Otherwise it is
> >     a disable timeout in msecs.
> > 
> > This got broken in Linux 3.18+ for the implementation of
> > drm_vblank_on. If the user specified a value of zero which should
> > always reenable vblank irqs in this function, a kms driver could
> > override the users choice by setting vblank_disable_immediate
> > to true. This patch fixes the regression and keeps the user in
> > control.
> > 
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org> # 3.18+
> > Cc: michel@daenzer.net
> > Cc: vbabka@suse.cz
> > Cc: ville.syrjala@linux.intel.com
> > Cc: daniel.vetter@ffwll.ch
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: alexander.deucher@amd.com
> > Cc: christian.koenig@amd.com
> > ---
> >  drivers/gpu/drm/drm_irq.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index 5c27ad3..fb17c45 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> >  	 * re-enable interrupts if there are users left, or the
> >  	 * user wishes vblank interrupts to be enabled all the time.
> >  	 */
> > -	if (atomic_read(&vblank->refcount) != 0 ||
> > -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> > +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> > +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> 
> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> it superflously after a modeset, if userspace didn't yet ask for vblank
> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> so I guess someone really wants this.

IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
case. I think it just ended up as a mess due to changing some of the
semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the 
review of the series. So yeah, given how drm_vblank_put() works now, I'd
just make this check for offdelay==0.

> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> >  		WARN_ON(drm_vblank_enable(dev, pipe));
> >  	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> >  }
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrj�l�
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-09 11:10       ` Ville Syrjälä
  0 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 11:10 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mario Kleiner, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> > drm_vblank_offdelay can have three different types of values:
> > 
> > < 0 is to be always treated the same as dev->vblank_disable_immediate
> > = 0 is to be treated as "never disable vblanks"
> > > 0 is to be treated as disable immediate if kms driver wants it
> >     that way via dev->vblank_disable_immediate. Otherwise it is
> >     a disable timeout in msecs.
> > 
> > This got broken in Linux 3.18+ for the implementation of
> > drm_vblank_on. If the user specified a value of zero which should
> > always reenable vblank irqs in this function, a kms driver could
> > override the users choice by setting vblank_disable_immediate
> > to true. This patch fixes the regression and keeps the user in
> > control.
> > 
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org> # 3.18+
> > Cc: michel@daenzer.net
> > Cc: vbabka@suse.cz
> > Cc: ville.syrjala@linux.intel.com
> > Cc: daniel.vetter@ffwll.ch
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: alexander.deucher@amd.com
> > Cc: christian.koenig@amd.com
> > ---
> >  drivers/gpu/drm/drm_irq.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index 5c27ad3..fb17c45 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> >  	 * re-enable interrupts if there are users left, or the
> >  	 * user wishes vblank interrupts to be enabled all the time.
> >  	 */
> > -	if (atomic_read(&vblank->refcount) != 0 ||
> > -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> > +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> > +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> 
> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> it superflously after a modeset, if userspace didn't yet ask for vblank
> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> so I guess someone really wants this.

IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
case. I think it just ended up as a mess due to changing some of the
semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the 
review of the series. So yeah, given how drm_vblank_put() works now, I'd
just make this check for offdelay==0.

> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> >  		WARN_ON(drm_vblank_enable(dev, pipe));
> >  	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> >  }
> > -- 
> > 1.9.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off()
  2016-02-09  9:54     ` Daniel Vetter
  (?)
@ 2016-02-09 13:27     ` Mario Kleiner
  -1 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On 02/09/2016 10:54 AM, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:24AM +0100, Mario Kleiner wrote:
>> Otherwise if a kms driver calls into drm_vblank_off() more than once
>> before calling drm_vblank_on() again, the redundant calls to
>> vblank_disable_and_save() will call drm_update_vblank_count()
>> while hw vblank counters and vblank timestamping are in a undefined
>> state during modesets, dpms off etc.
>>
>> At least with the legacy drm helpers it is not unusual to
>> get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
>> half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
>> were observed on radeon-kms during dpms-off -> dpms-on transition.
>>
>> Fixes large jumps of the software maintained vblank counter due to
>> the hardware vblank counter resetting to zero during dpms off or
>> modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
>> instead of drm_vblank_pre/post_modeset().
>>
>> This fixes a regression caused by the changes made to
>> drm_update_vblank_count() in Linux 4.4.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Cc: <stable@vger.kernel.org> # 4.4+
>> Cc: michel@daenzer.net
>> Cc: vbabka@suse.cz
>> Cc: ville.syrjala@linux.intel.com
>> Cc: daniel.vetter@ffwll.ch
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: alexander.deucher@amd.com
>> Cc: christian.koenig@amd.com
>> ---
>>   drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 607f493..bcb8528 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
>>   	spin_lock_irqsave(&dev->event_lock, irqflags);
>>
>>   	spin_lock(&dev->vbl_lock);
>> -	vblank_disable_and_save(dev, pipe);
>> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
>> +		      pipe, vblank->enabled, vblank->inmodeset);
>> +
>> +	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
>> +	if (!vblank->inmodeset)
>
> Since atomic drivers shouldn't suck so badly at this, maybe
>
> 	if (DRIVER_ATOMIC || !vblank->inmodeset)
>
> instead? That way the flexibility would be constraint to old drivers that
> need it because legacy crtc helpers suck. With that change:
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>

Ok, thanks, will change that.
-mario




>> +		vblank_disable_and_save(dev, pipe);
>> +
>>   	wake_up(&vblank->queue);
>>
>>   	/*
>> @@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>>   		return;
>>
>>   	spin_lock_irqsave(&dev->vbl_lock, irqflags);
>> +	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
>> +		      pipe, vblank->enabled, vblank->inmodeset);
>> +
>>   	/* Drop our private "prevent drm_vblank_get" refcount */
>>   	if (vblank->inmodeset) {
>>   		atomic_dec(&vblank->refcount);
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-09 11:10       ` Ville Syrjälä
@ 2016-02-09 13:29         ` Mario Kleiner
  -1 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:29 UTC (permalink / raw)
  To: Ville Syrjälä, Daniel Vetter
  Cc: dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/09/2016 12:10 PM, Ville Syrj�l� wrote:
> On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
>> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
>>> drm_vblank_offdelay can have three different types of values:
>>>
>>> < 0 is to be always treated the same as dev->vblank_disable_immediate
>>> = 0 is to be treated as "never disable vblanks"
>>>> 0 is to be treated as disable immediate if kms driver wants it
>>>      that way via dev->vblank_disable_immediate. Otherwise it is
>>>      a disable timeout in msecs.
>>>
>>> This got broken in Linux 3.18+ for the implementation of
>>> drm_vblank_on. If the user specified a value of zero which should
>>> always reenable vblank irqs in this function, a kms driver could
>>> override the users choice by setting vblank_disable_immediate
>>> to true. This patch fixes the regression and keeps the user in
>>> control.
>>>
>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>> Cc: <stable@vger.kernel.org> # 3.18+
>>> Cc: michel@daenzer.net
>>> Cc: vbabka@suse.cz
>>> Cc: ville.syrjala@linux.intel.com
>>> Cc: daniel.vetter@ffwll.ch
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: alexander.deucher@amd.com
>>> Cc: christian.koenig@amd.com
>>> ---
>>>   drivers/gpu/drm/drm_irq.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index 5c27ad3..fb17c45 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>>>   	 * re-enable interrupts if there are users left, or the
>>>   	 * user wishes vblank interrupts to be enabled all the time.
>>>   	 */
>>> -	if (atomic_read(&vblank->refcount) != 0 ||
>>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
>>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
>>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
>>
>> Hm, shouldn't we change this to only enable the vblank irq if we need it,
>> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
>> it superflously after a modeset, if userspace didn't yet ask for vblank
>> timestamps. But then is was specifically added by Ville in cd19e52aee922,
>> so I guess someone really wants this.
>
> IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> case. I think it just ended up as a mess due to changing some of the
> semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> review of the series. So yeah, given how drm_vblank_put() works now, I'd
> just make this check for offdelay==0.
>
>>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>

I can change that to offdelay==0 only, if you want. It was mostly about 
preserving what's there while at the same time fixing the important 
offdelay==0 user override.

-mario

>>>   		WARN_ON(drm_vblank_enable(dev, pipe));
>>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>>>   }
>>> --
>>> 1.9.1
>>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-09 13:29         ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:29 UTC (permalink / raw)
  To: Ville Syrjälä, Daniel Vetter
  Cc: dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/09/2016 12:10 PM, Ville Syrjälä wrote:
> On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
>> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
>>> drm_vblank_offdelay can have three different types of values:
>>>
>>> < 0 is to be always treated the same as dev->vblank_disable_immediate
>>> = 0 is to be treated as "never disable vblanks"
>>>> 0 is to be treated as disable immediate if kms driver wants it
>>>      that way via dev->vblank_disable_immediate. Otherwise it is
>>>      a disable timeout in msecs.
>>>
>>> This got broken in Linux 3.18+ for the implementation of
>>> drm_vblank_on. If the user specified a value of zero which should
>>> always reenable vblank irqs in this function, a kms driver could
>>> override the users choice by setting vblank_disable_immediate
>>> to true. This patch fixes the regression and keeps the user in
>>> control.
>>>
>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>> Cc: <stable@vger.kernel.org> # 3.18+
>>> Cc: michel@daenzer.net
>>> Cc: vbabka@suse.cz
>>> Cc: ville.syrjala@linux.intel.com
>>> Cc: daniel.vetter@ffwll.ch
>>> Cc: dri-devel@lists.freedesktop.org
>>> Cc: alexander.deucher@amd.com
>>> Cc: christian.koenig@amd.com
>>> ---
>>>   drivers/gpu/drm/drm_irq.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index 5c27ad3..fb17c45 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
>>>   	 * re-enable interrupts if there are users left, or the
>>>   	 * user wishes vblank interrupts to be enabled all the time.
>>>   	 */
>>> -	if (atomic_read(&vblank->refcount) != 0 ||
>>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
>>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
>>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
>>
>> Hm, shouldn't we change this to only enable the vblank irq if we need it,
>> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
>> it superflously after a modeset, if userspace didn't yet ask for vblank
>> timestamps. But then is was specifically added by Ville in cd19e52aee922,
>> so I guess someone really wants this.
>
> IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> case. I think it just ended up as a mess due to changing some of the
> semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> review of the series. So yeah, given how drm_vblank_put() works now, I'd
> just make this check for offdelay==0.
>
>>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>

I can change that to offdelay==0 only, if you want. It was mostly about 
preserving what's there while at the same time fixing the important 
offdelay==0 user override.

-mario

>>>   		WARN_ON(drm_vblank_enable(dev, pipe));
>>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>>>   }
>>> --
>>> 1.9.1
>>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-09 10:23         ` Daniel Vetter
@ 2016-02-09 13:39           ` Mario Kleiner
  -1 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:39 UTC (permalink / raw)
  To: Daniel Vetter, Ville Syrjälä
  Cc: dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/09/2016 11:23 AM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrj�l� wrote:
>> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
>>> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
>>>> This fixes a regression introduced by the new drm_update_vblank_count()
>>>> implementation in Linux 4.4:
>>>>
>>>> Restrict the bump of the software vblank counter in drm_update_vblank_count()
>>>> to a safe maximum value of +1 whenever there is the possibility that
>>>> concurrent readers of vblank timestamps could be active at the moment,
>>>> as the current implementation of the timestamp caching and updating is
>>>> not safe against concurrent readers for calls to store_vblank() with a
>>>> bump of anything but +1. A bump != 1 would very likely return corrupted
>>>> timestamps to userspace, because the same slot in the cache could
>>>> be concurrently written by store_vblank() and read by one of those
>>>> readers in a non-atomic fashion and without the read-retry logic
>>>> detecting this collision.
>>>>
>>>> Concurrent readers can exist while drm_update_vblank_count() is called
>>>> from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
>>>> irq callers. However, all those calls are happening with the vbl_lock
>>>> locked thereby preventing a drm_vblank_get(), so the vblank refcount
>>>> can't increase while drm_update_vblank_count() is executing. Therefore
>>>> a zero vblank refcount during execution of that function signals that
>>>> is safe for arbitrary counter bumps if called from outside vblank irq,
>>>> whereas a non-zero count is not safe.
>>>>
>>>> Whenever the function is called from vblank irq, we have to assume concurrent
>>>> readers could show up any time during its execution, even if the refcount
>>>> is currently zero, as vblank irqs are usually only enabled due to the
>>>> presence of readers, and because when it is called from vblank irq it
>>>> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
>>>> Therefore also restrict bumps to +1 when the function is called from vblank
>>>> irq.
>>>>
>>>> Such bumps of more than +1 can happen at other times than reenabling
>>>> vblank irqs, e.g., when regular vblank interrupts get delayed by more
>>>> than 1 frame due to long held locks, long irq off periods, realtime
>>>> preemption on RT kernels, or system management interrupts.
>>>>
>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>> Cc: michel@daenzer.net
>>>> Cc: vbabka@suse.cz
>>>> Cc: ville.syrjala@linux.intel.com
>>>> Cc: daniel.vetter@ffwll.ch
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: alexander.deucher@amd.com
>>>> Cc: christian.koenig@amd.com
>>>
>>> Imo this is duct-tape. If we want to fix this up properly I think we
>>> should just use a full-blown seqlock instead of our hand-rolled one. And
>>> that could handle any increment at all.
>>
>> And I even fixed this [1] almost a half a year ago when I sent the
>> original series, but that part got held hostage to the same seqlock
>> argument. Perfect is the enemy of good.
>>
>> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
>
> Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
> patch over Mario's hack here tbh. Your patch with seqlock would be even
> more awesome.
> -Daniel
>

I agree that my hack is only duct-tape. That's why the long code comment 
to let people know under which condition they could remove it.

Using seqlocks would be the robust long term solution. But as this is 
supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite 
would be too intrusive as a change, compared to this one-liner?

The original "roll our own" seqlock look alike implementation was meant 
to avoid/minimize taking locks, esp. with _irqsave that are taken by 
both userspace and timing sensitive vblank irq handling code. There were 
various locking changes since then and that advantage might have been 
lost already quite a long time ago, so maybe switching to full seqlocks 
wouldn't pose some new performance problems there, but i haven't looked 
into this.

-mario

>>
>>> -Daniel
>>>
>>>> ---
>>>>   drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 41 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>> index bcb8528..aa2c74b 100644
>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>   	}
>>>>
>>>> +	/*
>>>> +	 * Restrict the bump of the software vblank counter to a safe maximum
>>>> +	 * value of +1 whenever there is the possibility that concurrent readers
>>>> +	 * of vblank timestamps could be active at the moment, as the current
>>>> +	 * implementation of the timestamp caching and updating is not safe
>>>> +	 * against concurrent readers for calls to store_vblank() with a bump
>>>> +	 * of anything but +1. A bump != 1 would very likely return corrupted
>>>> +	 * timestamps to userspace, because the same slot in the cache could
>>>> +	 * be concurrently written by store_vblank() and read by one of those
>>>> +	 * readers without the read-retry logic detecting the collision.
>>>> +	 *
>>>> +	 * Concurrent readers can exist when we are called from the
>>>> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
>>>> +	 * irq callers. However, all those calls to us are happening with the
>>>> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
>>>> +	 * can't increase while we are executing. Therefore a zero refcount at
>>>> +	 * this point is safe for arbitrary counter bumps if we are called
>>>> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
>>>> +	 * we must also accept a refcount of 1, as whenever we are called from
>>>> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
>>>> +	 * we must let that one pass through in order to not lose vblank counts
>>>> +	 * during vblank irq off - which would completely defeat the whole
>>>> +	 * point of this routine.
>>>> +	 *
>>>> +	 * Whenever we are called from vblank irq, we have to assume concurrent
>>>> +	 * readers exist or can show up any time during our execution, even if
>>>> +	 * the refcount is currently zero, as vblank irqs are usually only
>>>> +	 * enabled due to the presence of readers, and because when we are called
>>>> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
>>>> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
>>>> +	 * called from vblank irq.
>>>> +	 */
>>>> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
>>>> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
>>>> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
>>>> +			      "refcount %u, vblirq %u\n", pipe, diff,
>>>> +			      atomic_read(&vblank->refcount),
>>>> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
>>>> +		diff = 1;
>>>> +	}
>>>> +
>>>>   	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>>>>   		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>>>>   		      pipe, vblank->count, diff, cur_vblank, vblank->last);
>>>> --
>>>> 1.9.1
>>>>
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>>
>> --
>> Ville Syrj�l�
>> Intel OTC
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09 13:39           ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:39 UTC (permalink / raw)
  To: Daniel Vetter, Ville Syrjälä
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka

On 02/09/2016 11:23 AM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrjälä wrote:
>> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
>>> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
>>>> This fixes a regression introduced by the new drm_update_vblank_count()
>>>> implementation in Linux 4.4:
>>>>
>>>> Restrict the bump of the software vblank counter in drm_update_vblank_count()
>>>> to a safe maximum value of +1 whenever there is the possibility that
>>>> concurrent readers of vblank timestamps could be active at the moment,
>>>> as the current implementation of the timestamp caching and updating is
>>>> not safe against concurrent readers for calls to store_vblank() with a
>>>> bump of anything but +1. A bump != 1 would very likely return corrupted
>>>> timestamps to userspace, because the same slot in the cache could
>>>> be concurrently written by store_vblank() and read by one of those
>>>> readers in a non-atomic fashion and without the read-retry logic
>>>> detecting this collision.
>>>>
>>>> Concurrent readers can exist while drm_update_vblank_count() is called
>>>> from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
>>>> irq callers. However, all those calls are happening with the vbl_lock
>>>> locked thereby preventing a drm_vblank_get(), so the vblank refcount
>>>> can't increase while drm_update_vblank_count() is executing. Therefore
>>>> a zero vblank refcount during execution of that function signals that
>>>> is safe for arbitrary counter bumps if called from outside vblank irq,
>>>> whereas a non-zero count is not safe.
>>>>
>>>> Whenever the function is called from vblank irq, we have to assume concurrent
>>>> readers could show up any time during its execution, even if the refcount
>>>> is currently zero, as vblank irqs are usually only enabled due to the
>>>> presence of readers, and because when it is called from vblank irq it
>>>> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
>>>> Therefore also restrict bumps to +1 when the function is called from vblank
>>>> irq.
>>>>
>>>> Such bumps of more than +1 can happen at other times than reenabling
>>>> vblank irqs, e.g., when regular vblank interrupts get delayed by more
>>>> than 1 frame due to long held locks, long irq off periods, realtime
>>>> preemption on RT kernels, or system management interrupts.
>>>>
>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>> Cc: michel@daenzer.net
>>>> Cc: vbabka@suse.cz
>>>> Cc: ville.syrjala@linux.intel.com
>>>> Cc: daniel.vetter@ffwll.ch
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: alexander.deucher@amd.com
>>>> Cc: christian.koenig@amd.com
>>>
>>> Imo this is duct-tape. If we want to fix this up properly I think we
>>> should just use a full-blown seqlock instead of our hand-rolled one. And
>>> that could handle any increment at all.
>>
>> And I even fixed this [1] almost a half a year ago when I sent the
>> original series, but that part got held hostage to the same seqlock
>> argument. Perfect is the enemy of good.
>>
>> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
>
> Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
> patch over Mario's hack here tbh. Your patch with seqlock would be even
> more awesome.
> -Daniel
>

I agree that my hack is only duct-tape. That's why the long code comment 
to let people know under which condition they could remove it.

Using seqlocks would be the robust long term solution. But as this is 
supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite 
would be too intrusive as a change, compared to this one-liner?

The original "roll our own" seqlock look alike implementation was meant 
to avoid/minimize taking locks, esp. with _irqsave that are taken by 
both userspace and timing sensitive vblank irq handling code. There were 
various locking changes since then and that advantage might have been 
lost already quite a long time ago, so maybe switching to full seqlocks 
wouldn't pose some new performance problems there, but i haven't looked 
into this.

-mario

>>
>>> -Daniel
>>>
>>>> ---
>>>>   drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 41 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>> index bcb8528..aa2c74b 100644
>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>   	}
>>>>
>>>> +	/*
>>>> +	 * Restrict the bump of the software vblank counter to a safe maximum
>>>> +	 * value of +1 whenever there is the possibility that concurrent readers
>>>> +	 * of vblank timestamps could be active at the moment, as the current
>>>> +	 * implementation of the timestamp caching and updating is not safe
>>>> +	 * against concurrent readers for calls to store_vblank() with a bump
>>>> +	 * of anything but +1. A bump != 1 would very likely return corrupted
>>>> +	 * timestamps to userspace, because the same slot in the cache could
>>>> +	 * be concurrently written by store_vblank() and read by one of those
>>>> +	 * readers without the read-retry logic detecting the collision.
>>>> +	 *
>>>> +	 * Concurrent readers can exist when we are called from the
>>>> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
>>>> +	 * irq callers. However, all those calls to us are happening with the
>>>> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
>>>> +	 * can't increase while we are executing. Therefore a zero refcount at
>>>> +	 * this point is safe for arbitrary counter bumps if we are called
>>>> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
>>>> +	 * we must also accept a refcount of 1, as whenever we are called from
>>>> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
>>>> +	 * we must let that one pass through in order to not lose vblank counts
>>>> +	 * during vblank irq off - which would completely defeat the whole
>>>> +	 * point of this routine.
>>>> +	 *
>>>> +	 * Whenever we are called from vblank irq, we have to assume concurrent
>>>> +	 * readers exist or can show up any time during our execution, even if
>>>> +	 * the refcount is currently zero, as vblank irqs are usually only
>>>> +	 * enabled due to the presence of readers, and because when we are called
>>>> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
>>>> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
>>>> +	 * called from vblank irq.
>>>> +	 */
>>>> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
>>>> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
>>>> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
>>>> +			      "refcount %u, vblirq %u\n", pipe, diff,
>>>> +			      atomic_read(&vblank->refcount),
>>>> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
>>>> +		diff = 1;
>>>> +	}
>>>> +
>>>>   	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>>>>   		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>>>>   		      pipe, vblank->count, diff, cur_vblank, vblank->last);
>>>> --
>>>> 1.9.1
>>>>
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>>
>> --
>> Ville Syrjälä
>> Intel OTC
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-09 13:29         ` Mario Kleiner
@ 2016-02-09 13:41           ` Ville Syrjälä
  -1 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 13:41 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:29:49PM +0100, Mario Kleiner wrote:
> On 02/09/2016 12:10 PM, Ville Syrj�l� wrote:
> > On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> >> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> >>> drm_vblank_offdelay can have three different types of values:
> >>>
> >>> < 0 is to be always treated the same as dev->vblank_disable_immediate
> >>> = 0 is to be treated as "never disable vblanks"
> >>>> 0 is to be treated as disable immediate if kms driver wants it
> >>>      that way via dev->vblank_disable_immediate. Otherwise it is
> >>>      a disable timeout in msecs.
> >>>
> >>> This got broken in Linux 3.18+ for the implementation of
> >>> drm_vblank_on. If the user specified a value of zero which should
> >>> always reenable vblank irqs in this function, a kms driver could
> >>> override the users choice by setting vblank_disable_immediate
> >>> to true. This patch fixes the regression and keeps the user in
> >>> control.
> >>>
> >>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>> Cc: <stable@vger.kernel.org> # 3.18+
> >>> Cc: michel@daenzer.net
> >>> Cc: vbabka@suse.cz
> >>> Cc: ville.syrjala@linux.intel.com
> >>> Cc: daniel.vetter@ffwll.ch
> >>> Cc: dri-devel@lists.freedesktop.org
> >>> Cc: alexander.deucher@amd.com
> >>> Cc: christian.koenig@amd.com
> >>> ---
> >>>   drivers/gpu/drm/drm_irq.c | 4 ++--
> >>>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>> index 5c27ad3..fb17c45 100644
> >>> --- a/drivers/gpu/drm/drm_irq.c
> >>> +++ b/drivers/gpu/drm/drm_irq.c
> >>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> >>>   	 * re-enable interrupts if there are users left, or the
> >>>   	 * user wishes vblank interrupts to be enabled all the time.
> >>>   	 */
> >>> -	if (atomic_read(&vblank->refcount) != 0 ||
> >>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> >>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> >>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> >>
> >> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> >> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> >> it superflously after a modeset, if userspace didn't yet ask for vblank
> >> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> >> so I guess someone really wants this.
> >
> > IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> > case. I think it just ended up as a mess due to changing some of the
> > semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> > review of the series. So yeah, given how drm_vblank_put() works now, I'd
> > just make this check for offdelay==0.
> >
> >>
> >> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>
> 
> I can change that to offdelay==0 only, if you want. It was mostly about 
> preserving what's there while at the same time fixing the important 
> offdelay==0 user override.

Yeah, just offdelay==0 seems best. Otherwise I think we could actually
leave the interrupt enabled indefinitely w/ offdelay>0 since there's not
going to be a drm_vblank_put() to arm the disable timer.

> 
> -mario
> 
> >>>   		WARN_ON(drm_vblank_enable(dev, pipe));
> >>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> >>>   }
> >>> --
> >>> 1.9.1
> >>>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >

-- 
Ville Syrj�l�
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-09 13:41           ` Ville Syrjälä
  0 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 13:41 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:29:49PM +0100, Mario Kleiner wrote:
> On 02/09/2016 12:10 PM, Ville Syrjälä wrote:
> > On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> >> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> >>> drm_vblank_offdelay can have three different types of values:
> >>>
> >>> < 0 is to be always treated the same as dev->vblank_disable_immediate
> >>> = 0 is to be treated as "never disable vblanks"
> >>>> 0 is to be treated as disable immediate if kms driver wants it
> >>>      that way via dev->vblank_disable_immediate. Otherwise it is
> >>>      a disable timeout in msecs.
> >>>
> >>> This got broken in Linux 3.18+ for the implementation of
> >>> drm_vblank_on. If the user specified a value of zero which should
> >>> always reenable vblank irqs in this function, a kms driver could
> >>> override the users choice by setting vblank_disable_immediate
> >>> to true. This patch fixes the regression and keeps the user in
> >>> control.
> >>>
> >>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>> Cc: <stable@vger.kernel.org> # 3.18+
> >>> Cc: michel@daenzer.net
> >>> Cc: vbabka@suse.cz
> >>> Cc: ville.syrjala@linux.intel.com
> >>> Cc: daniel.vetter@ffwll.ch
> >>> Cc: dri-devel@lists.freedesktop.org
> >>> Cc: alexander.deucher@amd.com
> >>> Cc: christian.koenig@amd.com
> >>> ---
> >>>   drivers/gpu/drm/drm_irq.c | 4 ++--
> >>>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>> index 5c27ad3..fb17c45 100644
> >>> --- a/drivers/gpu/drm/drm_irq.c
> >>> +++ b/drivers/gpu/drm/drm_irq.c
> >>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> >>>   	 * re-enable interrupts if there are users left, or the
> >>>   	 * user wishes vblank interrupts to be enabled all the time.
> >>>   	 */
> >>> -	if (atomic_read(&vblank->refcount) != 0 ||
> >>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> >>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> >>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> >>
> >> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> >> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> >> it superflously after a modeset, if userspace didn't yet ask for vblank
> >> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> >> so I guess someone really wants this.
> >
> > IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> > case. I think it just ended up as a mess due to changing some of the
> > semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> > review of the series. So yeah, given how drm_vblank_put() works now, I'd
> > just make this check for offdelay==0.
> >
> >>
> >> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>
> 
> I can change that to offdelay==0 only, if you want. It was mostly about 
> preserving what's there while at the same time fixing the important 
> offdelay==0 user override.

Yeah, just offdelay==0 seems best. Otherwise I think we could actually
leave the interrupt enabled indefinitely w/ offdelay>0 since there's not
going to be a drm_vblank_put() to arm the disable timer.

> 
> -mario
> 
> >>>   		WARN_ON(drm_vblank_enable(dev, pipe));
> >>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> >>>   }
> >>> --
> >>> 1.9.1
> >>>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-09 10:09   ` Daniel Vetter
@ 2016-02-09 13:53     ` Mario Kleiner
  2016-02-09 14:11         ` Ville Syrjälä
  0 siblings, 1 reply; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 13:53 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On 02/09/2016 11:09 AM, Daniel Vetter wrote:
> On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
>> The changes to drm_update_vblank_count() in Linux 4.4 added a
>> method to emulate a hardware vblank counter by use of high
>> precision vblank timestamps if a kms driver supports those,
>> but doesn't suppport hw vblank counters.
>>
>> That method assumes that the old timestamp from a previous
>> invocation is valid, but that is not always the case. E.g.,
>> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
>> or drm_update_vblank_count() gets called outside vblank irq and
>> the high precision timestamping can't deliver a precise timestamp,
>> ie. drm_get_last_vbltimestamp() delivers a return value of false,
>> then those functions will initialize the old timestamp to zero
>> to mark it as invalid.
>>
>> A following call to drm_update_vblank_count() would then calculate
>> elapsed time with vblank irqs off as current vblank timestamp minus
>> the zero old timestamp and compute a software vblank counter increment
>> that corresponds to system uptime, causing a large forward jump of the
>> software vblank counter. That jump in turn can cause too long waits
>> in drmWaitVblank and very long delays in delivery of vblank events,
>> resulting in hangs of userspace clients.
>>
>> This problem can be observed on nouveau-kms during machine
>> suspend->resume cycles, where drm_vblank_off is called during
>> suspend, drm_vblank_on is called during resume and the first
>> queries to drm_get_last_vbltimestamp() don't deliver high
>> precision timestamps, resulting in a large harmful counter jump.
>
> Why does nouveau enable vblank interrupts before it can get valid
> timestamps? That sounds like the core bug here, and papering over that in
> the vblank code feels very wrong to me. Maybe we should instead just not
> sample the vblank at all if the timestamp fails and splat a big WARN_ON
> about this, to give driver writers a chance to fix up their mess?
> -Daniel
>

The high precision timestamping is allowed to fail for a kms driver 
under some conditions which are not implementation errors of the driver, 
or getting disabled by user override, so we should handle that robustly. 
We handle it robustly everywhere else in the drm, so we should do it 
here as well.

E.g., nouveau generally supports timestamping/scanout position queries, 
but can't support it on old pre NV-50 hardware with some output type 
(either on analog VGA, or DVI-D, can't remember atm. which one is 
unsupported).

There are also new Soc drivers showing up which support those methods, 
but at least i didn't verify or test if their implementations are good 
enough for the needs of the new drm_udpate_vblank_count() which is more 
sensitive to kms drivers being even slightly off.

-mario

>>
>> Fix this by checking if the old timestamp used for this calculations
>> is zero == invalid. If so, perform a counter increment of +1 to
>> prevent large counter jumps and reinitialize the timestamps to
>> sane values.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Cc: <stable@vger.kernel.org> # 4.4+
>> Cc: michel@daenzer.net
>> Cc: vbabka@suse.cz
>> Cc: ville.syrjala@linux.intel.com
>> Cc: daniel.vetter@ffwll.ch
>> Cc: dri-devel@lists.freedesktop.org
>> Cc: alexander.deucher@amd.com
>> Cc: christian.koenig@amd.com
>> ---
>>   drivers/gpu/drm/drm_irq.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index fb17c45..88bdf19 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>   				      " diff_ns = %lld, framedur_ns = %d)\n",
>>   				      pipe, (long long) diff_ns, framedur_ns);
>> +
>> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
>> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
>> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
>> +				      pipe);
>> +			diff = 1;
>> +		}
>>   	} else {
>>   		/* some kind of default for drivers w/o accurate vbl timestamping */
>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>> --
>> 1.9.1
>>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-09 13:53     ` Mario Kleiner
@ 2016-02-09 14:11         ` Ville Syrjälä
  0 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 14:11 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
> On 02/09/2016 11:09 AM, Daniel Vetter wrote:
> > On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
> >> The changes to drm_update_vblank_count() in Linux 4.4 added a
> >> method to emulate a hardware vblank counter by use of high
> >> precision vblank timestamps if a kms driver supports those,
> >> but doesn't suppport hw vblank counters.
> >>
> >> That method assumes that the old timestamp from a previous
> >> invocation is valid, but that is not always the case. E.g.,
> >> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
> >> or drm_update_vblank_count() gets called outside vblank irq and
> >> the high precision timestamping can't deliver a precise timestamp,
> >> ie. drm_get_last_vbltimestamp() delivers a return value of false,
> >> then those functions will initialize the old timestamp to zero
> >> to mark it as invalid.
> >>
> >> A following call to drm_update_vblank_count() would then calculate
> >> elapsed time with vblank irqs off as current vblank timestamp minus
> >> the zero old timestamp and compute a software vblank counter increment
> >> that corresponds to system uptime, causing a large forward jump of the
> >> software vblank counter. That jump in turn can cause too long waits
> >> in drmWaitVblank and very long delays in delivery of vblank events,
> >> resulting in hangs of userspace clients.
> >>
> >> This problem can be observed on nouveau-kms during machine
> >> suspend->resume cycles, where drm_vblank_off is called during
> >> suspend, drm_vblank_on is called during resume and the first
> >> queries to drm_get_last_vbltimestamp() don't deliver high
> >> precision timestamps, resulting in a large harmful counter jump.
> >
> > Why does nouveau enable vblank interrupts before it can get valid
> > timestamps? That sounds like the core bug here, and papering over that in
> > the vblank code feels very wrong to me. Maybe we should instead just not
> > sample the vblank at all if the timestamp fails and splat a big WARN_ON
> > about this, to give driver writers a chance to fix up their mess?
> > -Daniel
> >
> 
> The high precision timestamping is allowed to fail for a kms driver 
> under some conditions which are not implementation errors of the driver, 
> or getting disabled by user override, so we should handle that robustly. 
> We handle it robustly everywhere else in the drm, so we should do it 
> here as well.
> 
> E.g., nouveau generally supports timestamping/scanout position queries, 
> but can't support it on old pre NV-50 hardware with some output type 
> (either on analog VGA, or DVI-D, can't remember atm. which one is 
> unsupported).

I think the surprising thing here is that it fails first and then
succeeds on the same crtc/output combo presumably.

I guess in theory the driver could fail during random times for whatever
reason, though I tend to think that the driver should either make it
robust or not even pretend to support it.

I suppose one failure mode the driver can't really do anything about is
some random SMI crap or something stalling the timestamp queries enough
that we fail the precisions tests and exhaust the retry limits. So yeah,
making it robust against that kind of nastyness sounds line it might be
a good idea. Though perhaps it should be something a bit more severe
than a DRM_DEBUG since I think it really shouldn't happen when the
driver and system are otherwise sane. Of course if it routinely fails
with some driver then simply making it spew errors into dmesg isn't
so nice, unless the driver also gets fixed.

> 
> There are also new Soc drivers showing up which support those methods, 
> but at least i didn't verify or test if their implementations are good 
> enough for the needs of the new drm_udpate_vblank_count() which is more 
> sensitive to kms drivers being even slightly off.
> 
> -mario
> 
> >>
> >> Fix this by checking if the old timestamp used for this calculations
> >> is zero == invalid. If so, perform a counter increment of +1 to
> >> prevent large counter jumps and reinitialize the timestamps to
> >> sane values.
> >>
> >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >> Cc: <stable@vger.kernel.org> # 4.4+
> >> Cc: michel@daenzer.net
> >> Cc: vbabka@suse.cz
> >> Cc: ville.syrjala@linux.intel.com
> >> Cc: daniel.vetter@ffwll.ch
> >> Cc: dri-devel@lists.freedesktop.org
> >> Cc: alexander.deucher@amd.com
> >> Cc: christian.koenig@amd.com
> >> ---
> >>   drivers/gpu/drm/drm_irq.c | 7 +++++++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >> index fb17c45..88bdf19 100644
> >> --- a/drivers/gpu/drm/drm_irq.c
> >> +++ b/drivers/gpu/drm/drm_irq.c
> >> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>   				      " diff_ns = %lld, framedur_ns = %d)\n",
> >>   				      pipe, (long long) diff_ns, framedur_ns);
> >> +
> >> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
> >> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
> >> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
> >> +				      pipe);
> >> +			diff = 1;
> >> +		}
> >>   	} else {
> >>   		/* some kind of default for drivers w/o accurate vbl timestamping */
> >>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >> --
> >> 1.9.1
> >>
> >

-- 
Ville Syrj�l�
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
@ 2016-02-09 14:11         ` Ville Syrjälä
  0 siblings, 0 replies; 52+ messages in thread
From: Ville Syrjälä @ 2016-02-09 14:11 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, dri-devel, linux, stable, michel, vbabka,
	daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
> On 02/09/2016 11:09 AM, Daniel Vetter wrote:
> > On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
> >> The changes to drm_update_vblank_count() in Linux 4.4 added a
> >> method to emulate a hardware vblank counter by use of high
> >> precision vblank timestamps if a kms driver supports those,
> >> but doesn't suppport hw vblank counters.
> >>
> >> That method assumes that the old timestamp from a previous
> >> invocation is valid, but that is not always the case. E.g.,
> >> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
> >> or drm_update_vblank_count() gets called outside vblank irq and
> >> the high precision timestamping can't deliver a precise timestamp,
> >> ie. drm_get_last_vbltimestamp() delivers a return value of false,
> >> then those functions will initialize the old timestamp to zero
> >> to mark it as invalid.
> >>
> >> A following call to drm_update_vblank_count() would then calculate
> >> elapsed time with vblank irqs off as current vblank timestamp minus
> >> the zero old timestamp and compute a software vblank counter increment
> >> that corresponds to system uptime, causing a large forward jump of the
> >> software vblank counter. That jump in turn can cause too long waits
> >> in drmWaitVblank and very long delays in delivery of vblank events,
> >> resulting in hangs of userspace clients.
> >>
> >> This problem can be observed on nouveau-kms during machine
> >> suspend->resume cycles, where drm_vblank_off is called during
> >> suspend, drm_vblank_on is called during resume and the first
> >> queries to drm_get_last_vbltimestamp() don't deliver high
> >> precision timestamps, resulting in a large harmful counter jump.
> >
> > Why does nouveau enable vblank interrupts before it can get valid
> > timestamps? That sounds like the core bug here, and papering over that in
> > the vblank code feels very wrong to me. Maybe we should instead just not
> > sample the vblank at all if the timestamp fails and splat a big WARN_ON
> > about this, to give driver writers a chance to fix up their mess?
> > -Daniel
> >
> 
> The high precision timestamping is allowed to fail for a kms driver 
> under some conditions which are not implementation errors of the driver, 
> or getting disabled by user override, so we should handle that robustly. 
> We handle it robustly everywhere else in the drm, so we should do it 
> here as well.
> 
> E.g., nouveau generally supports timestamping/scanout position queries, 
> but can't support it on old pre NV-50 hardware with some output type 
> (either on analog VGA, or DVI-D, can't remember atm. which one is 
> unsupported).

I think the surprising thing here is that it fails first and then
succeeds on the same crtc/output combo presumably.

I guess in theory the driver could fail during random times for whatever
reason, though I tend to think that the driver should either make it
robust or not even pretend to support it.

I suppose one failure mode the driver can't really do anything about is
some random SMI crap or something stalling the timestamp queries enough
that we fail the precisions tests and exhaust the retry limits. So yeah,
making it robust against that kind of nastyness sounds line it might be
a good idea. Though perhaps it should be something a bit more severe
than a DRM_DEBUG since I think it really shouldn't happen when the
driver and system are otherwise sane. Of course if it routinely fails
with some driver then simply making it spew errors into dmesg isn't
so nice, unless the driver also gets fixed.

> 
> There are also new Soc drivers showing up which support those methods, 
> but at least i didn't verify or test if their implementations are good 
> enough for the needs of the new drm_udpate_vblank_count() which is more 
> sensitive to kms drivers being even slightly off.
> 
> -mario
> 
> >>
> >> Fix this by checking if the old timestamp used for this calculations
> >> is zero == invalid. If so, perform a counter increment of +1 to
> >> prevent large counter jumps and reinitialize the timestamps to
> >> sane values.
> >>
> >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >> Cc: <stable@vger.kernel.org> # 4.4+
> >> Cc: michel@daenzer.net
> >> Cc: vbabka@suse.cz
> >> Cc: ville.syrjala@linux.intel.com
> >> Cc: daniel.vetter@ffwll.ch
> >> Cc: dri-devel@lists.freedesktop.org
> >> Cc: alexander.deucher@amd.com
> >> Cc: christian.koenig@amd.com
> >> ---
> >>   drivers/gpu/drm/drm_irq.c | 7 +++++++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >> index fb17c45..88bdf19 100644
> >> --- a/drivers/gpu/drm/drm_irq.c
> >> +++ b/drivers/gpu/drm/drm_irq.c
> >> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>   				      " diff_ns = %lld, framedur_ns = %d)\n",
> >>   				      pipe, (long long) diff_ns, framedur_ns);
> >> +
> >> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
> >> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
> >> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
> >> +				      pipe);
> >> +			diff = 1;
> >> +		}
> >>   	} else {
> >>   		/* some kind of default for drivers w/o accurate vbl timestamping */
> >>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >> --
> >> 1.9.1
> >>
> >

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-09 13:39           ` Mario Kleiner
@ 2016-02-09 14:29             ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 14:29 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, Ville Syrjälä,
	dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:39:44PM +0100, Mario Kleiner wrote:
> On 02/09/2016 11:23 AM, Daniel Vetter wrote:
> >On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrj�l� wrote:
> >>On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> >>>On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> >>>>This fixes a regression introduced by the new drm_update_vblank_count()
> >>>>implementation in Linux 4.4:
> >>>>
> >>>>Restrict the bump of the software vblank counter in drm_update_vblank_count()
> >>>>to a safe maximum value of +1 whenever there is the possibility that
> >>>>concurrent readers of vblank timestamps could be active at the moment,
> >>>>as the current implementation of the timestamp caching and updating is
> >>>>not safe against concurrent readers for calls to store_vblank() with a
> >>>>bump of anything but +1. A bump != 1 would very likely return corrupted
> >>>>timestamps to userspace, because the same slot in the cache could
> >>>>be concurrently written by store_vblank() and read by one of those
> >>>>readers in a non-atomic fashion and without the read-retry logic
> >>>>detecting this collision.
> >>>>
> >>>>Concurrent readers can exist while drm_update_vblank_count() is called
> >>>>from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> >>>>irq callers. However, all those calls are happening with the vbl_lock
> >>>>locked thereby preventing a drm_vblank_get(), so the vblank refcount
> >>>>can't increase while drm_update_vblank_count() is executing. Therefore
> >>>>a zero vblank refcount during execution of that function signals that
> >>>>is safe for arbitrary counter bumps if called from outside vblank irq,
> >>>>whereas a non-zero count is not safe.
> >>>>
> >>>>Whenever the function is called from vblank irq, we have to assume concurrent
> >>>>readers could show up any time during its execution, even if the refcount
> >>>>is currently zero, as vblank irqs are usually only enabled due to the
> >>>>presence of readers, and because when it is called from vblank irq it
> >>>>can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> >>>>Therefore also restrict bumps to +1 when the function is called from vblank
> >>>>irq.
> >>>>
> >>>>Such bumps of more than +1 can happen at other times than reenabling
> >>>>vblank irqs, e.g., when regular vblank interrupts get delayed by more
> >>>>than 1 frame due to long held locks, long irq off periods, realtime
> >>>>preemption on RT kernels, or system management interrupts.
> >>>>
> >>>>Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>>>Cc: <stable@vger.kernel.org> # 4.4+
> >>>>Cc: michel@daenzer.net
> >>>>Cc: vbabka@suse.cz
> >>>>Cc: ville.syrjala@linux.intel.com
> >>>>Cc: daniel.vetter@ffwll.ch
> >>>>Cc: dri-devel@lists.freedesktop.org
> >>>>Cc: alexander.deucher@amd.com
> >>>>Cc: christian.koenig@amd.com
> >>>
> >>>Imo this is duct-tape. If we want to fix this up properly I think we
> >>>should just use a full-blown seqlock instead of our hand-rolled one. And
> >>>that could handle any increment at all.
> >>
> >>And I even fixed this [1] almost a half a year ago when I sent the
> >>original series, but that part got held hostage to the same seqlock
> >>argument. Perfect is the enemy of good.
> >>
> >>[1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
> >
> >Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
> >patch over Mario's hack here tbh. Your patch with seqlock would be even
> >more awesome.
> >-Daniel
> >
> 
> I agree that my hack is only duct-tape. That's why the long code comment to
> let people know under which condition they could remove it.
> 
> Using seqlocks would be the robust long term solution. But as this is
> supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite
> would be too intrusive as a change, compared to this one-liner?
> 
> The original "roll our own" seqlock look alike implementation was meant to
> avoid/minimize taking locks, esp. with _irqsave that are taken by both
> userspace and timing sensitive vblank irq handling code. There were various
> locking changes since then and that advantage might have been lost already
> quite a long time ago, so maybe switching to full seqlocks wouldn't pose
> some new performance problems there, but i haven't looked into this.

Last time I've checked we've already reinvented seqlocks completely,
except buggy since ours can't take an increment > 1. I don't expect you'll
be able to measure anything if we switch.

Agree that it might be better to delay this for 4.6. So if you add a big
"FIMXE: Need to replace this hack with proper seqlocks." a the top of your
big comment (or just as a replacement for it), then

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

But currently it looks like this is a proper long-term solution, which it
imo isn't.
-Daniel


> 
> -mario
> 
> >>
> >>>-Daniel
> >>>
> >>>>---
> >>>>  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 41 insertions(+)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>index bcb8528..aa2c74b 100644
> >>>>--- a/drivers/gpu/drm/drm_irq.c
> >>>>+++ b/drivers/gpu/drm/drm_irq.c
> >>>>@@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>  	}
> >>>>
> >>>>+	/*
> >>>>+	 * Restrict the bump of the software vblank counter to a safe maximum
> >>>>+	 * value of +1 whenever there is the possibility that concurrent readers
> >>>>+	 * of vblank timestamps could be active at the moment, as the current
> >>>>+	 * implementation of the timestamp caching and updating is not safe
> >>>>+	 * against concurrent readers for calls to store_vblank() with a bump
> >>>>+	 * of anything but +1. A bump != 1 would very likely return corrupted
> >>>>+	 * timestamps to userspace, because the same slot in the cache could
> >>>>+	 * be concurrently written by store_vblank() and read by one of those
> >>>>+	 * readers without the read-retry logic detecting the collision.
> >>>>+	 *
> >>>>+	 * Concurrent readers can exist when we are called from the
> >>>>+	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> >>>>+	 * irq callers. However, all those calls to us are happening with the
> >>>>+	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> >>>>+	 * can't increase while we are executing. Therefore a zero refcount at
> >>>>+	 * this point is safe for arbitrary counter bumps if we are called
> >>>>+	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> >>>>+	 * we must also accept a refcount of 1, as whenever we are called from
> >>>>+	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> >>>>+	 * we must let that one pass through in order to not lose vblank counts
> >>>>+	 * during vblank irq off - which would completely defeat the whole
> >>>>+	 * point of this routine.
> >>>>+	 *
> >>>>+	 * Whenever we are called from vblank irq, we have to assume concurrent
> >>>>+	 * readers exist or can show up any time during our execution, even if
> >>>>+	 * the refcount is currently zero, as vblank irqs are usually only
> >>>>+	 * enabled due to the presence of readers, and because when we are called
> >>>>+	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> >>>>+	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> >>>>+	 * called from vblank irq.
> >>>>+	 */
> >>>>+	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> >>>>+	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> >>>>+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> >>>>+			      "refcount %u, vblirq %u\n", pipe, diff,
> >>>>+			      atomic_read(&vblank->refcount),
> >>>>+			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> >>>>+		diff = 1;
> >>>>+	}
> >>>>+
> >>>>  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> >>>>  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> >>>>  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> >>>>--
> >>>>1.9.1
> >>>>
> >>>
> >>>--
> >>>Daniel Vetter
> >>>Software Engineer, Intel Corporation
> >>>http://blog.ffwll.ch
> >>
> >>--
> >>Ville Syrj�l�
> >>Intel OTC
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09 14:29             ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 14:29 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Daniel Vetter, Ville Syrjälä,
	dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 02:39:44PM +0100, Mario Kleiner wrote:
> On 02/09/2016 11:23 AM, Daniel Vetter wrote:
> >On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrjälä wrote:
> >>On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> >>>On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> >>>>This fixes a regression introduced by the new drm_update_vblank_count()
> >>>>implementation in Linux 4.4:
> >>>>
> >>>>Restrict the bump of the software vblank counter in drm_update_vblank_count()
> >>>>to a safe maximum value of +1 whenever there is the possibility that
> >>>>concurrent readers of vblank timestamps could be active at the moment,
> >>>>as the current implementation of the timestamp caching and updating is
> >>>>not safe against concurrent readers for calls to store_vblank() with a
> >>>>bump of anything but +1. A bump != 1 would very likely return corrupted
> >>>>timestamps to userspace, because the same slot in the cache could
> >>>>be concurrently written by store_vblank() and read by one of those
> >>>>readers in a non-atomic fashion and without the read-retry logic
> >>>>detecting this collision.
> >>>>
> >>>>Concurrent readers can exist while drm_update_vblank_count() is called
> >>>>from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> >>>>irq callers. However, all those calls are happening with the vbl_lock
> >>>>locked thereby preventing a drm_vblank_get(), so the vblank refcount
> >>>>can't increase while drm_update_vblank_count() is executing. Therefore
> >>>>a zero vblank refcount during execution of that function signals that
> >>>>is safe for arbitrary counter bumps if called from outside vblank irq,
> >>>>whereas a non-zero count is not safe.
> >>>>
> >>>>Whenever the function is called from vblank irq, we have to assume concurrent
> >>>>readers could show up any time during its execution, even if the refcount
> >>>>is currently zero, as vblank irqs are usually only enabled due to the
> >>>>presence of readers, and because when it is called from vblank irq it
> >>>>can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> >>>>Therefore also restrict bumps to +1 when the function is called from vblank
> >>>>irq.
> >>>>
> >>>>Such bumps of more than +1 can happen at other times than reenabling
> >>>>vblank irqs, e.g., when regular vblank interrupts get delayed by more
> >>>>than 1 frame due to long held locks, long irq off periods, realtime
> >>>>preemption on RT kernels, or system management interrupts.
> >>>>
> >>>>Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>>>Cc: <stable@vger.kernel.org> # 4.4+
> >>>>Cc: michel@daenzer.net
> >>>>Cc: vbabka@suse.cz
> >>>>Cc: ville.syrjala@linux.intel.com
> >>>>Cc: daniel.vetter@ffwll.ch
> >>>>Cc: dri-devel@lists.freedesktop.org
> >>>>Cc: alexander.deucher@amd.com
> >>>>Cc: christian.koenig@amd.com
> >>>
> >>>Imo this is duct-tape. If we want to fix this up properly I think we
> >>>should just use a full-blown seqlock instead of our hand-rolled one. And
> >>>that could handle any increment at all.
> >>
> >>And I even fixed this [1] almost a half a year ago when I sent the
> >>original series, but that part got held hostage to the same seqlock
> >>argument. Perfect is the enemy of good.
> >>
> >>[1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
> >
> >Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
> >patch over Mario's hack here tbh. Your patch with seqlock would be even
> >more awesome.
> >-Daniel
> >
> 
> I agree that my hack is only duct-tape. That's why the long code comment to
> let people know under which condition they could remove it.
> 
> Using seqlocks would be the robust long term solution. But as this is
> supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite
> would be too intrusive as a change, compared to this one-liner?
> 
> The original "roll our own" seqlock look alike implementation was meant to
> avoid/minimize taking locks, esp. with _irqsave that are taken by both
> userspace and timing sensitive vblank irq handling code. There were various
> locking changes since then and that advantage might have been lost already
> quite a long time ago, so maybe switching to full seqlocks wouldn't pose
> some new performance problems there, but i haven't looked into this.

Last time I've checked we've already reinvented seqlocks completely,
except buggy since ours can't take an increment > 1. I don't expect you'll
be able to measure anything if we switch.

Agree that it might be better to delay this for 4.6. So if you add a big
"FIMXE: Need to replace this hack with proper seqlocks." a the top of your
big comment (or just as a replacement for it), then

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

But currently it looks like this is a proper long-term solution, which it
imo isn't.
-Daniel


> 
> -mario
> 
> >>
> >>>-Daniel
> >>>
> >>>>---
> >>>>  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 41 insertions(+)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>index bcb8528..aa2c74b 100644
> >>>>--- a/drivers/gpu/drm/drm_irq.c
> >>>>+++ b/drivers/gpu/drm/drm_irq.c
> >>>>@@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>  	}
> >>>>
> >>>>+	/*
> >>>>+	 * Restrict the bump of the software vblank counter to a safe maximum
> >>>>+	 * value of +1 whenever there is the possibility that concurrent readers
> >>>>+	 * of vblank timestamps could be active at the moment, as the current
> >>>>+	 * implementation of the timestamp caching and updating is not safe
> >>>>+	 * against concurrent readers for calls to store_vblank() with a bump
> >>>>+	 * of anything but +1. A bump != 1 would very likely return corrupted
> >>>>+	 * timestamps to userspace, because the same slot in the cache could
> >>>>+	 * be concurrently written by store_vblank() and read by one of those
> >>>>+	 * readers without the read-retry logic detecting the collision.
> >>>>+	 *
> >>>>+	 * Concurrent readers can exist when we are called from the
> >>>>+	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> >>>>+	 * irq callers. However, all those calls to us are happening with the
> >>>>+	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> >>>>+	 * can't increase while we are executing. Therefore a zero refcount at
> >>>>+	 * this point is safe for arbitrary counter bumps if we are called
> >>>>+	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> >>>>+	 * we must also accept a refcount of 1, as whenever we are called from
> >>>>+	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> >>>>+	 * we must let that one pass through in order to not lose vblank counts
> >>>>+	 * during vblank irq off - which would completely defeat the whole
> >>>>+	 * point of this routine.
> >>>>+	 *
> >>>>+	 * Whenever we are called from vblank irq, we have to assume concurrent
> >>>>+	 * readers exist or can show up any time during our execution, even if
> >>>>+	 * the refcount is currently zero, as vblank irqs are usually only
> >>>>+	 * enabled due to the presence of readers, and because when we are called
> >>>>+	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> >>>>+	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> >>>>+	 * called from vblank irq.
> >>>>+	 */
> >>>>+	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> >>>>+	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> >>>>+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> >>>>+			      "refcount %u, vblirq %u\n", pipe, diff,
> >>>>+			      atomic_read(&vblank->refcount),
> >>>>+			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> >>>>+		diff = 1;
> >>>>+	}
> >>>>+
> >>>>  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> >>>>  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> >>>>  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> >>>>--
> >>>>1.9.1
> >>>>
> >>>
> >>>--
> >>>Daniel Vetter
> >>>Software Engineer, Intel Corporation
> >>>http://blog.ffwll.ch
> >>
> >>--
> >>Ville Syrjälä
> >>Intel OTC
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
  2016-02-09 13:41           ` Ville Syrjälä
@ 2016-02-09 14:31             ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 14:31 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Mario Kleiner, Daniel Vetter, dri-devel, linux, stable, michel,
	vbabka, daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 03:41:28PM +0200, Ville Syrj�l� wrote:
> On Tue, Feb 09, 2016 at 02:29:49PM +0100, Mario Kleiner wrote:
> > On 02/09/2016 12:10 PM, Ville Syrj�l� wrote:
> > > On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> > >> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> > >>> drm_vblank_offdelay can have three different types of values:
> > >>>
> > >>> < 0 is to be always treated the same as dev->vblank_disable_immediate
> > >>> = 0 is to be treated as "never disable vblanks"
> > >>>> 0 is to be treated as disable immediate if kms driver wants it
> > >>>      that way via dev->vblank_disable_immediate. Otherwise it is
> > >>>      a disable timeout in msecs.
> > >>>
> > >>> This got broken in Linux 3.18+ for the implementation of
> > >>> drm_vblank_on. If the user specified a value of zero which should
> > >>> always reenable vblank irqs in this function, a kms driver could
> > >>> override the users choice by setting vblank_disable_immediate
> > >>> to true. This patch fixes the regression and keeps the user in
> > >>> control.
> > >>>
> > >>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >>> Cc: <stable@vger.kernel.org> # 3.18+
> > >>> Cc: michel@daenzer.net
> > >>> Cc: vbabka@suse.cz
> > >>> Cc: ville.syrjala@linux.intel.com
> > >>> Cc: daniel.vetter@ffwll.ch
> > >>> Cc: dri-devel@lists.freedesktop.org
> > >>> Cc: alexander.deucher@amd.com
> > >>> Cc: christian.koenig@amd.com
> > >>> ---
> > >>>   drivers/gpu/drm/drm_irq.c | 4 ++--
> > >>>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > >>> index 5c27ad3..fb17c45 100644
> > >>> --- a/drivers/gpu/drm/drm_irq.c
> > >>> +++ b/drivers/gpu/drm/drm_irq.c
> > >>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> > >>>   	 * re-enable interrupts if there are users left, or the
> > >>>   	 * user wishes vblank interrupts to be enabled all the time.
> > >>>   	 */
> > >>> -	if (atomic_read(&vblank->refcount) != 0 ||
> > >>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> > >>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> > >>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> > >>
> > >> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> > >> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> > >> it superflously after a modeset, if userspace didn't yet ask for vblank
> > >> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> > >> so I guess someone really wants this.
> > >
> > > IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> > > case. I think it just ended up as a mess due to changing some of the
> > > semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> > > review of the series. So yeah, given how drm_vblank_put() works now, I'd
> > > just make this check for offdelay==0.
> > >
> > >>
> > >> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > >>
> > 
> > I can change that to offdelay==0 only, if you want. It was mostly about 
> > preserving what's there while at the same time fixing the important 
> > offdelay==0 user override.
> 
> Yeah, just offdelay==0 seems best. Otherwise I think we could actually
> leave the interrupt enabled indefinitely w/ offdelay>0 since there's not
> going to be a drm_vblank_put() to arm the disable timer.

Yeah, r-b on that, after Ville clarified the intention of his commit. When
you respin pls cite that commit, and explain why intention/implementation
diverged quickly.

Thanks, Daniel

> 
> > 
> > -mario
> > 
> > >>>   		WARN_ON(drm_vblank_enable(dev, pipe));
> > >>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> > >>>   }
> > >>> --
> > >>> 1.9.1
> > >>>
> > >>
> > >> --
> > >> Daniel Vetter
> > >> Software Engineer, Intel Corporation
> > >> http://blog.ffwll.ch
> > >
> 
> -- 
> Ville Syrj�l�
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on()
@ 2016-02-09 14:31             ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 14:31 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka

On Tue, Feb 09, 2016 at 03:41:28PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 09, 2016 at 02:29:49PM +0100, Mario Kleiner wrote:
> > On 02/09/2016 12:10 PM, Ville Syrjälä wrote:
> > > On Tue, Feb 09, 2016 at 11:06:18AM +0100, Daniel Vetter wrote:
> > >> On Mon, Feb 08, 2016 at 02:13:27AM +0100, Mario Kleiner wrote:
> > >>> drm_vblank_offdelay can have three different types of values:
> > >>>
> > >>> < 0 is to be always treated the same as dev->vblank_disable_immediate
> > >>> = 0 is to be treated as "never disable vblanks"
> > >>>> 0 is to be treated as disable immediate if kms driver wants it
> > >>>      that way via dev->vblank_disable_immediate. Otherwise it is
> > >>>      a disable timeout in msecs.
> > >>>
> > >>> This got broken in Linux 3.18+ for the implementation of
> > >>> drm_vblank_on. If the user specified a value of zero which should
> > >>> always reenable vblank irqs in this function, a kms driver could
> > >>> override the users choice by setting vblank_disable_immediate
> > >>> to true. This patch fixes the regression and keeps the user in
> > >>> control.
> > >>>
> > >>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >>> Cc: <stable@vger.kernel.org> # 3.18+
> > >>> Cc: michel@daenzer.net
> > >>> Cc: vbabka@suse.cz
> > >>> Cc: ville.syrjala@linux.intel.com
> > >>> Cc: daniel.vetter@ffwll.ch
> > >>> Cc: dri-devel@lists.freedesktop.org
> > >>> Cc: alexander.deucher@amd.com
> > >>> Cc: christian.koenig@amd.com
> > >>> ---
> > >>>   drivers/gpu/drm/drm_irq.c | 4 ++--
> > >>>   1 file changed, 2 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > >>> index 5c27ad3..fb17c45 100644
> > >>> --- a/drivers/gpu/drm/drm_irq.c
> > >>> +++ b/drivers/gpu/drm/drm_irq.c
> > >>> @@ -1492,8 +1492,8 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
> > >>>   	 * re-enable interrupts if there are users left, or the
> > >>>   	 * user wishes vblank interrupts to be enabled all the time.
> > >>>   	 */
> > >>> -	if (atomic_read(&vblank->refcount) != 0 ||
> > >>> -	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
> > >>> +	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0 ||
> > >>> +	    (!dev->vblank_disable_immediate && drm_vblank_offdelay > 0))
> > >>
> > >> Hm, shouldn't we change this to only enable the vblank irq if we need it,
> > >> i.e. offdelay == 0? For delayed disabling there's kinda no need to enable
> > >> it superflously after a modeset, if userspace didn't yet ask for vblank
> > >> timestamps. But then is was specifically added by Ville in cd19e52aee922,
> > >> so I guess someone really wants this.
> > >
> > > IIRC what I wanted was to just re-enable the interrupt for the offdelay==0
> > > case. I think it just ended up as a mess due to changing some of the
> > > semantics of offdelay<0 vs. offdelay==0 vs. disable_immediate during the
> > > review of the series. So yeah, given how drm_vblank_put() works now, I'd
> > > just make this check for offdelay==0.
> > >
> > >>
> > >> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > >>
> > 
> > I can change that to offdelay==0 only, if you want. It was mostly about 
> > preserving what's there while at the same time fixing the important 
> > offdelay==0 user override.
> 
> Yeah, just offdelay==0 seems best. Otherwise I think we could actually
> leave the interrupt enabled indefinitely w/ offdelay>0 since there's not
> going to be a drm_vblank_put() to arm the disable timer.

Yeah, r-b on that, after Ville clarified the intention of his commit. When
you respin pls cite that commit, and explain why intention/implementation
diverged quickly.

Thanks, Daniel

> 
> > 
> > -mario
> > 
> > >>>   		WARN_ON(drm_vblank_enable(dev, pipe));
> > >>>   	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> > >>>   }
> > >>> --
> > >>> 1.9.1
> > >>>
> > >>
> > >> --
> > >> Daniel Vetter
> > >> Software Engineer, Intel Corporation
> > >> http://blog.ffwll.ch
> > >
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-09 14:11         ` Ville Syrjälä
@ 2016-02-09 15:03           ` Daniel Vetter
  -1 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 15:03 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Mario Kleiner, Daniel Vetter, dri-devel, linux, stable, michel,
	vbabka, daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 04:11:49PM +0200, Ville Syrj�l� wrote:
> On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
> > On 02/09/2016 11:09 AM, Daniel Vetter wrote:
> > > On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
> > >> The changes to drm_update_vblank_count() in Linux 4.4 added a
> > >> method to emulate a hardware vblank counter by use of high
> > >> precision vblank timestamps if a kms driver supports those,
> > >> but doesn't suppport hw vblank counters.
> > >>
> > >> That method assumes that the old timestamp from a previous
> > >> invocation is valid, but that is not always the case. E.g.,
> > >> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
> > >> or drm_update_vblank_count() gets called outside vblank irq and
> > >> the high precision timestamping can't deliver a precise timestamp,
> > >> ie. drm_get_last_vbltimestamp() delivers a return value of false,
> > >> then those functions will initialize the old timestamp to zero
> > >> to mark it as invalid.
> > >>
> > >> A following call to drm_update_vblank_count() would then calculate
> > >> elapsed time with vblank irqs off as current vblank timestamp minus
> > >> the zero old timestamp and compute a software vblank counter increment
> > >> that corresponds to system uptime, causing a large forward jump of the
> > >> software vblank counter. That jump in turn can cause too long waits
> > >> in drmWaitVblank and very long delays in delivery of vblank events,
> > >> resulting in hangs of userspace clients.
> > >>
> > >> This problem can be observed on nouveau-kms during machine
> > >> suspend->resume cycles, where drm_vblank_off is called during
> > >> suspend, drm_vblank_on is called during resume and the first
> > >> queries to drm_get_last_vbltimestamp() don't deliver high
> > >> precision timestamps, resulting in a large harmful counter jump.
> > >
> > > Why does nouveau enable vblank interrupts before it can get valid
> > > timestamps? That sounds like the core bug here, and papering over that in
> > > the vblank code feels very wrong to me. Maybe we should instead just not
> > > sample the vblank at all if the timestamp fails and splat a big WARN_ON
> > > about this, to give driver writers a chance to fix up their mess?
> > > -Daniel
> > >
> > 
> > The high precision timestamping is allowed to fail for a kms driver 
> > under some conditions which are not implementation errors of the driver, 
> > or getting disabled by user override, so we should handle that robustly. 
> > We handle it robustly everywhere else in the drm, so we should do it 
> > here as well.
> > 
> > E.g., nouveau generally supports timestamping/scanout position queries, 
> > but can't support it on old pre NV-50 hardware with some output type 
> > (either on analog VGA, or DVI-D, can't remember atm. which one is 
> > unsupported).
> 
> I think the surprising thing here is that it fails first and then
> succeeds on the same crtc/output combo presumably.

Yeah exactly this. Failing consistently is ok imo (and probably should be
handled). Failing first and then later on working (without changing the
mode config in between) seems suspicous. That shouldn't ever happen really
and seems like a driver bug (like enabling vblanks too early, before the
pipe is fully up&running).
-Daniel

> 
> I guess in theory the driver could fail during random times for whatever
> reason, though I tend to think that the driver should either make it
> robust or not even pretend to support it.
> 
> I suppose one failure mode the driver can't really do anything about is
> some random SMI crap or something stalling the timestamp queries enough
> that we fail the precisions tests and exhaust the retry limits. So yeah,
> making it robust against that kind of nastyness sounds line it might be
> a good idea. Though perhaps it should be something a bit more severe
> than a DRM_DEBUG since I think it really shouldn't happen when the
> driver and system are otherwise sane. Of course if it routinely fails
> with some driver then simply making it spew errors into dmesg isn't
> so nice, unless the driver also gets fixed.
> 
> > 
> > There are also new Soc drivers showing up which support those methods, 
> > but at least i didn't verify or test if their implementations are good 
> > enough for the needs of the new drm_udpate_vblank_count() which is more 
> > sensitive to kms drivers being even slightly off.
> > 
> > -mario
> > 
> > >>
> > >> Fix this by checking if the old timestamp used for this calculations
> > >> is zero == invalid. If so, perform a counter increment of +1 to
> > >> prevent large counter jumps and reinitialize the timestamps to
> > >> sane values.
> > >>
> > >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >> Cc: <stable@vger.kernel.org> # 4.4+
> > >> Cc: michel@daenzer.net
> > >> Cc: vbabka@suse.cz
> > >> Cc: ville.syrjala@linux.intel.com
> > >> Cc: daniel.vetter@ffwll.ch
> > >> Cc: dri-devel@lists.freedesktop.org
> > >> Cc: alexander.deucher@amd.com
> > >> Cc: christian.koenig@amd.com
> > >> ---
> > >>   drivers/gpu/drm/drm_irq.c | 7 +++++++
> > >>   1 file changed, 7 insertions(+)
> > >>
> > >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > >> index fb17c45..88bdf19 100644
> > >> --- a/drivers/gpu/drm/drm_irq.c
> > >> +++ b/drivers/gpu/drm/drm_irq.c
> > >> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> > >>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> > >>   				      " diff_ns = %lld, framedur_ns = %d)\n",
> > >>   				      pipe, (long long) diff_ns, framedur_ns);
> > >> +
> > >> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
> > >> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
> > >> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
> > >> +				      pipe);
> > >> +			diff = 1;
> > >> +		}
> > >>   	} else {
> > >>   		/* some kind of default for drivers w/o accurate vbl timestamping */
> > >>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> > >> --
> > >> 1.9.1
> > >>
> > >
> 
> -- 
> Ville Syrj�l�
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
@ 2016-02-09 15:03           ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-09 15:03 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Mario Kleiner, Daniel Vetter, dri-devel, linux, stable, michel,
	vbabka, daniel.vetter, alexander.deucher, christian.koenig

On Tue, Feb 09, 2016 at 04:11:49PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
> > On 02/09/2016 11:09 AM, Daniel Vetter wrote:
> > > On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
> > >> The changes to drm_update_vblank_count() in Linux 4.4 added a
> > >> method to emulate a hardware vblank counter by use of high
> > >> precision vblank timestamps if a kms driver supports those,
> > >> but doesn't suppport hw vblank counters.
> > >>
> > >> That method assumes that the old timestamp from a previous
> > >> invocation is valid, but that is not always the case. E.g.,
> > >> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
> > >> or drm_update_vblank_count() gets called outside vblank irq and
> > >> the high precision timestamping can't deliver a precise timestamp,
> > >> ie. drm_get_last_vbltimestamp() delivers a return value of false,
> > >> then those functions will initialize the old timestamp to zero
> > >> to mark it as invalid.
> > >>
> > >> A following call to drm_update_vblank_count() would then calculate
> > >> elapsed time with vblank irqs off as current vblank timestamp minus
> > >> the zero old timestamp and compute a software vblank counter increment
> > >> that corresponds to system uptime, causing a large forward jump of the
> > >> software vblank counter. That jump in turn can cause too long waits
> > >> in drmWaitVblank and very long delays in delivery of vblank events,
> > >> resulting in hangs of userspace clients.
> > >>
> > >> This problem can be observed on nouveau-kms during machine
> > >> suspend->resume cycles, where drm_vblank_off is called during
> > >> suspend, drm_vblank_on is called during resume and the first
> > >> queries to drm_get_last_vbltimestamp() don't deliver high
> > >> precision timestamps, resulting in a large harmful counter jump.
> > >
> > > Why does nouveau enable vblank interrupts before it can get valid
> > > timestamps? That sounds like the core bug here, and papering over that in
> > > the vblank code feels very wrong to me. Maybe we should instead just not
> > > sample the vblank at all if the timestamp fails and splat a big WARN_ON
> > > about this, to give driver writers a chance to fix up their mess?
> > > -Daniel
> > >
> > 
> > The high precision timestamping is allowed to fail for a kms driver 
> > under some conditions which are not implementation errors of the driver, 
> > or getting disabled by user override, so we should handle that robustly. 
> > We handle it robustly everywhere else in the drm, so we should do it 
> > here as well.
> > 
> > E.g., nouveau generally supports timestamping/scanout position queries, 
> > but can't support it on old pre NV-50 hardware with some output type 
> > (either on analog VGA, or DVI-D, can't remember atm. which one is 
> > unsupported).
> 
> I think the surprising thing here is that it fails first and then
> succeeds on the same crtc/output combo presumably.

Yeah exactly this. Failing consistently is ok imo (and probably should be
handled). Failing first and then later on working (without changing the
mode config in between) seems suspicous. That shouldn't ever happen really
and seems like a driver bug (like enabling vblanks too early, before the
pipe is fully up&running).
-Daniel

> 
> I guess in theory the driver could fail during random times for whatever
> reason, though I tend to think that the driver should either make it
> robust or not even pretend to support it.
> 
> I suppose one failure mode the driver can't really do anything about is
> some random SMI crap or something stalling the timestamp queries enough
> that we fail the precisions tests and exhaust the retry limits. So yeah,
> making it robust against that kind of nastyness sounds line it might be
> a good idea. Though perhaps it should be something a bit more severe
> than a DRM_DEBUG since I think it really shouldn't happen when the
> driver and system are otherwise sane. Of course if it routinely fails
> with some driver then simply making it spew errors into dmesg isn't
> so nice, unless the driver also gets fixed.
> 
> > 
> > There are also new Soc drivers showing up which support those methods, 
> > but at least i didn't verify or test if their implementations are good 
> > enough for the needs of the new drm_udpate_vblank_count() which is more 
> > sensitive to kms drivers being even slightly off.
> > 
> > -mario
> > 
> > >>
> > >> Fix this by checking if the old timestamp used for this calculations
> > >> is zero == invalid. If so, perform a counter increment of +1 to
> > >> prevent large counter jumps and reinitialize the timestamps to
> > >> sane values.
> > >>
> > >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >> Cc: <stable@vger.kernel.org> # 4.4+
> > >> Cc: michel@daenzer.net
> > >> Cc: vbabka@suse.cz
> > >> Cc: ville.syrjala@linux.intel.com
> > >> Cc: daniel.vetter@ffwll.ch
> > >> Cc: dri-devel@lists.freedesktop.org
> > >> Cc: alexander.deucher@amd.com
> > >> Cc: christian.koenig@amd.com
> > >> ---
> > >>   drivers/gpu/drm/drm_irq.c | 7 +++++++
> > >>   1 file changed, 7 insertions(+)
> > >>
> > >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > >> index fb17c45..88bdf19 100644
> > >> --- a/drivers/gpu/drm/drm_irq.c
> > >> +++ b/drivers/gpu/drm/drm_irq.c
> > >> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> > >>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> > >>   				      " diff_ns = %lld, framedur_ns = %d)\n",
> > >>   				      pipe, (long long) diff_ns, framedur_ns);
> > >> +
> > >> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
> > >> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
> > >> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
> > >> +				      pipe);
> > >> +			diff = 1;
> > >> +		}
> > >>   	} else {
> > >>   		/* some kind of default for drivers w/o accurate vbl timestamping */
> > >>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> > >> --
> > >> 1.9.1
> > >>
> > >
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
  2016-02-09 14:29             ` Daniel Vetter
@ 2016-02-09 16:18               ` Mario Kleiner
  -1 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 16:18 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ville Syrjälä,
	dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig



On 02/09/2016 03:29 PM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 02:39:44PM +0100, Mario Kleiner wrote:
>> On 02/09/2016 11:23 AM, Daniel Vetter wrote:
>>> On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrj�l� wrote:
>>>> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
>>>>> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
>>>>>> This fixes a regression introduced by the new drm_update_vblank_count()
>>>>>> implementation in Linux 4.4:
>>>>>>
>>>>>> Restrict the bump of the software vblank counter in drm_update_vblank_count()
>>>>>> to a safe maximum value of +1 whenever there is the possibility that
>>>>>> concurrent readers of vblank timestamps could be active at the moment,
>>>>>> as the current implementation of the timestamp caching and updating is
>>>>>> not safe against concurrent readers for calls to store_vblank() with a
>>>>>> bump of anything but +1. A bump != 1 would very likely return corrupted
>>>>>> timestamps to userspace, because the same slot in the cache could
>>>>>> be concurrently written by store_vblank() and read by one of those
>>>>>> readers in a non-atomic fashion and without the read-retry logic
>>>>>> detecting this collision.
>>>>>>
>>>>>> Concurrent readers can exist while drm_update_vblank_count() is called
>>>>> >from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
>>>>>> irq callers. However, all those calls are happening with the vbl_lock
>>>>>> locked thereby preventing a drm_vblank_get(), so the vblank refcount
>>>>>> can't increase while drm_update_vblank_count() is executing. Therefore
>>>>>> a zero vblank refcount during execution of that function signals that
>>>>>> is safe for arbitrary counter bumps if called from outside vblank irq,
>>>>>> whereas a non-zero count is not safe.
>>>>>>
>>>>>> Whenever the function is called from vblank irq, we have to assume concurrent
>>>>>> readers could show up any time during its execution, even if the refcount
>>>>>> is currently zero, as vblank irqs are usually only enabled due to the
>>>>>> presence of readers, and because when it is called from vblank irq it
>>>>>> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
>>>>>> Therefore also restrict bumps to +1 when the function is called from vblank
>>>>>> irq.
>>>>>>
>>>>>> Such bumps of more than +1 can happen at other times than reenabling
>>>>>> vblank irqs, e.g., when regular vblank interrupts get delayed by more
>>>>>> than 1 frame due to long held locks, long irq off periods, realtime
>>>>>> preemption on RT kernels, or system management interrupts.
>>>>>>
>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>>>> Cc: michel@daenzer.net
>>>>>> Cc: vbabka@suse.cz
>>>>>> Cc: ville.syrjala@linux.intel.com
>>>>>> Cc: daniel.vetter@ffwll.ch
>>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>>> Cc: alexander.deucher@amd.com
>>>>>> Cc: christian.koenig@amd.com
>>>>>
>>>>> Imo this is duct-tape. If we want to fix this up properly I think we
>>>>> should just use a full-blown seqlock instead of our hand-rolled one. And
>>>>> that could handle any increment at all.
>>>>
>>>> And I even fixed this [1] almost a half a year ago when I sent the
>>>> original series, but that part got held hostage to the same seqlock
>>>> argument. Perfect is the enemy of good.
>>>>
>>>> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
>>>
>>> Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
>>> patch over Mario's hack here tbh. Your patch with seqlock would be even
>>> more awesome.
>>> -Daniel
>>>
>>
>> I agree that my hack is only duct-tape. That's why the long code comment to
>> let people know under which condition they could remove it.
>>
>> Using seqlocks would be the robust long term solution. But as this is
>> supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite
>> would be too intrusive as a change, compared to this one-liner?
>>
>> The original "roll our own" seqlock look alike implementation was meant to
>> avoid/minimize taking locks, esp. with _irqsave that are taken by both
>> userspace and timing sensitive vblank irq handling code. There were various
>> locking changes since then and that advantage might have been lost already
>> quite a long time ago, so maybe switching to full seqlocks wouldn't pose
>> some new performance problems there, but i haven't looked into this.
>
> Last time I've checked we've already reinvented seqlocks completely,
> except buggy since ours can't take an increment > 1. I don't expect you'll
> be able to measure anything if we switch.
>
> Agree that it might be better to delay this for 4.6. So if you add a big
> "FIMXE: Need to replace this hack with proper seqlocks." a the top of your
> big comment (or just as a replacement for it), then
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> But currently it looks like this is a proper long-term solution, which it
> imo isn't.
> -Daniel
>

Ok, will do.
-mario

>
>>
>> -mario
>>
>>>>
>>>>> -Daniel
>>>>>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 41 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>> index bcb8528..aa2c74b 100644
>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>>>   	}
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Restrict the bump of the software vblank counter to a safe maximum
>>>>>> +	 * value of +1 whenever there is the possibility that concurrent readers
>>>>>> +	 * of vblank timestamps could be active at the moment, as the current
>>>>>> +	 * implementation of the timestamp caching and updating is not safe
>>>>>> +	 * against concurrent readers for calls to store_vblank() with a bump
>>>>>> +	 * of anything but +1. A bump != 1 would very likely return corrupted
>>>>>> +	 * timestamps to userspace, because the same slot in the cache could
>>>>>> +	 * be concurrently written by store_vblank() and read by one of those
>>>>>> +	 * readers without the read-retry logic detecting the collision.
>>>>>> +	 *
>>>>>> +	 * Concurrent readers can exist when we are called from the
>>>>>> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
>>>>>> +	 * irq callers. However, all those calls to us are happening with the
>>>>>> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
>>>>>> +	 * can't increase while we are executing. Therefore a zero refcount at
>>>>>> +	 * this point is safe for arbitrary counter bumps if we are called
>>>>>> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
>>>>>> +	 * we must also accept a refcount of 1, as whenever we are called from
>>>>>> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
>>>>>> +	 * we must let that one pass through in order to not lose vblank counts
>>>>>> +	 * during vblank irq off - which would completely defeat the whole
>>>>>> +	 * point of this routine.
>>>>>> +	 *
>>>>>> +	 * Whenever we are called from vblank irq, we have to assume concurrent
>>>>>> +	 * readers exist or can show up any time during our execution, even if
>>>>>> +	 * the refcount is currently zero, as vblank irqs are usually only
>>>>>> +	 * enabled due to the presence of readers, and because when we are called
>>>>>> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
>>>>>> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
>>>>>> +	 * called from vblank irq.
>>>>>> +	 */
>>>>>> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
>>>>>> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
>>>>>> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
>>>>>> +			      "refcount %u, vblirq %u\n", pipe, diff,
>>>>>> +			      atomic_read(&vblank->refcount),
>>>>>> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
>>>>>> +		diff = 1;
>>>>>> +	}
>>>>>> +
>>>>>>   	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>>>>>>   		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>>>>>>   		      pipe, vblank->count, diff, cur_vblank, vblank->last);
>>>>>> --
>>>>>> 1.9.1
>>>>>>
>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>>>
>>>> --
>>>> Ville Syrj�l�
>>>> Intel OTC
>>>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients.
@ 2016-02-09 16:18               ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-09 16:18 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka



On 02/09/2016 03:29 PM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 02:39:44PM +0100, Mario Kleiner wrote:
>> On 02/09/2016 11:23 AM, Daniel Vetter wrote:
>>> On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrjälä wrote:
>>>> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
>>>>> On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
>>>>>> This fixes a regression introduced by the new drm_update_vblank_count()
>>>>>> implementation in Linux 4.4:
>>>>>>
>>>>>> Restrict the bump of the software vblank counter in drm_update_vblank_count()
>>>>>> to a safe maximum value of +1 whenever there is the possibility that
>>>>>> concurrent readers of vblank timestamps could be active at the moment,
>>>>>> as the current implementation of the timestamp caching and updating is
>>>>>> not safe against concurrent readers for calls to store_vblank() with a
>>>>>> bump of anything but +1. A bump != 1 would very likely return corrupted
>>>>>> timestamps to userspace, because the same slot in the cache could
>>>>>> be concurrently written by store_vblank() and read by one of those
>>>>>> readers in a non-atomic fashion and without the read-retry logic
>>>>>> detecting this collision.
>>>>>>
>>>>>> Concurrent readers can exist while drm_update_vblank_count() is called
>>>>> >from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
>>>>>> irq callers. However, all those calls are happening with the vbl_lock
>>>>>> locked thereby preventing a drm_vblank_get(), so the vblank refcount
>>>>>> can't increase while drm_update_vblank_count() is executing. Therefore
>>>>>> a zero vblank refcount during execution of that function signals that
>>>>>> is safe for arbitrary counter bumps if called from outside vblank irq,
>>>>>> whereas a non-zero count is not safe.
>>>>>>
>>>>>> Whenever the function is called from vblank irq, we have to assume concurrent
>>>>>> readers could show up any time during its execution, even if the refcount
>>>>>> is currently zero, as vblank irqs are usually only enabled due to the
>>>>>> presence of readers, and because when it is called from vblank irq it
>>>>>> can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
>>>>>> Therefore also restrict bumps to +1 when the function is called from vblank
>>>>>> irq.
>>>>>>
>>>>>> Such bumps of more than +1 can happen at other times than reenabling
>>>>>> vblank irqs, e.g., when regular vblank interrupts get delayed by more
>>>>>> than 1 frame due to long held locks, long irq off periods, realtime
>>>>>> preemption on RT kernels, or system management interrupts.
>>>>>>
>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>>>> Cc: michel@daenzer.net
>>>>>> Cc: vbabka@suse.cz
>>>>>> Cc: ville.syrjala@linux.intel.com
>>>>>> Cc: daniel.vetter@ffwll.ch
>>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>>> Cc: alexander.deucher@amd.com
>>>>>> Cc: christian.koenig@amd.com
>>>>>
>>>>> Imo this is duct-tape. If we want to fix this up properly I think we
>>>>> should just use a full-blown seqlock instead of our hand-rolled one. And
>>>>> that could handle any increment at all.
>>>>
>>>> And I even fixed this [1] almost a half a year ago when I sent the
>>>> original series, but that part got held hostage to the same seqlock
>>>> argument. Perfect is the enemy of good.
>>>>
>>>> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html
>>>
>>> Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
>>> patch over Mario's hack here tbh. Your patch with seqlock would be even
>>> more awesome.
>>> -Daniel
>>>
>>
>> I agree that my hack is only duct-tape. That's why the long code comment to
>> let people know under which condition they could remove it.
>>
>> Using seqlocks would be the robust long term solution. But as this is
>> supposed to be a fix for both 4.4 and 4.5 i thought that such a rewrite
>> would be too intrusive as a change, compared to this one-liner?
>>
>> The original "roll our own" seqlock look alike implementation was meant to
>> avoid/minimize taking locks, esp. with _irqsave that are taken by both
>> userspace and timing sensitive vblank irq handling code. There were various
>> locking changes since then and that advantage might have been lost already
>> quite a long time ago, so maybe switching to full seqlocks wouldn't pose
>> some new performance problems there, but i haven't looked into this.
>
> Last time I've checked we've already reinvented seqlocks completely,
> except buggy since ours can't take an increment > 1. I don't expect you'll
> be able to measure anything if we switch.
>
> Agree that it might be better to delay this for 4.6. So if you add a big
> "FIMXE: Need to replace this hack with proper seqlocks." a the top of your
> big comment (or just as a replacement for it), then
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> But currently it looks like this is a proper long-term solution, which it
> imo isn't.
> -Daniel
>

Ok, will do.
-mario

>
>>
>> -mario
>>
>>>>
>>>>> -Daniel
>>>>>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>>>   1 file changed, 41 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>> index bcb8528..aa2c74b 100644
>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>> @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>>>   	}
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Restrict the bump of the software vblank counter to a safe maximum
>>>>>> +	 * value of +1 whenever there is the possibility that concurrent readers
>>>>>> +	 * of vblank timestamps could be active at the moment, as the current
>>>>>> +	 * implementation of the timestamp caching and updating is not safe
>>>>>> +	 * against concurrent readers for calls to store_vblank() with a bump
>>>>>> +	 * of anything but +1. A bump != 1 would very likely return corrupted
>>>>>> +	 * timestamps to userspace, because the same slot in the cache could
>>>>>> +	 * be concurrently written by store_vblank() and read by one of those
>>>>>> +	 * readers without the read-retry logic detecting the collision.
>>>>>> +	 *
>>>>>> +	 * Concurrent readers can exist when we are called from the
>>>>>> +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
>>>>>> +	 * irq callers. However, all those calls to us are happening with the
>>>>>> +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
>>>>>> +	 * can't increase while we are executing. Therefore a zero refcount at
>>>>>> +	 * this point is safe for arbitrary counter bumps if we are called
>>>>>> +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
>>>>>> +	 * we must also accept a refcount of 1, as whenever we are called from
>>>>>> +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
>>>>>> +	 * we must let that one pass through in order to not lose vblank counts
>>>>>> +	 * during vblank irq off - which would completely defeat the whole
>>>>>> +	 * point of this routine.
>>>>>> +	 *
>>>>>> +	 * Whenever we are called from vblank irq, we have to assume concurrent
>>>>>> +	 * readers exist or can show up any time during our execution, even if
>>>>>> +	 * the refcount is currently zero, as vblank irqs are usually only
>>>>>> +	 * enabled due to the presence of readers, and because when we are called
>>>>>> +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
>>>>>> +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
>>>>>> +	 * called from vblank irq.
>>>>>> +	 */
>>>>>> +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
>>>>>> +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
>>>>>> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
>>>>>> +			      "refcount %u, vblirq %u\n", pipe, diff,
>>>>>> +			      atomic_read(&vblank->refcount),
>>>>>> +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
>>>>>> +		diff = 1;
>>>>>> +	}
>>>>>> +
>>>>>>   	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>>>>>>   		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
>>>>>>   		      pipe, vblank->count, diff, cur_vblank, vblank->last);
>>>>>> --
>>>>>> 1.9.1
>>>>>>
>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>>>
>>>> --
>>>> Ville Syrjälä
>>>> Intel OTC
>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-09 15:03           ` Daniel Vetter
@ 2016-02-10 16:28             ` Mario Kleiner
  -1 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-10 16:28 UTC (permalink / raw)
  To: Daniel Vetter, Ville Syrjälä
  Cc: dri-devel, linux, stable, michel, vbabka, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/09/2016 04:03 PM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 04:11:49PM +0200, Ville Syrj�l� wrote:
>> On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
>>> On 02/09/2016 11:09 AM, Daniel Vetter wrote:
>>>> On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
>>>>> The changes to drm_update_vblank_count() in Linux 4.4 added a
>>>>> method to emulate a hardware vblank counter by use of high
>>>>> precision vblank timestamps if a kms driver supports those,
>>>>> but doesn't suppport hw vblank counters.
>>>>>
>>>>> That method assumes that the old timestamp from a previous
>>>>> invocation is valid, but that is not always the case. E.g.,
>>>>> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
>>>>> or drm_update_vblank_count() gets called outside vblank irq and
>>>>> the high precision timestamping can't deliver a precise timestamp,
>>>>> ie. drm_get_last_vbltimestamp() delivers a return value of false,
>>>>> then those functions will initialize the old timestamp to zero
>>>>> to mark it as invalid.
>>>>>
>>>>> A following call to drm_update_vblank_count() would then calculate
>>>>> elapsed time with vblank irqs off as current vblank timestamp minus
>>>>> the zero old timestamp and compute a software vblank counter increment
>>>>> that corresponds to system uptime, causing a large forward jump of the
>>>>> software vblank counter. That jump in turn can cause too long waits
>>>>> in drmWaitVblank and very long delays in delivery of vblank events,
>>>>> resulting in hangs of userspace clients.
>>>>>
>>>>> This problem can be observed on nouveau-kms during machine
>>>>> suspend->resume cycles, where drm_vblank_off is called during
>>>>> suspend, drm_vblank_on is called during resume and the first
>>>>> queries to drm_get_last_vbltimestamp() don't deliver high
>>>>> precision timestamps, resulting in a large harmful counter jump.
>>>>
>>>> Why does nouveau enable vblank interrupts before it can get valid
>>>> timestamps? That sounds like the core bug here, and papering over that in
>>>> the vblank code feels very wrong to me. Maybe we should instead just not
>>>> sample the vblank at all if the timestamp fails and splat a big WARN_ON
>>>> about this, to give driver writers a chance to fix up their mess?
>>>> -Daniel
>>>>
>>>
>>> The high precision timestamping is allowed to fail for a kms driver
>>> under some conditions which are not implementation errors of the driver,
>>> or getting disabled by user override, so we should handle that robustly.
>>> We handle it robustly everywhere else in the drm, so we should do it
>>> here as well.
>>>
>>> E.g., nouveau generally supports timestamping/scanout position queries,
>>> but can't support it on old pre NV-50 hardware with some output type
>>> (either on analog VGA, or DVI-D, can't remember atm. which one is
>>> unsupported).
>>
>> I think the surprising thing here is that it fails first and then
>> succeeds on the same crtc/output combo presumably.
>
> Yeah exactly this. Failing consistently is ok imo (and probably should be
> handled). Failing first and then later on working (without changing the
> mode config in between) seems suspicous. That shouldn't ever happen really
> and seems like a driver bug (like enabling vblanks too early, before the
> pipe is fully up&running).
> -Daniel
>
>>
>> I guess in theory the driver could fail during random times for whatever
>> reason, though I tend to think that the driver should either make it
>> robust or not even pretend to support it.
>>
>> I suppose one failure mode the driver can't really do anything about is
>> some random SMI crap or something stalling the timestamp queries enough
>> that we fail the precisions tests and exhaust the retry limits. So yeah,
>> making it robust against that kind of nastyness sounds line it might be
>> a good idea. Though perhaps it should be something a bit more severe
>> than a DRM_DEBUG since I think it really shouldn't happen when the
>> driver and system are otherwise sane. Of course if it routinely fails
>> with some driver then simply making it spew errors into dmesg isn't
>> so nice, unless the driver also gets fixed.
>>

I think i have an idea what might go wrong with nouveau, so i'll see if 
i can add a fixup patch.

There's another scenario where this zero-ts case can be hit. If the 
driver drm_vblank_init()'s - setting all timestamps to zero - and then 
code starts using vblanks (drm_vblank_get()) without drm_vblank_on 
beforehand, which is afaics the case with nouveau. Unless that's 
considered an error as well, we'd need to init the timestamps to 
something non-zero but harmless like 1 usecs at drm_vblank_init() time?
What makes sense as output here? DRM_WARN_ONCE?

-mario

>>>
>>> There are also new Soc drivers showing up which support those methods,
>>> but at least i didn't verify or test if their implementations are good
>>> enough for the needs of the new drm_udpate_vblank_count() which is more
>>> sensitive to kms drivers being even slightly off.
>>>
>>> -mario
>>>
>>>>>
>>>>> Fix this by checking if the old timestamp used for this calculations
>>>>> is zero == invalid. If so, perform a counter increment of +1 to
>>>>> prevent large counter jumps and reinitialize the timestamps to
>>>>> sane values.
>>>>>
>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>>> Cc: michel@daenzer.net
>>>>> Cc: vbabka@suse.cz
>>>>> Cc: ville.syrjala@linux.intel.com
>>>>> Cc: daniel.vetter@ffwll.ch
>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>> Cc: alexander.deucher@amd.com
>>>>> Cc: christian.koenig@amd.com
>>>>> ---
>>>>>    drivers/gpu/drm/drm_irq.c | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>> index fb17c45..88bdf19 100644
>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>    			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>    				      " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>    				      pipe, (long long) diff_ns, framedur_ns);
>>>>> +
>>>>> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
>>>>> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
>>>>> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
>>>>> +				      pipe);
>>>>> +			diff = 1;
>>>>> +		}
>>>>>    	} else {
>>>>>    		/* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>    		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>
>>
>> --
>> Ville Syrj�l�
>> Intel OTC
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
@ 2016-02-10 16:28             ` Mario Kleiner
  0 siblings, 0 replies; 52+ messages in thread
From: Mario Kleiner @ 2016-02-10 16:28 UTC (permalink / raw)
  To: Daniel Vetter, Ville Syrjälä
  Cc: daniel.vetter, michel, linux, stable, dri-devel,
	alexander.deucher, christian.koenig, vbabka

On 02/09/2016 04:03 PM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 04:11:49PM +0200, Ville Syrjälä wrote:
>> On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
>>> On 02/09/2016 11:09 AM, Daniel Vetter wrote:
>>>> On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
>>>>> The changes to drm_update_vblank_count() in Linux 4.4 added a
>>>>> method to emulate a hardware vblank counter by use of high
>>>>> precision vblank timestamps if a kms driver supports those,
>>>>> but doesn't suppport hw vblank counters.
>>>>>
>>>>> That method assumes that the old timestamp from a previous
>>>>> invocation is valid, but that is not always the case. E.g.,
>>>>> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
>>>>> or drm_update_vblank_count() gets called outside vblank irq and
>>>>> the high precision timestamping can't deliver a precise timestamp,
>>>>> ie. drm_get_last_vbltimestamp() delivers a return value of false,
>>>>> then those functions will initialize the old timestamp to zero
>>>>> to mark it as invalid.
>>>>>
>>>>> A following call to drm_update_vblank_count() would then calculate
>>>>> elapsed time with vblank irqs off as current vblank timestamp minus
>>>>> the zero old timestamp and compute a software vblank counter increment
>>>>> that corresponds to system uptime, causing a large forward jump of the
>>>>> software vblank counter. That jump in turn can cause too long waits
>>>>> in drmWaitVblank and very long delays in delivery of vblank events,
>>>>> resulting in hangs of userspace clients.
>>>>>
>>>>> This problem can be observed on nouveau-kms during machine
>>>>> suspend->resume cycles, where drm_vblank_off is called during
>>>>> suspend, drm_vblank_on is called during resume and the first
>>>>> queries to drm_get_last_vbltimestamp() don't deliver high
>>>>> precision timestamps, resulting in a large harmful counter jump.
>>>>
>>>> Why does nouveau enable vblank interrupts before it can get valid
>>>> timestamps? That sounds like the core bug here, and papering over that in
>>>> the vblank code feels very wrong to me. Maybe we should instead just not
>>>> sample the vblank at all if the timestamp fails and splat a big WARN_ON
>>>> about this, to give driver writers a chance to fix up their mess?
>>>> -Daniel
>>>>
>>>
>>> The high precision timestamping is allowed to fail for a kms driver
>>> under some conditions which are not implementation errors of the driver,
>>> or getting disabled by user override, so we should handle that robustly.
>>> We handle it robustly everywhere else in the drm, so we should do it
>>> here as well.
>>>
>>> E.g., nouveau generally supports timestamping/scanout position queries,
>>> but can't support it on old pre NV-50 hardware with some output type
>>> (either on analog VGA, or DVI-D, can't remember atm. which one is
>>> unsupported).
>>
>> I think the surprising thing here is that it fails first and then
>> succeeds on the same crtc/output combo presumably.
>
> Yeah exactly this. Failing consistently is ok imo (and probably should be
> handled). Failing first and then later on working (without changing the
> mode config in between) seems suspicous. That shouldn't ever happen really
> and seems like a driver bug (like enabling vblanks too early, before the
> pipe is fully up&running).
> -Daniel
>
>>
>> I guess in theory the driver could fail during random times for whatever
>> reason, though I tend to think that the driver should either make it
>> robust or not even pretend to support it.
>>
>> I suppose one failure mode the driver can't really do anything about is
>> some random SMI crap or something stalling the timestamp queries enough
>> that we fail the precisions tests and exhaust the retry limits. So yeah,
>> making it robust against that kind of nastyness sounds line it might be
>> a good idea. Though perhaps it should be something a bit more severe
>> than a DRM_DEBUG since I think it really shouldn't happen when the
>> driver and system are otherwise sane. Of course if it routinely fails
>> with some driver then simply making it spew errors into dmesg isn't
>> so nice, unless the driver also gets fixed.
>>

I think i have an idea what might go wrong with nouveau, so i'll see if 
i can add a fixup patch.

There's another scenario where this zero-ts case can be hit. If the 
driver drm_vblank_init()'s - setting all timestamps to zero - and then 
code starts using vblanks (drm_vblank_get()) without drm_vblank_on 
beforehand, which is afaics the case with nouveau. Unless that's 
considered an error as well, we'd need to init the timestamps to 
something non-zero but harmless like 1 usecs at drm_vblank_init() time?
What makes sense as output here? DRM_WARN_ONCE?

-mario

>>>
>>> There are also new Soc drivers showing up which support those methods,
>>> but at least i didn't verify or test if their implementations are good
>>> enough for the needs of the new drm_udpate_vblank_count() which is more
>>> sensitive to kms drivers being even slightly off.
>>>
>>> -mario
>>>
>>>>>
>>>>> Fix this by checking if the old timestamp used for this calculations
>>>>> is zero == invalid. If so, perform a counter increment of +1 to
>>>>> prevent large counter jumps and reinitialize the timestamps to
>>>>> sane values.
>>>>>
>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>>> Cc: michel@daenzer.net
>>>>> Cc: vbabka@suse.cz
>>>>> Cc: ville.syrjala@linux.intel.com
>>>>> Cc: daniel.vetter@ffwll.ch
>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>> Cc: alexander.deucher@amd.com
>>>>> Cc: christian.koenig@amd.com
>>>>> ---
>>>>>    drivers/gpu/drm/drm_irq.c | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>> index fb17c45..88bdf19 100644
>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>    			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>    				      " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>    				      pipe, (long long) diff_ns, framedur_ns);
>>>>> +
>>>>> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
>>>>> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
>>>>> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
>>>>> +				      pipe);
>>>>> +			diff = 1;
>>>>> +		}
>>>>>    	} else {
>>>>>    		/* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>    		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>
>>
>> --
>> Ville Syrjälä
>> Intel OTC
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-10 16:28             ` Mario Kleiner
  (?)
@ 2016-02-10 17:17             ` Daniel Vetter
  2016-02-10 18:36               ` Mario Kleiner
  -1 siblings, 1 reply; 52+ messages in thread
From: Daniel Vetter @ 2016-02-10 17:17 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Ville Syrjälä,
	dri-devel, linux, stable, Michel Dänzer, Vlastimil Babka,
	alexander.deucher, Christian König

On Wed, Feb 10, 2016 at 5:28 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> There's another scenario where this zero-ts case can be hit. If the driver
> drm_vblank_init()'s - setting all timestamps to zero - and then code starts
> using vblanks (drm_vblank_get()) without drm_vblank_on beforehand, which is
> afaics the case with nouveau. Unless that's considered an error as well,
> we'd need to init the timestamps to something non-zero but harmless like 1
> usecs at drm_vblank_init() time?

Both legacy modeset helpers and atomic ones assume by default that you
start out with everything disabled. Pre-atomic drivers make that
happen by calling disable_unused_functions() to shut down anything the
bios has enabled. I think this can't happen.

For drivers that do take over bootloader display config they must call
vblank_on explicitly themselves, which i915 does.

> What makes sense as output here? DRM_WARN_ONCE?

I'd go with WARN_ON and tune it down if it's offensive. But WARN_ON
patch for 4.6 of course.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-10 17:17             ` Daniel Vetter
@ 2016-02-10 18:36               ` Mario Kleiner
  2016-02-10 19:34                   ` Daniel Vetter
  0 siblings, 1 reply; 52+ messages in thread
From: Mario Kleiner @ 2016-02-10 18:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ville Syrjälä,
	dri-devel, linux, stable, Michel Dänzer, Vlastimil Babka,
	alexander.deucher, Christian König

On 02/10/2016 06:17 PM, Daniel Vetter wrote:
> On Wed, Feb 10, 2016 at 5:28 PM, Mario Kleiner
> <mario.kleiner.de@gmail.com> wrote:
>> There's another scenario where this zero-ts case can be hit. If the driver
>> drm_vblank_init()'s - setting all timestamps to zero - and then code starts
>> using vblanks (drm_vblank_get()) without drm_vblank_on beforehand, which is
>> afaics the case with nouveau. Unless that's considered an error as well,
>> we'd need to init the timestamps to something non-zero but harmless like 1
>> usecs at drm_vblank_init() time?
>
> Both legacy modeset helpers and atomic ones assume by default that you
> start out with everything disabled. Pre-atomic drivers make that
> happen by calling disable_unused_functions() to shut down anything the
> bios has enabled. I think this can't happen.
>
> For drivers that do take over bootloader display config they must call
> vblank_on explicitly themselves, which i915 does.
>
>> What makes sense as output here? DRM_WARN_ONCE?
>
> I'd go with WARN_ON and tune it down if it's offensive. But WARN_ON
> patch for 4.6 of course.
> -Daniel
>

Ok, so does this one have your R-b for stable as is? What about a proper 
nouveau fix if i find one? Probably also for 4.6 then, given that this 
patch fixes it up good enough for stable?

-mario

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
  2016-02-10 18:36               ` Mario Kleiner
@ 2016-02-10 19:34                   ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-10 19:34 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Ville Syrjälä,
	dri-devel, linux, stable, Michel Dänzer, Vlastimil Babka,
	alexander.deucher, Christian König

On Wed, Feb 10, 2016 at 7:36 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> On 02/10/2016 06:17 PM, Daniel Vetter wrote:
>>
>> On Wed, Feb 10, 2016 at 5:28 PM, Mario Kleiner
>> <mario.kleiner.de@gmail.com> wrote:
>>>
>>> There's another scenario where this zero-ts case can be hit. If the
>>> driver
>>> drm_vblank_init()'s - setting all timestamps to zero - and then code
>>> starts
>>> using vblanks (drm_vblank_get()) without drm_vblank_on beforehand, which
>>> is
>>> afaics the case with nouveau. Unless that's considered an error as well,
>>> we'd need to init the timestamps to something non-zero but harmless like
>>> 1
>>> usecs at drm_vblank_init() time?
>>
>>
>> Both legacy modeset helpers and atomic ones assume by default that you
>> start out with everything disabled. Pre-atomic drivers make that
>> happen by calling disable_unused_functions() to shut down anything the
>> bios has enabled. I think this can't happen.
>>
>> For drivers that do take over bootloader display config they must call
>> vblank_on explicitly themselves, which i915 does.
>>
>>> What makes sense as output here? DRM_WARN_ONCE?
>>
>>
>> I'd go with WARN_ON and tune it down if it's offensive. But WARN_ON
>> patch for 4.6 of course.
>> -Daniel
>>
>
> Ok, so does this one have your R-b for stable as is? What about a proper
> nouveau fix if i find one? Probably also for 4.6 then, given that this patch
> fixes it up good enough for stable?

If possible I'd prefer we just fix nouveau up for stable, and do a
WARN_ON patch when this ever happens for 4.6. It sounded like you've
figured out already what nouveau needs?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
@ 2016-02-10 19:34                   ` Daniel Vetter
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Vetter @ 2016-02-10 19:34 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: Michel Dänzer, linux, stable, dri-devel, alexander.deucher,
	Christian König, Vlastimil Babka

On Wed, Feb 10, 2016 at 7:36 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> On 02/10/2016 06:17 PM, Daniel Vetter wrote:
>>
>> On Wed, Feb 10, 2016 at 5:28 PM, Mario Kleiner
>> <mario.kleiner.de@gmail.com> wrote:
>>>
>>> There's another scenario where this zero-ts case can be hit. If the
>>> driver
>>> drm_vblank_init()'s - setting all timestamps to zero - and then code
>>> starts
>>> using vblanks (drm_vblank_get()) without drm_vblank_on beforehand, which
>>> is
>>> afaics the case with nouveau. Unless that's considered an error as well,
>>> we'd need to init the timestamps to something non-zero but harmless like
>>> 1
>>> usecs at drm_vblank_init() time?
>>
>>
>> Both legacy modeset helpers and atomic ones assume by default that you
>> start out with everything disabled. Pre-atomic drivers make that
>> happen by calling disable_unused_functions() to shut down anything the
>> bios has enabled. I think this can't happen.
>>
>> For drivers that do take over bootloader display config they must call
>> vblank_on explicitly themselves, which i915 does.
>>
>>> What makes sense as output here? DRM_WARN_ONCE?
>>
>>
>> I'd go with WARN_ON and tune it down if it's offensive. But WARN_ON
>> patch for 4.6 of course.
>> -Daniel
>>
>
> Ok, so does this one have your R-b for stable as is? What about a proper
> nouveau fix if i find one? Probably also for 4.6 then, given that this patch
> fixes it up good enough for stable?

If possible I'd prefer we just fix nouveau up for stable, and do a
WARN_ON patch when this ever happens for 4.6. It sounded like you've
figured out already what nouveau needs?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13   ` Mario Kleiner
  (?)
  (?)
@ 2016-02-11 13:03   ` Vlastimil Babka
  -1 siblings, 0 replies; 52+ messages in thread
From: Vlastimil Babka @ 2016-02-11 13:03 UTC (permalink / raw)
  To: Mario Kleiner, dri-devel
  Cc: linux, stable, michel, ville.syrjala, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/08/2016 02:13 AM, Mario Kleiner wrote:
> Changes to drm_update_vblank_count() in Linux 4.4 broke the
> behaviour of the pre/post modeset functions as the new update
> code doesn't deal with hw vblank counter resets inbetween calls
> to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
> should.
>
> This causes mistreatment of such hw counter resets as counter
> wraparound, and thereby large forward jumps of the software
> vblank counter which in turn cause vblank event dispatching
> and vblank waits to fail/hang --> userspace clients hang.
>
> This symptom was reported on radeon-kms to cause a infinite
> hang of KDE Plasma 5 shell's login procedure, preventing users
> from logging in.
>
> Fix this by detecting when drm_update_vblank_count() is called
> inside a pre->post modeset interval. If so, clamp valid vblank
> increments to the safe values 0 and 1, pretty much restoring
> the update behavior of the old update code of Linux 4.3 and
> earlier. Also reset the last recorded hw vblank count at call
> to drm_vblank_post_modeset() to be safe against hw that after
> modesetting, dpms on etc. only fires its first vblank irq after
> drm_vblank_post_modeset() was already called.
>
> Reported-by: Vlastimil Babka <vbabka@suse.cz>

FWIW, I've applied the whole patchset to 4.4 and the kde5 login problem 
didn't occur. I can test the next version too.

Thanks,
Vlastimil


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2016-02-11 13:03 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
2016-02-08  1:13 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() Mario Kleiner
2016-02-08  1:13   ` Mario Kleiner
2016-02-09  9:54   ` Daniel Vetter
2016-02-09  9:54     ` Daniel Vetter
2016-02-09 13:27     ` Mario Kleiner
2016-02-08  1:13 ` [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients Mario Kleiner
2016-02-08  1:13   ` Mario Kleiner
2016-02-09  9:56   ` Daniel Vetter
2016-02-09  9:56     ` Daniel Vetter
2016-02-09 10:07     ` Ville Syrjälä
2016-02-09 10:07       ` Ville Syrjälä
2016-02-09 10:23       ` Daniel Vetter
2016-02-09 10:23         ` Daniel Vetter
2016-02-09 13:39         ` Mario Kleiner
2016-02-09 13:39           ` Mario Kleiner
2016-02-09 14:29           ` Daniel Vetter
2016-02-09 14:29             ` Daniel Vetter
2016-02-09 16:18             ` Mario Kleiner
2016-02-09 16:18               ` Mario Kleiner
2016-02-08  1:13 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
2016-02-08  1:13   ` Mario Kleiner
2016-02-09 10:00   ` Daniel Vetter
2016-02-11 13:03   ` Vlastimil Babka
2016-02-08  1:13 ` [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() Mario Kleiner
2016-02-08  1:13   ` Mario Kleiner
2016-02-09 10:06   ` Daniel Vetter
2016-02-09 10:06     ` Daniel Vetter
2016-02-09 11:10     ` Ville Syrjälä
2016-02-09 11:10       ` Ville Syrjälä
2016-02-09 13:29       ` Mario Kleiner
2016-02-09 13:29         ` Mario Kleiner
2016-02-09 13:41         ` Ville Syrjälä
2016-02-09 13:41           ` Ville Syrjälä
2016-02-09 14:31           ` Daniel Vetter
2016-02-09 14:31             ` Daniel Vetter
2016-02-08  1:13 ` [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method Mario Kleiner
2016-02-08  1:13   ` Mario Kleiner
2016-02-09 10:09   ` Daniel Vetter
2016-02-09 13:53     ` Mario Kleiner
2016-02-09 14:11       ` Ville Syrjälä
2016-02-09 14:11         ` Ville Syrjälä
2016-02-09 15:03         ` Daniel Vetter
2016-02-09 15:03           ` Daniel Vetter
2016-02-10 16:28           ` Mario Kleiner
2016-02-10 16:28             ` Mario Kleiner
2016-02-10 17:17             ` Daniel Vetter
2016-02-10 18:36               ` Mario Kleiner
2016-02-10 19:34                 ` Daniel Vetter
2016-02-10 19:34                   ` Daniel Vetter
2016-02-08  1:13 ` [PATCH 6/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
2016-02-09 10:10   ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.