From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751756AbcGRPmG (ORCPT <rfc822;w@1wt.eu>);
	Mon, 18 Jul 2016 11:42:06 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48585 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751425AbcGRPmD (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 18 Jul 2016 11:42:03 -0400
From: Lyude <cpaul@redhat.com>
To: amd-gfx@lists.freedesktop.org
Cc: Lyude <cpaul@redhat.com>, stable@vger.kernel.org,
        Alex Deucher <alexdeucher@gmail.com>,
        Alex Deucher <alexander.deucher@amd.com>,
        =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>,
        David Airlie <airlied@linux.ie>, Chunming Zhou <David1.Zhou@amd.com>,
        Jammy Zhou <Jammy.Zhou@amd.com>, Monk Liu <Monk.Liu@amd.com>,
        Flora Cui <Flora.Cui@amd.com>, Tom St Denis <tom.stdenis@amd.com>,
        dri-devel@lists.freedesktop.org (open list:RADEON and AMDGPU DRM
	DRIVERS),
        linux-kernel@vger.kernel.org (open list)
Subject: [PATCH v2] drm/amdgpu: Disable RPM helpers while reprobing connectors on resume
Date: Mon, 18 Jul 2016 11:41:37 -0400
Message-Id: <1468856499-19246-1-git-send-email-cpaul@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 18 Jul 2016 15:42:02 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Just about all of amdgpu's connector probing functions try to acquire
runtime PM refs. If we try to do this in the context of
amdgpu_resume_kms by calling drm_helper_hpd_irq_event(), we end up
deadlocking the system.

Since we're guaranteed to be holding the spinlock for RPM in
amdgpu_resume_kms, and we already know the GPU is in working order, we
need to prevent the RPM helpers from trying to run during the initial
connector reprobe on resume.

There's a couple of solutions I've explored for fixing this, but this
one by far seems to be the simplest and most reliable (plus I'm pretty
sure that's what disable_depth is there for anyway).

Reproduction recipe:
  - Get any laptop dual GPUs using PRIME
  - Make sure runtime PM is enabled for amdgpu
  - Boot the machine
  - If the machine managed to boot without hanging, switch out of X to
    another VT. This should definitely cause X to hang infinitely.

Changes since v1:
  - add appropriate #ifdef checks for CONFIG_PM. This is not very
    useful, but it appears some kernel test suites test compiling amdgpu
    with CONFIG_PM disabled, which results in this patch breaking the builds
    if we don't include this #ifdef

Cc: stable@vger.kernel.org
Cc: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Lyude <cpaul@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6e92008..b7f5650 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1841,7 +1841,23 @@ int amdgpu_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
 	}
 
 	drm_kms_helper_poll_enable(dev);
+
+	/*
+	 * Most of the connector probing functions try to acquire runtime pm
+	 * refs to ensure that the GPU is powered on when connector polling is
+	 * performed. Since we're calling this from a runtime PM callback,
+	 * trying to acquire rpm refs will cause us to deadlock.
+	 *
+	 * Since we're guaranteed to be holding the rpm lock, it's safe to
+	 * temporarily disable the rpm helpers so this doesn't deadlock us.
+	 */
+#ifdef CONFIG_PM
+	dev->dev->power.disable_depth++;
+#endif
 	drm_helper_hpd_irq_event(dev);
+#ifdef CONFIG_PM
+	dev->dev->power.disable_depth--;
+#endif
 
 	if (fbcon) {
 		amdgpu_fbdev_set_suspend(adev, 0);
-- 
2.7.4