From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933148AbcLSMQV (ORCPT ); Mon, 19 Dec 2016 07:16:21 -0500 Received: from fllnx209.ext.ti.com ([198.47.19.16]:46649 "EHLO fllnx209.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932298AbcLSMQN (ORCPT ); Mon, 19 Dec 2016 07:16:13 -0500 Subject: Re: [PATCH v2] drm: drm_probe_helper: Fix output_poll_work scheduling To: Jani Nikula , Daniel Vetter References: <20160831110905.31289-1-peter.ujfalusi@ti.com> <20160831112317.GC20761@phenom.ffwll.local> <871sx4nt1e.fsf@intel.com> CC: , , , , "Tahvanainen, Jari" From: Peter Ujfalusi Message-ID: <486b1f53-afd3-d789-0a2e-aa50daad71a2@ti.com> Date: Mon, 19 Dec 2016 14:15:47 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <871sx4nt1e.fsf@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/19/2016 11:54 AM, Jani Nikula wrote: > On Wed, 31 Aug 2016, Daniel Vetter wrote: >> On Wed, Aug 31, 2016 at 02:09:05PM +0300, Peter Ujfalusi wrote: >>> drm_kms_helper_poll_enable_locked() should check if we have delayed event >>> pending and if we have, schedule the work to run without delay. >>> >>> Currently the output_poll_work is only scheduled if any of the connectors >>> have DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT with >>> DRM_OUTPUT_POLL_PERIOD delay. It does not matter if we have delayed event >>> already registered to be handled. The detection will be delayd by >>> DRM_OUTPUT_POLL_PERIOD in any case. >>> Furthermore if none of the connectors are marked as POLL_CONNECT or >>> POLL_DISCONNECT because all connectors are either POLL_HPD or they are >>> always connected: the output_poll_work will not run at all even if we >>> have delayed event marked. >>> >>> When none of the connectors require polling, their initial status change >>> from unknown to connected/disconnected is not going to be handled until >>> the first kms application starts or if we have fb console enabled. >>> >>> Signed-off-by: Peter Ujfalusi >>> --- >>> Hi, >>> >>> Changes since v1: >>> - dropped the last paragraph from the commit message. >> >> I added a few more words to the commit message to explain when exactly >> this is a problem and applied your patch to drm-misc. > > Hi Peter, sadly looks like this regresses users out there [1]. Seems to > be a reliable bisect. We need to have this fixed or reverted. When I sent the patch I did booted my laptop (with Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)) and my old desktop with some nVidia Quadro using nouveau stack and seen no issue. But with 4.9 now on my Dell laptop I see lots of warnings during boot: [ 0.573119] ------------[ cut here ]------------ [ 0.573127] WARNING: CPU: 3 PID: 874 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210 [ 0.573132] WARN_ON(!msg->buffer != !msg->size) [ 0.573134] Modules linked in: [ 0.573140] CPU: 3 PID: 874 Comm: kworker/u8:3 Not tainted 4.9.0-gentoo #1 [ 0.573143] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015 [ 0.573150] Workqueue: events_unbound async_run_entry_fn [ 0.573154] ffffc900021afaf0 ffffffff813c3bad ffffc900021afb40 0000000000000000 [ 0.573160] ffffc900021afb30 ffffffff8106a596 0000042614b82908 ffffc900021afc00 [ 0.573166] ffff8802144040e0 0000000000000003 0000000000000000 ffff880214404158 [ 0.573172] Call Trace: [ 0.573178] [] dump_stack+0x4f/0x72 [ 0.573182] [] __warn+0xc6/0xe0 [ 0.573185] [] warn_slowpath_fmt+0x4a/0x50 [ 0.573189] [] ? intel_dp_aux_transfer+0xc9/0x210 [ 0.573193] [] intel_dp_aux_transfer+0x1d5/0x210 [ 0.573198] [] ? _raw_write_unlock_irqrestore+0x13/0x30 [ 0.573202] [] ? _raw_spin_unlock_irqrestore+0x9/0x10 [ 0.573207] [] drm_dp_dpcd_access+0x58/0xf0 [ 0.573210] [] drm_dp_dpcd_write+0x16/0x20 [ 0.573214] [] intel_dp_start_link_train+0x2b3/0x4a0 [ 0.573218] [] intel_dp_check_link_status+0xb2/0xf0 [ 0.573222] [] intel_dp_detect+0x7d6/0xb40 [ 0.573226] [] drm_helper_probe_single_connector_modes+0x41b/0x4e0 [ 0.573233] [] drm_fb_helper_initial_config+0x7c/0x3f0 [ 0.573237] [] ? _raw_spin_unlock_irq+0x9/0x10 [ 0.573242] [] intel_fbdev_initial_config+0x13/0x30 [ 0.573245] [] async_run_entry_fn+0x32/0xe0 [ 0.573249] [] process_one_work+0x148/0x4c0 [ 0.573253] [] worker_thread+0x43/0x4e0 [ 0.573257] [] ? process_one_work+0x4c0/0x4c0 [ 0.573260] [] ? process_one_work+0x4c0/0x4c0 [ 0.573264] [] ? call_usermodehelper_exec_async+0x137/0x140 [ 0.573269] [] kthread+0xc5/0xe0 [ 0.573273] [] ? kthread_park+0x60/0x60 [ 0.573277] [] ? umh_complete+0x40/0x40 [ 0.573280] [] ret_from_fork+0x22/0x30 [ 0.573285] ---[ end trace a544f5d689389b41 ]--- If I revert the patch in question I have tons of: [ 0.569127] ------------[ cut here ]------------ [ 0.569136] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1d5/0x210 [ 0.569141] WARN_ON(!msg->buffer != !msg->size) [ 0.569143] Modules linked in: [ 0.569149] CPU: 3 PID: 564 Comm: kworker/3:2 Not tainted 4.9.0-gentoo #2 [ 0.569152] Hardware name: Dell Inc. Latitude E7440/06MFX3, BIOS A14 02/02/2015 [ 0.569159] Workqueue: events i915_hotplug_work_func [ 0.569164] ffffc90001a93ba0 ffffffff813c3bad ffffc90001a93bf0 0000000000000000 [ 0.569174] ffffc90001a93be0 ffffffff8106a596 0000042614b12908 ffffc90001a93cb0 [ 0.569184] ffff88021482b0e0 0000000000000003 0000000000000000 ffff88021482b158 [ 0.569191] Call Trace: [ 0.569198] [] dump_stack+0x4f/0x72 [ 0.569205] [] __warn+0xc6/0xe0 [ 0.569210] [] warn_slowpath_fmt+0x4a/0x50 [ 0.569214] [] ? intel_dp_aux_transfer+0xc9/0x210 [ 0.569218] [] intel_dp_aux_transfer+0x1d5/0x210 [ 0.569223] [] ? _raw_write_unlock_irqrestore+0x13/0x30 [ 0.569224] loop: module loaded [ 0.569233] [] ? _raw_spin_unlock_irqrestore+0x9/0x10 [ 0.569241] [] drm_dp_dpcd_access+0x58/0xf0 [ 0.569248] [] drm_dp_dpcd_write+0x16/0x20 [ 0.569254] [] intel_dp_start_link_train+0x2b3/0x4a0 [ 0.569261] [] intel_dp_check_link_status+0xb2/0xf0 [ 0.569268] [] intel_dp_detect+0x7d6/0xb40 [ 0.569275] [] i915_hotplug_work_func+0x1d9/0x2a0 [ 0.569283] [] process_one_work+0x148/0x4c0 [ 0.569290] [] worker_thread+0x43/0x4e0 [ 0.569296] [] ? process_one_work+0x4c0/0x4c0 [ 0.569302] [] ? process_one_work+0x4c0/0x4c0 [ 0.569309] [] kthread+0xc5/0xe0 [ 0.569316] [] ? kthread_park+0x60/0x60 [ 0.569322] [] ret_from_fork+0x22/0x30 [ 0.569330] ---[ end trace bc0abba135d2cb5d ]--- So the behaviour is changed, no question about it. But I still believe that the patch itself fixes a valid bug (or shortcoming). atm I don't see how it can lock the kernel. The only thing the patch does is to schedule the output_poll_work w/o delay if we already have delayed_event to reduce the time to handle pending event(s) when the poll is enabled for the first time. >>From the log it is not clear for me why Xorg would lock up, but we have intel and nouveau drivers probed, it might be because we have two GPUs? > > BR, > Jani. > > > [1] https://bugs.freedesktop.org/show_bug.cgi?id=98690 > > > >> >> Thanks, Daniel >>> >>> Regards, >>> Peter >>> >>> drivers/gpu/drm/drm_probe_helper.c | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c >>> index a0df377d7d1c..f6b64d7d3528 100644 >>> --- a/drivers/gpu/drm/drm_probe_helper.c >>> +++ b/drivers/gpu/drm/drm_probe_helper.c >>> @@ -129,6 +129,7 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev) >>> { >>> bool poll = false; >>> struct drm_connector *connector; >>> + unsigned long delay = DRM_OUTPUT_POLL_PERIOD; >>> >>> WARN_ON(!mutex_is_locked(&dev->mode_config.mutex)); >>> >>> @@ -141,8 +142,13 @@ void drm_kms_helper_poll_enable_locked(struct drm_device *dev) >>> poll = true; >>> } >>> >>> + if (dev->mode_config.delayed_event) { >>> + poll = true; >>> + delay = 0; >>> + } >>> + >>> if (poll) >>> - schedule_delayed_work(&dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD); >>> + schedule_delayed_work(&dev->mode_config.output_poll_work, delay); >>> } >>> EXPORT_SYMBOL(drm_kms_helper_poll_enable_locked); >>> >>> -- >>> 2.9.3 >>> > -- Péter