From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1762452AbdAIQ7U (ORCPT ); Mon, 9 Jan 2017 11:59:20 -0500
Received: from mail-it0-f67.google.com ([209.85.214.67]:34223 "EHLO
        mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751317AbdAIQ7Q (ORCPT );
        Mon, 9 Jan 2017 11:59:16 -0500
MIME-Version: 1.0
X-Originating-IP: [212.51.149.109]
In-Reply-To:
References: <7fd16549-1349-a9e5-ceff-9aa6f748caae@intel.com>
        <20170109101516.y3acaev5ujbjugwl@phenom.ffwll.local>
        <16a1e734-667c-5d9a-c418-555b1f13e446@intel.com>
From: Daniel Vetter
Date: Mon, 9 Jan 2017 17:59:15 +0100
X-Google-Sender-Auth: AqSz5utI_AGuYx2HU13wz8ROiTs
Message-ID:
Subject: Re: [Intel-gfx] 4.10-rc2 oops in DRM connector code
To: Dave Hansen
Cc: Daniel Vetter, Jani Nikula, David Airlie, intel-gfx, dri-devel,
        Linux Kernel Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 9, 2017 at 5:50 PM, Dave Hansen wrote:
> On 01/09/2017 08:41 AM, Daniel Vetter wrote:
>> On Mon, Jan 9, 2017 at 2:40 PM, Dave Hansen wrote:
>>> Well, now I found where the -2 comes from.
>>> intel_dp_register_mst_connector() calls drm_connector_register(), which
>>> fails to add the kobject (warning below). But, it does zero error
>>> checking on the drm_connector_register() call and leaves the
>>> partially-constructed connector in place.
>>>
>>> The next time some poor, hapless code goes and tries to do anything with
>>> that kdev, they oops. I'm perplexed by this, though. The
>>> drm_dp_mst_topology_cbs->register_connector just returns void. It seems
>>> a bit goofy that it can't even _return_ failure.
>>>
>>> Is there some stable code to go back to here? Or, is there something
>>> about my configuration that's unique? I really wonder why nobody else
>>> is running into this.
>>>
>>> There's probably some other race going on here. This warning doesn't
>>> happen on every boot.
>> This smells more like the root-cause: Something goes wrong on boot
>> that prevents connectors from properly registering, then we fall over
>> later on. And the register callback is intentionally void, assuming
>> that any prep work has been done earlier and that therefore the
>> register step can't fail. Can you pls check whether the oops later on
>> only happens together with this warning at boot, or whether they're
>> not correlated?
>
> Looking through my logs, I can't find any instance of the oops without
> the warning at boot. So I do think the later oops is entirely caused by
> the issue warned about in early boot.

Hm, I guess then we'd need to fix that boot-up warning. Can you try to
figure out why it's unhappy? On a hunch it could be that we call
drm_connector_register from the mst probe worker before the main
driver load thread has reached the drm_dev_register call. A few printk
to decide whether that's the case (plus a few boot-up tests to gather
the statistics, sorry about that) would be real great. If that's
inconclusive I'm again a bit low on ideas ...

> My distro kernel (4.4.0-57-generic) is also unstable, but I haven't
> managed to capture a good oops there. It's hitting this, which I assume
> is unrelated:
>
> WARNING: CPU: 0 PID: 41 at /build/linux-lts-xenial-FdAdUy/linux-lts-xenial-4.4.0/ubuntu/i915/intel_pm.c:3675
> skl_update_other_pipe_wm+0x191/0x1a0 [i915_bpo]()

wm programming issues, which will kill your box. Needs a newer kernel
to fix (both the wm programming issues, and that wm programming issues
lead to system death).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch