From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96A56C433F5 for ; Fri, 24 Sep 2021 22:50:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 654C060F41 for ; Fri, 24 Sep 2021 22:50:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343936AbhIXWwE (ORCPT ); Fri, 24 Sep 2021 18:52:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232123AbhIXWwD (ORCPT ); Fri, 24 Sep 2021 18:52:03 -0400 Received: from mail-oi1-x22c.google.com (mail-oi1-x22c.google.com [IPv6:2607:f8b0:4864:20::22c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C21FCC061571 for ; Fri, 24 Sep 2021 15:50:29 -0700 (PDT) Received: by mail-oi1-x22c.google.com with SMTP id x124so16535751oix.9 for ; Fri, 24 Sep 2021 15:50:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fFDW7lncbxlfdS6O3az8pUVpsIvToFPs2S/93aYbsYE=; b=DCHhPp3O1tvVWh5nQ86u4yOWXvOhsgEuz0xHF/9ViKXs553UsDP90Ve/x006NTFn1S HD4C3h9oDySg896+tpvl3c3i9tnCoG3h5N/PMUJK7HpwroAMguPmrA9Y2+tYq3qJgip9 thb4YclQrm1Cdq+NVM8Zr2lX+oCfYKj9W7Srs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fFDW7lncbxlfdS6O3az8pUVpsIvToFPs2S/93aYbsYE=; b=gzCHQt6GL2AJFRa0ZW4CKzRSgUNDm5gxdaURUQMYRFRF1RnQF/k/VCy2pyefOVejoi 6matbnaaDUIn7Kozn+GrcBdOifz0gZmbK45yHNOBGFTW4kCm5dFY7n2mWsApcjmIfm/e 5WMmthnQhTaXElJCh5rjAllo+gMkptTDcqBkJiLA+vSrXsPeIF1cMt1x34pZ5kMYVsi4 Lb2C3zc2pfTC5mqj/gOHWXc4JX+2rlQVDB8hxtBrA0ZEMPWxXA+l/hoPTLLz13g4Xipt uiiqGqyXU51ZRZ5NH0bKvrcXPXIlhmOGs8pTHvgmLxtyM/5xieRyJcw7CYuNFv1XGlA6 umAg== X-Gm-Message-State: AOAM5317irPsNga90WcyUJOk9xbQ+bR8qPwIKY8nJsYzIHOmShnx6qUJ h3g96uNOdbfQqujktjNssm/ILhpCt4rt7jyZMMW+IQ== X-Google-Smtp-Source: ABdhPJwhNF96FPqLktf2AeTdHBfKB4aoz398F+kZSpY/G9t8bSrit9cTUBqsQk6btKhlTkjxJumwvEhMKpCgNF3WI5Y= X-Received: by 2002:aca:ebd5:: with SMTP id j204mr3429389oih.14.1632523829045; Fri, 24 Sep 2021 15:50:29 -0700 (PDT) MIME-Version: 1.0 References: <20210920171042.oq3ndp3ox4xv5odh@gilmour> <20210922095725.dk4vk42zb3kh7y6s@gilmour> <20210924133022.waqgtr5xjjxigong@gilmour> In-Reply-To: <20210924133022.waqgtr5xjjxigong@gilmour> From: Daniel Vetter Date: Sat, 25 Sep 2021 00:50:17 +0200 Message-ID: Subject: Re: Regression with mainline kernel on rpi4 To: Maxime Ripard Cc: Linus Torvalds , Sudip Mukherjee , Emma Anholt , David Airlie , Philipp Zabel , dri-devel , linux-kernel Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Sep 24, 2021 at 3:30 PM Maxime Ripard wrote: > > On Wed, Sep 22, 2021 at 01:25:21PM -0700, Linus Torvalds wrote: > > On Wed, Sep 22, 2021 at 1:19 PM Sudip Mukherjee > > wrote: > > > > > > I added some debugs to print the addresses, and I am getting: > > > [ 38.813809] sudip crtc 0000000000000000 > > > > > > This is from struct drm_crtc *crtc = connector->state->crtc; > > > > Yeah, that was my personal suspicion, because while the line number > > implied "crtc->state" being NULL, the drm data structure documentation > > and other drivers both imply that "crtc" was the more likely one. > > > > I suspect a simple > > > > if (!crtc) > > return; > > > > in vc4_hdmi_set_n_cts() is at least part of the fix for this all, but > > I didn't check if there is possibly something else that needs to be > > done too. > > Thanks for the decode_stacktrace.sh and the follow-up > > Yeah, it looks like we have several things wrong here: > > * we only check that connector->state is set, and not > connector->state->crtc indeed. > > * We also check only in startup(), so at open() and not later on when > the sound streaming actually start. This has been there for a while, > so I guess it's never really been causing a practical issue before. You also have no locking, plus looking at ->state objects outside of atomic commit machinery makes no sense because you're not actually in sync with the hw state. Relevant bits need to be copied over at commit time, protected by some spinlock (and that spinlock also needs to be held over whatever other stuff you're setting to make sure we don't get a funny out-of-sync state anywhere). Liberally sprinkling a few NULL checks here doesn't fix much at all, it only papers over design bugs in the code. -Daniel > I'm still not entirely sure how we can end up in that situation though. > The only case I could think of is that: > > * The firmware enables the HDMI controller, then boots Linux > > * The driver starts, registers its audio card. connector->state is > NULL then, and if the HDMI monitor is actually an HDMI monitor (vs a > DVI monitor), the VC4_HDMI_RAM_PACKET_ENABLE bit that we test in > startup will be set. > > * The driver will create the connector->state (through a call to > drm_mode_config_reset in vc4_kms_load), connector->state isn't NULL > anymore, VC4_HDMI_RAM_PACKET_ENABLE is still set. > > * The driver then disables the HDMI controller (in > vc4_crtc_disable_at_boot) but never clears the > VC4_HDMI_RAM_PACKET_ENABLE bit. > > * Pulseaudio opens the audio device, startup succeeds because both > conditions we test succeed. > > * However, since we either never enabled the HDMI connector (or if it > was disabled at some point), connector->state->crtc is NULL and we > get our NULL pointer dereference. > > The Ubuntu configuration has the framebuffer emulation and the > framebuffer console enabled, so it's likely to be enabled and > something (X.org?) comes along and disables the connector right when > pulseaudio calls prepare(). > > Maxime -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB7A6C433EF for ; Fri, 24 Sep 2021 22:50:33 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 314B061107 for ; Fri, 24 Sep 2021 22:50:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 314B061107 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EABBD6EE91; Fri, 24 Sep 2021 22:50:31 +0000 (UTC) Received: from mail-oi1-x22a.google.com (mail-oi1-x22a.google.com [IPv6:2607:f8b0:4864:20::22a]) by gabe.freedesktop.org (Postfix) with ESMTPS id DFE776EE91 for ; Fri, 24 Sep 2021 22:50:29 +0000 (UTC) Received: by mail-oi1-x22a.google.com with SMTP id s24so13980285oij.8 for ; Fri, 24 Sep 2021 15:50:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fFDW7lncbxlfdS6O3az8pUVpsIvToFPs2S/93aYbsYE=; b=DCHhPp3O1tvVWh5nQ86u4yOWXvOhsgEuz0xHF/9ViKXs553UsDP90Ve/x006NTFn1S HD4C3h9oDySg896+tpvl3c3i9tnCoG3h5N/PMUJK7HpwroAMguPmrA9Y2+tYq3qJgip9 thb4YclQrm1Cdq+NVM8Zr2lX+oCfYKj9W7Srs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fFDW7lncbxlfdS6O3az8pUVpsIvToFPs2S/93aYbsYE=; b=St+HFF7a99fpOCYkhbTQo4CB0ztWputita2isgKgvKrlAxcmiRLc7s25HcFIJ1KLwJ WBlosFwlQZYanEzN7h3W5mJZSJANpB1rw+dNgwxu9fIFDlT7Fj0hRA5FBaF0QPGTVBPA 7h9CF3qicQjvgHm6mfisqpd0Rx3UL3cRaYOQZZc+jrmQoVuUc6EFFgVm+5VZGL6elJdi H4TmMN9D9/iUsgcO8/LAgrFjkus+8ZKrANZsdSueHo6YbrehrNAmkTT28ivAGoSKkBR5 OyHsoGpaHt96+4VdklJGayebrKlYut25XVfEs4CmSPh45A5l9/DO6mJiPdZbdx1Esc69 0NMw== X-Gm-Message-State: AOAM531YtZkA3zG6i4/99DBrDMdJQigKiqsC+Y+EoAUOzARS2XYAyyFt R8ThGers/SGFHZgbeC5+LQmhJLdpGLGCJKR7Wya0pA== X-Google-Smtp-Source: ABdhPJwhNF96FPqLktf2AeTdHBfKB4aoz398F+kZSpY/G9t8bSrit9cTUBqsQk6btKhlTkjxJumwvEhMKpCgNF3WI5Y= X-Received: by 2002:aca:ebd5:: with SMTP id j204mr3429389oih.14.1632523829045; Fri, 24 Sep 2021 15:50:29 -0700 (PDT) MIME-Version: 1.0 References: <20210920171042.oq3ndp3ox4xv5odh@gilmour> <20210922095725.dk4vk42zb3kh7y6s@gilmour> <20210924133022.waqgtr5xjjxigong@gilmour> In-Reply-To: <20210924133022.waqgtr5xjjxigong@gilmour> From: Daniel Vetter Date: Sat, 25 Sep 2021 00:50:17 +0200 Message-ID: Subject: Re: Regression with mainline kernel on rpi4 To: Maxime Ripard Cc: Linus Torvalds , Sudip Mukherjee , Emma Anholt , David Airlie , Philipp Zabel , dri-devel , linux-kernel Content-Type: text/plain; charset="UTF-8" X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Fri, Sep 24, 2021 at 3:30 PM Maxime Ripard wrote: > > On Wed, Sep 22, 2021 at 01:25:21PM -0700, Linus Torvalds wrote: > > On Wed, Sep 22, 2021 at 1:19 PM Sudip Mukherjee > > wrote: > > > > > > I added some debugs to print the addresses, and I am getting: > > > [ 38.813809] sudip crtc 0000000000000000 > > > > > > This is from struct drm_crtc *crtc = connector->state->crtc; > > > > Yeah, that was my personal suspicion, because while the line number > > implied "crtc->state" being NULL, the drm data structure documentation > > and other drivers both imply that "crtc" was the more likely one. > > > > I suspect a simple > > > > if (!crtc) > > return; > > > > in vc4_hdmi_set_n_cts() is at least part of the fix for this all, but > > I didn't check if there is possibly something else that needs to be > > done too. > > Thanks for the decode_stacktrace.sh and the follow-up > > Yeah, it looks like we have several things wrong here: > > * we only check that connector->state is set, and not > connector->state->crtc indeed. > > * We also check only in startup(), so at open() and not later on when > the sound streaming actually start. This has been there for a while, > so I guess it's never really been causing a practical issue before. You also have no locking, plus looking at ->state objects outside of atomic commit machinery makes no sense because you're not actually in sync with the hw state. Relevant bits need to be copied over at commit time, protected by some spinlock (and that spinlock also needs to be held over whatever other stuff you're setting to make sure we don't get a funny out-of-sync state anywhere). Liberally sprinkling a few NULL checks here doesn't fix much at all, it only papers over design bugs in the code. -Daniel > I'm still not entirely sure how we can end up in that situation though. > The only case I could think of is that: > > * The firmware enables the HDMI controller, then boots Linux > > * The driver starts, registers its audio card. connector->state is > NULL then, and if the HDMI monitor is actually an HDMI monitor (vs a > DVI monitor), the VC4_HDMI_RAM_PACKET_ENABLE bit that we test in > startup will be set. > > * The driver will create the connector->state (through a call to > drm_mode_config_reset in vc4_kms_load), connector->state isn't NULL > anymore, VC4_HDMI_RAM_PACKET_ENABLE is still set. > > * The driver then disables the HDMI controller (in > vc4_crtc_disable_at_boot) but never clears the > VC4_HDMI_RAM_PACKET_ENABLE bit. > > * Pulseaudio opens the audio device, startup succeeds because both > conditions we test succeed. > > * However, since we either never enabled the HDMI connector (or if it > was disabled at some point), connector->state->crtc is NULL and we > get our NULL pointer dereference. > > The Ubuntu configuration has the framebuffer emulation and the > framebuffer console enabled, so it's likely to be enabled and > something (X.org?) comes along and disables the connector right when > pulseaudio calls prepare(). > > Maxime -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch