From: "Marcin Ślusarz" <marcin.slusarz@gmail.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>,
	"moderated list:SOUND - SOC LAYER / DYNAMIC AUDIO POWER
	MANAGEM..."  <alsa-devel@alsa-project.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Vinod Koul <vkoul@kernel.org>,
	Bard Liao <yung-chuan.liao@linux.intel.com>,
	Len Brown <lenb@kernel.org>, Erik Kaneda <erik.kaneda@intel.com>
Subject: Re: Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10
Date: Thu, 28 Jan 2021 14:45:09 +0100
Message-ID: <CA+GA0_u56Rf1ETi0q9-AgHH0taszhcY4xUcEarvxi_fFu6DqCw@mail.gmail.com>
In-Reply-To: <CAJZ5v0h8abkdrdN97RHouzxynPBFXBoAuMSb7Zy56+-sTXkPKQ@mail.gmail.com>

On Thu, 28 Jan 2021 at 13:39, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jan 28, 2021 at 1:13 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Jan 27, 2021 at 8:19 PM Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
> > >
> > > > On Wed, 27 Jan 2021 at 18:28, Pierre-Louis Bossart
> > > > <pierre-louis.bossart@linux.intel.com> wrote:
> > > > > Weird, I can't reproduce this problem with my self-compiled kernel :/
> > > > > I don't even see soundwire modules loaded in. Manually loading them of course
> > > > > doesn't do much.
> > > > >
> > > > > Previously I could boot into the "faulty" kernel by using "recovery mode", but
> > > > > I can't do that anymore - it crashes too.
> > > > >
> > > > > Maybe there's some kind of race and this bug depends on some specific
> > > > > ordering of events?
> > > >
> > > > missing Kconfig?
> > > > You need CONFIG_SOUNDWIRE and CONFIG_SND_SOC_SOF_INTEL_SOUNDWIRE
> > > > selected to enter this sdw_intel_acpi_scan() routine.
> > >
> > > It was a PEBKAC, but a slightly different one. I won't bore you with
> > > (embarrassing) details ;).
> > >
> > > I reproduced the problem, tested both your and Rafael's patches
> > > and the kernel still crashes, with the same stack trace.
> > > (Yes, I'm sure I booted the right kernel :)
> > >
> > > Why "recovery mode" stopped working (or worked previously) is still a mystery.
> >
> > So for clarity, you've tried this:
> >
> > static int snd_intel_dsp_check_soundwire(struct pci_dev *pci)
> > {
> >     struct sdw_intel_acpi_info info;
> >     acpi_handle handle;
> >     int ret;
> >
> >     handle = ACPI_HANDLE(&pci->dev);
> >     if (!handle)
> >         return -ENODEV;
> >
> > and it has not made a difference?
> >
> > And the relevant part of the trace is:
> >
> > RIP: 0010:acpi_ns_validate_handle+0x1a/0x23
> > Code: 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 44 00 00
> > 48 8d 57 ff 48 89 f8 48 83 fa fd 76 08 48 8b 05 0c b8 67 01 c3 <80> 7f
> > 08 0f 74 02 31 c0 c3 0f 1f 44 00 00 48 8b 3d f6 b7 67 01 e8
> > RSP: 0000:ffffc388807c7b20 EFLAGS: 00010213
> > RAX: 0000000000000048 RBX: ffffc388807c7b70 RCX: 0000000000000000
> > RDX: 0000000000000047 RSI: 0000000000000246 RDI: 0000000000000048
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: ffffffffc0f5f4d1 R11: ffffffff8f0cb268 R12: 0000000000001001
> > R13: ffffffff8e33b160 R14: 0000000000000048 R15: 0000000000000000
> > FS:  00007f24548288c0(0000) GS:ffff9f781fb80000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000050 CR3: 0000000106158004 CR4: 0000000000770ee0
> > PKRU: 55555554
> > Call Trace:
> >  acpi_get_data_full+0x4d/0x92
> >  acpi_bus_get_device+0x1f/0x40
> >  sdw_intel_acpi_scan+0x59/0x230 [soundwire_intel]
> >  ? strstr+0x22/0x60
> >  ? dmi_matches+0x76/0xe0
> >  snd_intel_dsp_driver_probe.cold+0xaf/0x163 [snd_intel_dspcfg]
> >  azx_probe+0x7a/0x970 [snd_hda_intel]
> >  local_pci_probe+0x42/0x80
> >  ? _cond_resched+0x16/0x40
> >  pci_device_probe+0xfd/0x1b0
> >
> > so it looks like we got to sdw_intel_acpi_scan() with a non-NULL, but
> > otherwise invalid parent_handle which then was passed to
> > acpi_bus_get_device().  Subsequently it got to acpi_get_data_full()
> > and acpi_ns_validate_handle() that crashed, because it tried to
> > dereference it via ACPI_GET_DESCRIPTOR_TYPE().
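
The register dump is consistent with that reading. The faulting bytes
`<80> 7f 08 0f` decode to `cmpb $0xf,0x8(%rdi)`, a one-byte read at
handle + 8, and with RDI = 0x48 that address is 0x50, which matches CR2.
Below is a minimal user-space sketch of the check, not the ACPICA source.
The names classify_handle, DESCRIPTOR_TYPE_OFFSET and ROOT_OBJECT are made
up for illustration, and the offset of 8 and the all-ones root handle are
assumptions read off the disassembly above. Nothing here touches kernel
memory; it only computes which address would be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the descriptor-type byte sits 8 bytes into the object,
 * as suggested by the 0x8(%rdi) operand in the Code: line. */
#define DESCRIPTOR_TYPE_OFFSET 8u

/* Assumption: the special "root" handle is the all-ones pointer value. */
#define ROOT_OBJECT UINT64_MAX

enum handle_verdict {
    HANDLE_IS_ROOT,         /* NULL or root: no memory read is needed */
    HANDLE_IS_DEREFERENCED, /* any other value: the type byte is read */
};

/*
 * Model the decision made for a handle value. For a handle that would be
 * dereferenced, *read_addr receives the address of the one-byte read.
 */
enum handle_verdict classify_handle(uint64_t handle, uint64_t *read_addr)
{
    assert(read_addr != NULL);

    if (handle == 0 || handle == ROOT_OBJECT)
        return HANDLE_IS_ROOT;

    *read_addr = handle + DESCRIPTOR_TYPE_OFFSET;
    return HANDLE_IS_DEREFERENCED;
}
```

Calling classify_handle(0x48, &addr) takes the dereference path and sets
addr to 0x50. A small non-NULL value like 0x48 therefore passes the
"if (!handle)" style of check and still faults.
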
>
> But interestingly enough, sdw_intel_acpi_cb() calls
> acpi_evaluate_integer() on the same handle that is passed to
> acpi_bus_get_device() later and it also calls
> acpi_ns_validate_handle() on that handle which doesn't crash.
>
> Moreover, it asks _ADR to be evaluated with respect to that handle and
> because it gets to the acpi_bus_get_device() call at all, this appears
> to have been successful.
>
> The only explanation for that I can think of (and which does not
> involve supernatural intervention so to speak) is a stack corruption
> occurring between these two calls in sdw_intel_acpi_cb().  IOW,
> something scribbles on the handle in the meantime, but ATM I have no
> idea what that can be.
>
> Marcin, please boot with ACPICA debug (level = ACPI_LV_INFO and
> component = ACPI_NAMESPACE | ACPI_BUS_COMPONENT) enabled (see
> Documentation/firmware-guide/acpi/debug.rst for instructions) and
> collect the log.
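
For reference, a small sketch of how the requested level and component
masks might be combined into boot parameters. The numeric values are taken
to be the ones listed in debug.rst for this kernel (ACPI_NAMESPACE 0x10,
ACPI_BUS_COMPONENT 0x10000, ACPI_LV_INFO 0x4) and should be verified
against the tree being built. The helper names are hypothetical, and the
acpi.debug_layer / acpi.debug_level parameters are assumed to take effect
only on a kernel built with CONFIG_ACPI_DEBUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values; check them against debug.rst and the ACPI headers. */
#define ACPI_NAMESPACE     0x00000010u
#define ACPI_BUS_COMPONENT 0x00010000u
#define ACPI_LV_INFO       0x00000004u

/* Component mask: namespace code plus the ACPI bus driver component. */
unsigned int acpi_debug_layer_mask(void)
{
    return ACPI_NAMESPACE | ACPI_BUS_COMPONENT;
}

/* Level mask: informational messages only. */
unsigned int acpi_debug_level_mask(void)
{
    return ACPI_LV_INFO;
}

/*
 * Write the two kernel command-line parameters into buf.
 * Returns the snprintf() result (characters needed, excluding the NUL).
 */
int format_acpi_debug_cmdline(char *buf, size_t len)
{
    assert(buf != NULL);

    return snprintf(buf, len,
                    "acpi.debug_layer=0x%08x acpi.debug_level=0x%08x",
                    acpi_debug_layer_mask(), acpi_debug_level_mask());
}
```

With the values above the formatted string is
"acpi.debug_layer=0x00010010 acpi.debug_level=0x00000004".
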

https://people.freedesktop.org/~mslusarz/tmp/acpi_debug.txt
