From: "Marcin Ślusarz" <marcin.slusarz@gmail.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "moderated list:SOUND - SOC LAYER / DYNAMIC AUDIO POWER
	MANAGEM..." <alsa-devel@alsa-project.org>,
	Erik Kaneda <erik.kaneda@intel.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Vinod Koul <vkoul@kernel.org>,
	Bard Liao <yung-chuan.liao@linux.intel.com>,
	Len Brown <lenb@kernel.org>
Subject: Re: Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10
Date: Thu, 4 Feb 2021 13:11:21 +0100
Message-ID: <CA+GA0_sZQXACjuzYYvrJq-vF-mmjaq82SJ=kifqo4Utv45s5Yg@mail.gmail.com>
In-Reply-To: <CA+GA0_s7atD4O_DP0NXwVUVvdia2NWwSEfW2Mcw-UoJ9effPvg@mail.gmail.com>

On Mon, Feb 1, 2021 at 1:16 PM Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 12:43 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
> > >
> > > On Fri, Jan 29, 2021 at 7:59 PM Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 3:32 PM Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jan 28, 2021 at 1:39 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > The only explanation for that I can think about (and which does not
> > > > > > involve supernatural intervention so to speak) is a stack corruption
> > > > > > occurring between these two calls in sdw_intel_acpi_cb().  IOW,
> > > > > > something scribbles on the handle in the meantime, but ATM I have no
> > > > > > idea what that can be.
> > > > >
> > > > > I tried KASAN but it didn't find anything and kernel actually booted
> > > > > successfully.
> > > >
> > > > I investigated this and it looks like a compiler bug (or something nastier),
> > > > but I can't find where exactly registers get corrupted because if I add printks
> > > > the corruption seems on the printk side, but if I don't add them it seems
> > > > the value gets corrupted earlier.
> > > (...)
> > > > I'm using gcc 10.2.1 from Debian testing.
> > >
> > > Someone on IRC, after hearing only that "gcc miscompiles the kernel",
> > > suggested disabling CONFIG_STACKPROTECTOR_STRONG.
> > > It helped indeed and it matches my observations, so it's quite likely it
> > > is the culprit.
> > >
> > > What do we do now?
> >
> > Figure out why the stack protection kicks in, I suppose.
> >
> > The target object is not on the stack, so if the pointer to it is
> > valid (we need to verify somehow that it is indeed), dereferencing it
> > shouldn't cause the stack protection to trigger.
>
> Well, the problem is not that stack protector finds something, but
> the feature itself corrupts some registers.

I retract this statement.

Originally I based it on this piece of code:
   0xffffffff815781f0 <+35>:    mov    %r12,%rdx
   0xffffffff815781f3 <+38>:    mov    $0xffffffff81eca4c0,%rsi
   0xffffffff815781fa <+45>:    mov    $0xffffffff82146d46,%rdi
   0xffffffff81578201 <+52>:    call   0xffffffff818909f1 <printk>
   0xffffffff81578206 <+57>:    cmpb   $0xf,0x8(%r12)
where the crash is on the last line and I supposedly could see the message
printed by printk with the correct value of %r12.
However, after attaching kgdb+kgdboe (it's so much pain...) to the kernel,
I discovered that something corrupts memory so badly that the format
string becomes "", which means that I don't actually see the output of printk.

So stack corruption from printk is rather unlikely and something else
must be going on.
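
For readers following the disassembly above: the `cmpb $0xf,0x8(%r12)` at +57
reads one byte at offset 8 from the handle held in %r12 and compares it with
0xf, which looks like a descriptor-type check on the handle. Below is a minimal
userspace sketch of that kind of check. The struct, the macro and the function
name are hypothetical stand-ins chosen for illustration, not the ACPICA
definitions, and the field layout is an assumption inferred only from the
offset and immediate in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the 0xf immediate from the cmpb in the disassembly. */
#define DESC_TYPE_NAMED 0x0F

/* Hypothetical stand-in for the header of a namespace node. */
struct fake_node {
	void *common_pointer;    /* offset 0 */
	uint8_t descriptor_type; /* first byte after the pointer */
};

/* On x86-64 this puts descriptor_type at offset 8, matching 0x8(%r12). */
static_assert(offsetof(struct fake_node, descriptor_type) == sizeof(void *),
	      "descriptor_type must directly follow the pointer");

/*
 * Returns the node if the handle looks like a named descriptor, else NULL.
 * Rejecting a NULL handle is a simplification made for this sketch only.
 */
static struct fake_node *validate_handle(void *handle)
{
	struct fake_node *node = handle;

	if (!node)
		return NULL;

	/* First access through the pointer: faults if handle is garbage. */
	if (node->descriptor_type != DESC_TYPE_NAMED)
		return NULL;

	return node;
}
```

In this sketch the byte comparison is the first access made through the
handle, so a handle pointing at unmapped or overwritten memory would fault
exactly there. That would be consistent with the crash landing on the cmpb
line, but it says nothing about where the bad value came from.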

Before I started messing with kgdb, I tried to bisect this issue - it pointed at
279c3393e2c113365c999f16cd096bcf3d34319e "mm: kmem: move
memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()",
which is odd, because it's totally unrelated and doesn't even trigger
recompilation of anything else. I can consistently reproduce the crash
on this commit and can't on the commit before it. Reverting it on 5.10.11 is
not possible, because it conflicts with changes that went in after it.

acpi_ns_validate_handle is called hundreds (if not thousands) of times
before it crashes, so I think it's unlikely that it is compiled incorrectly
(and I spent many hours reading the assembly, comparing to what
gcc 9 generates, diving into printk, etc).
Something before it must be corrupting memory.

Another thing that I noticed is that when I set breakpoints in kgdb
on two functions (do_init_module and local_pci_probe) and just hit
"continue", the kernel doesn't crash!

I discovered it because I wanted to trace sdw_intel_acpi_scan /
sdw_intel_acpi_cb to see where the memory is corrupted, but I can't
set breakpoints on code in modules with kgdb :(, so when I tried
to step into this code from module loading the crash disappeared.

The first code I could trace where I see memory corruption is
acpi_bus_get_device, which is called from sdw_intel_scan_controller.
I suspect that sdw_intel_acpi_scan is doing this (which means that
sdw_intel_acpi_cb -> acpi_evaluate_integer is likely to blame),
but I don't have proof.

This issue is driving me mad ;). Please help.

Marcin

Thread overview: 33+ messages
2021-01-20 19:56 Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10 Marcin Ślusarz
2021-01-20 20:34 ` Rafael J. Wysocki
2021-01-20 22:28   ` Pierre-Louis Bossart
2021-01-21 17:47     ` Marcin Ślusarz
2021-01-27 16:36       ` Marcin Ślusarz
2021-01-27 17:28         ` Pierre-Louis Bossart
2021-01-27 19:18           ` Marcin Ślusarz
2021-01-27 21:52             ` Pierre-Louis Bossart
2021-01-27 22:02             ` Pierre-Louis Bossart
2021-01-28 13:25               ` Marcin Ślusarz
2021-01-28 13:31                 ` Rafael J. Wysocki
2021-01-28 12:13             ` Rafael J. Wysocki
2021-01-28 12:39               ` Rafael J. Wysocki
2021-01-28 13:45                 ` Marcin Ślusarz
2021-01-28 14:32                 ` Marcin Ślusarz
2021-01-29 18:59                   ` Marcin Ślusarz
2021-01-29 20:03                     ` Marcin Ślusarz
2021-02-01 11:42                       ` Rafael J. Wysocki
2021-02-01 12:16                         ` Marcin Ślusarz
2021-02-04 12:11                           ` Marcin Ślusarz [this message]
2021-02-04 12:48                             ` Marcin Ślusarz
2021-02-05 15:40                               ` [PATCH] soundwire: intel: fix possible crash when no device is detected (was Re: Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10) Marcin Ślusarz
2021-02-05 16:16                                 ` Pierre-Louis Bossart
2021-02-08 12:01                                   ` [PATCH 1/2] soundwire: intel: fix possible crash when no device is detected Marcin Ślusarz
2021-02-08 12:01                                     ` [PATCH 2/2] ACPICA: update documentation of acpi_walk_namespace Marcin Ślusarz
2021-02-08 12:43                                       ` Rafael J. Wysocki
2021-02-08 12:37                                     ` [PATCH 1/2] soundwire: intel: fix possible crash when no device is detected Rafael J. Wysocki
2021-02-10 23:15                                       ` Pierre-Louis Bossart
2021-02-11  5:20                                         ` Vinod Koul
2021-01-28 13:29               ` Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10 Marcin Ślusarz
2021-01-28 13:36                 ` Rafael J. Wysocki
2021-01-25 18:38     ` Salvatore Bonaccorso
2021-01-25 19:26       ` Pierre-Louis Bossart
