All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Robert Święcki" <robert@swiecki.net>,
	"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: linux-i2c@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Crashes in 5.15-git in i2c code
Date: Mon, 8 Nov 2021 12:58:23 -0600	[thread overview]
Message-ID: <20211108185823.GA1101310@bhelgaas> (raw)
In-Reply-To: <CAP145phj7jEy6tkdFMdW-rzPprMTUckaaSrtrVysE-u+S+=Lcg@mail.gmail.com>

[+cc Uwe, Rafael, linux-pm, linux-pci, linux-kernel, beginning of
thread: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/T/#t]

On Mon, Nov 08, 2021 at 05:34:14PM +0100, Robert Święcki wrote:
> > I'm daily-driving the linux from Linus' git (recompiling every day or
> > two), and yesterday it stopped booting. Below is the dmesg from
> > pstore.
> > ...
> 
> This introduced the bug: 0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> https://github.com/torvalds/linux/commit/0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4

Thank you very much for the debugging and this report!  This report is
for i2c, but the problem will affect many drivers.

> > <1>[    1.431369][  T447] BUG: kernel NULL pointer dereference,
> > address: 0000000000000540
> > <1>[    1.431371][  T447] #PF: supervisor read access in kernel mode
> > <1>[    1.431375][  T447] #PF: error_code(0x0000) - not-present page
> > <6>[    1.431378][  T447] PGD 0 P4D 0
> > <4>[    1.431384][  T447] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > <4>[    1.431388][  T447] CPU: 12 PID: 447 Comm: systemd-udevd
> > Tainted: G            E     5.15.0+ #91
> > <4>[    1.431391][  T447] Hardware name: ASUS System Product Name/ROG
> > CROSSHAIR VIII FORMULA, BIOS 3801 07/30/2021
> > <4>[    1.431392][  T447] RIP: 0010:i2c_dw_pci_resume+0x8/0x40
> > [i2c_designware_pci]
> > <4>[    1.431399][  T447] Code: 00 00 00 00 66 66 2e 0f 1f 84 00 00 00
> > 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 84 00 00 00 00 00 53 48
> > 8b 5f 78 48 89 df <ff> 93 40 05 00 00 c6 83 c0 05 00 00 00 5b c3 66 66
> > 2e 0f 1f 84 00
> > <4>[    1.431401][  T447] RSP: 0018:ffffb3e740a13ba8 EFLAGS: 00010286
> > <4>[    1.431403][  T447] RAX: 0000000000000000 RBX: 0000000000000000
> > RCX: 0000000000000000

  $ ./scripts/decodecode < oops
    22:       53                      push   %rbx
    23:       48 8b 5f 78             mov    0x78(%rdi),%rbx
    27:       48 89 df                mov    %rbx,%rdi
    2a:*      ff 93 40 05 00 00       callq  *0x540(%rbx)             <-- trapping instruction
    30:       c6 83 c0 05 00 00 00    movb   $0x0,0x5c0(%rbx)
    37:       5b                      pop    %rbx
    38:       c3                      retq

  static int i2c_dw_pci_resume(struct device *dev)
  {
    struct dw_i2c_dev *i_dev = dev_get_drvdata(dev);
    int ret;

    ret = i_dev->init(i_dev);
    i_dev->suspended = false;

    return ret;

So I think we're trying to call i_dev->init(), which is a NULL
pointer.

> > <4>[    1.431422][  T447]  pci_pm_runtime_resume+0xaa/0x100
> > <4>[    1.431434][  T447]  __rpm_callback+0x3c/0x100
> > <4>[    1.431442][  T447]  rpm_callback+0x54/0x80
> > <4>[    1.431445][  T447]  rpm_resume+0x410/0x700
> > <4>[    1.431455][  T447]  __pm_runtime_resume+0x45/0x80
> > <4>[    1.431457][  T447]  pci_device_probe+0xa2/0x140
> > <4>[    1.431459][  T447]  really_probe+0x1e4/0x400
> > <4>[    1.431464][  T447]  __driver_probe_device+0xf9/0x180
> > <4>[    1.431466][  T447]  driver_probe_device+0x19/0xc0

I think the problem here is that:

  - really_probe() sets dev->driver

  - local_pci_probe() calls pm_runtime_get_sync(), which leads to:

  - pci_pm_runtime_resume(), which previously skipped the driver's
    .runtime_resume() method when "pci_dev->driver" as NULL

  - after 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of
    pci_dev->driver") [1], it checks "dev->driver" instead of
    "pci_dev->driver"

  - dev->driver is non-NULL (set by really_probe() above), but at this
    point pci_dev->driver used to be NULL because local_pci_probe()
    didn't set it until after after calling pm_runtime_get_sync() (see
    b5f9c644eb1b ("PCI: Remove struct pci_dev->driver") [2])

  - because dev->driver is non-NULL, we call i2c_dw_pci_resume()
    before i2c_dw_pci_probe(), so the driver init hasn't been done

Here's the call tree:

    really_probe
      dev->driver = drv;                       # <--
      call_driver_probe
        dev->bus->probe
          pci_device_probe
            __pci_device_probe
              pci_call_probe
                local_pci_probe
                  pm_runtime_get_sync
                    ...
                    pci_pm_runtime_resume
  -                   if (!pci_dev->driver)    # 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
  +                   if (!to_pci_driver(dev->driver))
                        return 0
                      pm->runtime_resume
                        i2c_dw_pci_resume
                          i_dev->init()        # <-- NULL ptr deref
  -                 pci_dev->driver = pci_drv  # b5f9c644eb1b ("PCI: Remove struct pci_dev->driver")
                  pci_drv->probe
                    i2c_dw_pci_probe

Note that we used to call pm_runtime_get_sync() (and the driver's
runtime_resume() method) in the window where dev->driver had been set
but pci_dev->driver had not.  Since we got rid of pci_dev->driver
altogether, that window no longer exists, so we call the driver's
runtime_resume() when it isn't prepared for it.

[1] https://git.kernel.org/linus/2a4d9408c9e8
[2] https://git.kernel.org/linus/b5f9c644eb1b

  reply	other threads:[~2021-11-08 18:58 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAP145pgwt7svtDwcD=AStKTt_GSN-ZqPL2u74Y63TAY5ghAagQ@mail.gmail.com>
     [not found] ` <CAP145pgrL-tOHrxsKwk_yzQihyk4TMFrgBb6zhNgC1i2wUTCeQ@mail.gmail.com>
2021-11-08 15:37   ` Fwd: Crashes in 5.15-git in i2c code Robert Święcki
2021-11-08 16:34     ` Robert Święcki
2021-11-08 18:58       ` Bjorn Helgaas [this message]
2021-11-08 19:09         ` Bjorn Helgaas
2021-11-08 21:22         ` [PATCH] pci: Don't call resume callback for nearly bound devices Uwe Kleine-König
2021-11-08 21:36           ` Robert Święcki
2021-11-09  0:00             ` Krzysztof Wilczyński
2021-11-09  2:56           ` Bjorn Helgaas
2021-11-09  6:42             ` Greg Kroah-Hartman
2021-11-09  6:59             ` Uwe Kleine-König
2021-11-09 12:42               ` Robert Święcki
2021-11-10 21:26                 ` Bjorn Helgaas
2021-11-10 22:01                   ` Robert Święcki
2021-11-09 17:18               ` Rafael J. Wysocki
2021-11-09 18:12                 ` Bjorn Helgaas
2021-11-09 18:52                   ` Rafael J. Wysocki
2021-11-09 18:58                     ` Rafael J. Wysocki
2021-11-09 20:05                       ` Bjorn Helgaas
2021-11-09 20:43                         ` Uwe Kleine-König
2021-11-10 14:14                         ` Bjorn Helgaas
2021-11-10 16:33                           ` Robert Święcki
2021-11-10 16:48                             ` Rafael J. Wysocki
2021-11-10 17:59                               ` Bjorn Helgaas
2021-11-10 21:19                             ` Bjorn Helgaas
2021-11-11 17:01                               ` Bjorn Helgaas
2021-11-11 17:32                                 ` Robert Święcki
2021-11-11 18:09                                   ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211108185823.GA1101310@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=linux-i2c@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=robert@swiecki.net \
    --cc=u.kleine-koenig@pengutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.