linux-pci.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Jesse Hathaway <jesse@mbuki-mvuki.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: Regression causes a hang on boot with a Comtrol PCI card
Date: Thu, 21 Mar 2019 18:23:10 -0500
Message-ID: <20190321232310.GL251185@google.com>
In-Reply-To: <CANSNSoXPFDu9RQAUA6dUCUrSCAj68q-Nj2W7ECz3fKpFtSNU+Q@mail.gmail.com>

On Thu, Mar 14, 2019 at 03:57:07PM -0500, Jesse Hathaway wrote:
> > > 1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just hot-added ones
> > > 1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id() checks early
> >
> > How did you narrow it down to *two* commits, and do you have to revert
> > both of them to avoid the hang?  Usually a bisection identifies a
> > single commit, and the two you mention aren't related.
> 
> Sorry, I should have described the bisection process in more detail. I
> found the problem while attempting to upgrade from Linux v3.16 to v4.9. When
> v4.9 hung, I tried the latest kernel, v5.0, which also hung. I began a git
> bisect, but found there was more than one bad commit. Here is my current
> understanding:
> 
> - [x] v3.18 vanilla, 1302fcf0d03e committed, hangs
> - [x] v3.18 with revert of 1302fcf0d03e, works
> .
> .
> .
> - [x] v4.12 vanilla, hangs
> - [x] v4.12 with revert of 1302fcf0d03e, works
> 
> - [x] v4.13 vanilla, 1c3c5eab1715 committed, hangs
> - [x] v4.13 with revert of 1302fcf0d03e, hangs
> - [x] v4.13 with revert of 1c3c5eab1715, hangs
> - [x] v4.13 with revert of 1302fcf0d03e & 1c3c5eab1715, works
> 
> - [x] v5.0 vanilla, hangs
> - [x] v5.0 with revert of 1302fcf0d03e & 1c3c5eab1715, works

Thanks!  I doubt either of those commits is the real problem, but
they're both related to system_state, so it's conceivable they're both
involved in exposing the problem.
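
The revert matrix quoted above can be tabulated to make the pattern explicit. The following is only a sketch that encodes the reported results. The helper name `minimal_working_reverts` is made up for illustration, and "minimal" means the smallest revert set that was actually tested and booted (for v5.0 only the double revert was tried).

```python
# Boot results as reported in the quoted message:
# (kernel version, set of reverted commits) -> did it boot?
A = "1302fcf0d03e"  # PCI: Configure *all* devices, not just hot-added ones
B = "1c3c5eab1715"  # sched/core: Enable might_sleep() and smp_processor_id() checks early

results = {
    ("v3.18", frozenset()): False,
    ("v3.18", frozenset({A})): True,
    ("v4.12", frozenset()): False,
    ("v4.12", frozenset({A})): True,
    ("v4.13", frozenset()): False,
    ("v4.13", frozenset({A})): False,
    ("v4.13", frozenset({B})): False,
    ("v4.13", frozenset({A, B})): True,
    ("v5.0", frozenset()): False,
    ("v5.0", frozenset({A, B})): True,
}


def minimal_working_reverts(results):
    """For each version, return the smallest tested revert set that booted."""
    best = {}
    for (version, reverts), boots in results.items():
        if boots and (version not in best or len(reverts) < len(best[version])):
            best[version] = reverts
    return best


def version_key(version):
    """Sort 'v4.13' after 'v4.12' by comparing the numeric parts."""
    return tuple(int(part) for part in version[1:].split("."))


summary = minimal_working_reverts(results)
for version in sorted(summary, key=version_key):
    print(version, sorted(summary[version]))
```

Up to v4.12 reverting 1302fcf0d03e alone is enough. From v4.13 on, the only tested configuration that boots has both commits reverted.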

> > Can you collect a complete dmesg log (with a working kernel) and
> > output of "sudo lspci -vvxxx"?  You can open a bug report at
> > https://bugzilla.kernel.org, attach the logs there, and respond here
> > with the URL.
> 
> Bug submitted along with the requested logs,
> https://bugzilla.kernel.org/show_bug.cgi?id=202927

Thanks for that.

> > Where does the hang happen?  Is it when we configure the Comtrol card?
> 
> The hang occurs after PCI is initialized. A snippet is below; the full
> output is in the bug report:
> 
> [   10.561971] pci 0000:81:00.0:   bridge window [mem 0xc8000000-0xc80fffff]
> [   10.569661] pci 0000:80:01.0: PCI bridge to [bus 81-82]
> [   10.575594] pci 0000:80:01.0:   bridge window [mem 0xc8000000-0xc80fffff]
> [   10.583278] pci 0000:80:03.0: PCI bridge to [bus 83]
> [   10.589008] NET: Registered protocol family 2
> [   10.594254] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes)
> [   10.603671] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
> [   10.612729] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> [   10.620446] TCP: Hash tables configured (established 524288 bind 65536)
> [   10.628124] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
> [   10.635541] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
> [   10.643669] NET: Registered protocol family 1

The successful boot continues on with this:

  [   10.675996] pci 0000:00:1a.0: quirk_usb_early_handoff+0x0/0x6a0 took 22519 usecs
  [   10.684519] pci 0000:03:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
  [   10.696404] pci 0000:03:00.0: quirk_blacklist_vpd+0x0/0x30 took 11605 usecs
  [   10.704515] pci 0000:0b:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]

So apparently the hang happens while we're running the "final" PCI
fixups.  This happens after all the rest of PCI is initialized.

Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
it down to a specific quirk.
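
Once a console log from such a boot is captured, the quirk that never returns can be picked out mechanically. A rough sketch follows. It assumes that with "initcall_debug" each PCI fixup logs a "calling  <fn> @ <pid>" line before it runs and a "<fn> took <n> usecs" line after it returns. The function name `unfinished_fixups` and the sample lines (including the quirk name `hypothetical_quirk`) are made up for illustration and are not taken from this system.

```python
import re

# Assumed line shapes (not copied from the bug report): a fixup announces
# itself with "calling  <fn> @ <pid>" and reports "<fn> took <n> usecs" when done.
CALLING = re.compile(r"pci (\S+): calling\s+(\S+) @ \d+")
TOOK = re.compile(r"pci (\S+): (\S+) took \d+ usecs")


def unfinished_fixups(log_lines):
    """Return (device, function) pairs that started but never reported completion."""
    pending = []
    for line in log_lines:
        match = CALLING.search(line)
        if match:
            pending.append((match.group(1), match.group(2)))
            continue
        match = TOOK.search(line)
        if match and (match.group(1), match.group(2)) in pending:
            pending.remove((match.group(1), match.group(2)))
    return pending


# Hypothetical console capture from a hung boot: the last fixup never returns.
sample = [
    "[   10.70] pci 0000:00:1a.0: calling  quirk_usb_early_handoff+0x0/0x6a0 @ 1",
    "[   10.72] pci 0000:00:1a.0: quirk_usb_early_handoff+0x0/0x6a0 took 22519 usecs",
    "[   10.73] pci 0000:00:1d.0: calling  hypothetical_quirk+0x0/0x40 @ 1",
]
print(unfinished_fixups(sample))
```

On a hung boot the last pair returned is the fixup to look at first. Since the machine never finishes booting, the log would have to come from a serial console or similar.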

Bjorn

Thread overview: 20+ messages
2019-03-13 16:50 Regression causes a hang on boot with a Comtrol PCI card Jesse Hathaway
2019-03-13 23:21 ` Bjorn Helgaas
2019-03-14 20:57   ` Jesse Hathaway
2019-03-21 20:36     ` Jesse Hathaway
2019-03-21 23:23     ` Bjorn Helgaas [this message]
2019-03-22 20:02       ` Jesse Hathaway
2019-04-01 19:43         ` Jesse Hathaway
2019-04-01 21:13         ` Bjorn Helgaas
2019-04-02 14:29           ` Alan Stern
2019-04-02 14:49             ` Mathias Nyman
2019-04-02 18:26               ` Alan Stern
2019-04-04 15:41             ` Jesse Hathaway
2019-04-04 17:16               ` Alan Stern
     [not found] <CANSNSoWL-2hP6j+nQwjr26vUmvyJ_Y1c9CrJ4bHnuqYCXhecdg@mail.gmail.com>
2019-04-04 19:14 ` Alan Stern
2019-04-05 21:27   ` Jesse Hathaway
2019-04-06 15:32     ` Alan Stern
2019-04-15 21:47       ` Jesse Hathaway
2019-04-16 15:00         ` Alan Stern
2019-04-23 20:18           ` Jesse Hathaway
2019-04-24 14:20             ` Alan Stern
