linux-kernel.vger.kernel.org archive mirror
* Hang while booting 4.15.7
From: Brian Rak @ 2018-03-08 16:21 UTC
  To: linux-kernel

We have some Dell servers running Intel Xeon Gold 6126 processors. Some of 
them hang on boot under 4.15.7, but work fine on 4.14.14. When they 
hang, we see the following on the console:

Error parsing PCC subspaces from PCCT
watchdog: BUG: soft lockup - CPU #16 stuck for 23s! [swapper/0:1]

We see that PCC subspaces error under 4.14 as well, but it doesn't cause 
the machine to hang.

So far we haven't been able to correlate these hangs with anything in 
particular.  Some machines will hang, some machines will boot.  They're 
otherwise identical as far as hardware and firmware goes.

I've tried pcie_aspm=off, since that seems to be the next bit of code 
that's being executed.  This resulted in the machine booting a little 
further, but then oopsing somewhere in acpi_os_purge_cache. I'm not able 
to get a full trace there, as I don't have serial access easily available.

Any suggestions?


* Re: Hang while booting 4.15.7
From: Randy Dunlap @ 2018-03-08 17:49 UTC
  To: Brian Rak, linux-kernel

Hi,

The first thing that I would do is boot with:
  ignore_loglevel initcall_debug
on the kernel boot command line.

That will add lots of messages and maybe give us a stronger hint about where
the hang is actually happening.
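
If the boot log can be captured as text (for example over a serial console),
a small script can point at the initcall that was still running when the
machine hung. The following is a minimal Python sketch, not a definitive
tool. It assumes the usual initcall_debug line format ("calling  <fn> @ <pid>"
and "initcall <fn> returned <rc> after <n> usecs"); the function name
unfinished_initcalls and the sample log are hypothetical and only for
illustration.

```python
import re

# Assumed initcall_debug line formats (illustrative, check against your own log):
#   calling  foo_init+0x0/0x10 @ 1
#   initcall foo_init+0x0/0x10 returned 0 after 97 usecs
CALLING = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RETURNED = re.compile(r"initcall\s+(\S+)\s+returned\s+(-?\d+)\s+after\s+(\d+)\s+usecs")


def unfinished_initcalls(log_text):
    """Return initcalls that started but never logged a return, in start order."""
    pending = []
    for line in log_text.splitlines():
        done = RETURNED.search(line)
        if done:
            # A matching "returned" line closes the earlier "calling" entry.
            if done.group(1) in pending:
                pending.remove(done.group(1))
            continue
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
    return pending


# Hypothetical log excerpt, not taken from the affected machines.
SAMPLE_LOG = """\
[    1.000000] calling  foo_init+0x0/0x10 @ 1
[    1.000100] initcall foo_init+0x0/0x10 returned 0 after 97 usecs
[    1.000200] calling  bar_init+0x0/0x20 @ 1
"""

if __name__ == "__main__":
    for name in unfinished_initcalls(SAMPLE_LOG):
        print("started but never returned:", name)
```

The last entry in the returned list is the most likely place to start
looking, since it is the initcall that was entered most recently before the
log stopped.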

And then worst case (without a boot log via serial console or netconsole) is
to take a photo of the screen with the oops messages.

And if you are fairly certain that it's an ACPI issue, also write to the
linux-acpi@vger.kernel.org mailing list.

-- 
~Randy


* Re: Hang while booting 4.15.7
From: Brian Rak @ 2018-03-08 18:02 UTC
  To: Randy Dunlap, linux-kernel, linux-acpi



Thanks!

I booted with those parameters, and this certainly seems like an ACPI 
issue.  During bootup, the machine paused here for about 20s:
https://www.dropbox.com/s/39us0tlhbzuay7t/2018-03-08%2012_52_35.png?dl=0

then it started printing this trace:
https://www.dropbox.com/s/nxdhm19wcitrgm0/2018-03-08%2012_53_04.png?dl=0

(I can't seem to figure out a decent way to capture the beginning of the 
trace here, so I'll have to see if I can get the serial console working.)


* Re: Hang while booting 4.15.7
From: Brian Rak @ 2018-03-08 18:11 UTC
  To: Randy Dunlap, linux-kernel, linux-acpi



I got the serial console working. The trace in my earlier screenshot is 
just the end of the CPU stuck message:

https://gist.githubusercontent.com/devicenull/abe9022877d0a7354fa2ffc8b8a8f042/raw/e497624f90037eb272760f7c5c3d2a0f21e5ea83/gistfile1.txt
