linuxppc-dev.lists.ozlabs.org archive mirror
* [Bug 204789] Boot failure with more than 256G of memory on POWER/ppc64
From: bugzilla-daemon @ 2019-10-01  9:56 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael@ellerman.id.au
          Component|Other                       |PPC-64
           Assignee|akpm@linux-foundation.org   |platform_ppc-64@kernel-bugs.osdl.org
            Product|Memory Management           |Platform Specific/Hardware

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 204789] Boot failure with more than 256G of memory on POWER/ppc64
From: bugzilla-daemon @ 2019-10-01  9:57 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

* [Bug 204789] Boot failure with more than 256G of memory on Power9 with 4K pages & Hash MMU
From: bugzilla-daemon @ 2019-10-01 10:04 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

--- Comment #10 from Michael Ellerman (michael@ellerman.id.au) ---
Can you boot a good kernel and do:

$ sudo grep RAM /proc/iomem

And paste the output. Just to confirm what your memory layout is.

What arrangement of DIMMs do you have? It's possible you could work around the
bug by changing that, depending on how many DIMMs and slots you have.
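To confirm a layout from that output, here is a minimal sketch (not from the thread) that parses the `start-end : System RAM` lines printed by `grep RAM /proc/iomem` and reports both the total RAM and the top of physical memory. The helper names `parse_system_ram` and `summarize` are hypothetical, and the sketch assumes the end address on each line is inclusive, as /proc/iomem prints it.

```python
import re
from typing import List, Tuple

GIB = 1 << 30

# Hypothetical pattern for lines such as "00000000-3fffffffff : System RAM".
# Both addresses are hex, and the end address is inclusive.
_RAM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*System RAM\s*$")


def parse_system_ram(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for every "System RAM" line; other lines are skipped."""
    ranges = []
    for line in text.splitlines():
        match = _RAM_LINE.match(line)
        if match:
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


def summarize(ranges: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (total bytes of RAM, highest physical address + 1)."""
    total = sum(end - start + 1 for start, end in ranges)
    top = max((end + 1 for _, end in ranges), default=0)
    return total, top


if __name__ == "__main__":
    # Output pasted in the thread by Cameron.
    sample = "00000000-3fffffffff : System RAM\n"
    total, top = summarize(parse_system_ram(sample))
    print(f"total RAM: {total // GIB} GiB, top of RAM: {top // GIB} GiB")
```

With Cameron's single range, both numbers come out as 256 GiB. Reporting the top address separately from the total matters on multi-node machines, where node memory can start far above the amount of RAM actually installed.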

* [Bug 204789] Boot failure with more than 256G of memory on Power9 with 4K pages & Hash MMU
From: bugzilla-daemon @ 2019-10-01 16:01 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

--- Comment #11 from Cameron (cam@neo-zeon.de) ---
grep RAM /proc/iomem
00000000-3fffffffff : System RAM

The system has 16 DIMM slots, all of which are populated. Unfortunately, I will
not have physical access to the box for the foreseeable future.

Aneesh appears to be correct in that this issue started with 0034d395f89d.

* [Bug 204789] Boot failure with more than 256G of memory on Power9 with 4K pages & Hash MMU
From: bugzilla-daemon @ 2019-10-05 20:07 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

Samuel Holland (samuel@sholland.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |samuel@sholland.org

--- Comment #12 from Samuel Holland (samuel@sholland.org) ---
I am also experiencing this issue on a Talos II, but with much less RAM.
Right now I have 16 GB attached to each CPU:

# grep RAM /proc/iomem 
00000000-3ffffffff : System RAM
200000000000-2003ffffffff : System RAM
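Working through Samuel's ranges (arithmetic added here, not from the thread): each node holds 16 GiB, but the second node's memory begins at physical address 32 TiB, so the top of physical memory sits far above 256G even though only 32 GiB is installed. That is consistent with, though not proof of, the failure depending on the physical address range rather than on the amount of RAM.

```latex
\begin{aligned}
\texttt{0x3ffffffff} - \texttt{0x0} + 1 &= \texttt{0x400000000} = 2^{34}\ \text{bytes} = 16\ \text{GiB} \\
\texttt{0x2003ffffffff} - \texttt{0x200000000000} + 1 &= \texttt{0x400000000} = 2^{34}\ \text{bytes} = 16\ \text{GiB} \\
\texttt{0x200000000000} &= 2 \cdot 16^{11} = 2^{45}\ \text{bytes} = 32\ \text{TiB} \\
\texttt{0x3fffffffff} + 1 &= \texttt{0x4000000000} = 2^{38}\ \text{bytes} = 256\ \text{GiB}
\end{aligned}
```

The last line, for comparison, is Cameron's single range: his top of memory is 256 GiB, while Samuel's is roughly 32 TiB.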

Without the patchset linked above, I also have a failure to boot with 5.2 and
later kernels.

(/proc/cmdline)
console=hvc0 disable_radix ignore_loglevel no_console_suspend

With the first patch from the patchset linked above, the RAM attached to the
second node is ignored, as expected, but the system boots and otherwise runs
fine.

With the full patchset linked above, I get panics on boot, as mentioned in
comment 9:

[    5.286513] Oops: Machine check, sig: 7 [#1]
[    5.286536] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=256 NUMA PowerNV
[    5.286545] Modules linked in: soundcore
[    5.286554] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G   M              5.3.4-00012-g8fc24abb8c31 #1
[    5.286569] NIP:  0000000000000000 LR: 7265677368657265 CTR: 0000000000000000
[    5.286590] REGS: c0000003ffb66fb0 TRAP: c00000000120dd00   Tainted: G   M               (5.3.4-00012-g8fc24abb8c31)
[    5.286602] MSR:  0000000000000000 <>  CR: c000000000036f04  XER: 00000000
[    5.286611] CFAR: 0000000000000000 IRQMASK: c0000003ffb67370
[    5.286611] GPR00: 0000000000000000 c0003d0000083d28 ffffffffffffffff 0000000006000000
[    5.286611] GPR04: 0500000002010101 00c75e1bc4c00a58 ffffffffffffffff c000000000036530
[    5.286611] GPR08: c0003d0000083d28 c0000003ffb67510 0000000000000000 c0000003ffb670e0
[    5.286611] GPR12: c0000003ffb67040 8804422200000000 c0000000000804ec c00000000120dd00
[    5.286611] GPR16: 0000000000000000 c0000003ffb673fc c0000003ffb67070 0000000000000000
[    5.286611] GPR20: c0000000000367f4 c00000000120dd00 0000000000000000 c0000003ffb670e0
[    5.286611] GPR24: c0000003ffb67370 0000000000000000 c000000000008380 0000000000000000
[    5.286611] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    5.286777] NIP [0000000000000000] 0x0
[    5.286792] LR [7265677368657265] 0x7265677368657265
[    5.286809] Call Trace:
[    5.286823] Instruction dump:
[    5.286838] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[    5.286858] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 60000000 60000000 60000000 60000000
[    5.286889] ---[ end trace 60912b64b73c973e ]---
[    5.819189]
[    5.819203] Oops: Machine check, sig: 7 [#2]
[    5.819205] Disabling lock debugging due to kernel taint
[    5.819223] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=256 NUMA PowerNV
[    5.819233] Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_hda_core snd_pcm tg3(+) snd_timer snd libphy ttm soundcore
[    5.819264] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G   M  D           5.3.4-00012-g8fc24abb8c31 #1
[    5.819286] NIP:  0000000000000000 LR: 7265677368657265 CTR: 0000000000000000
[    5.819315] REGS: c0000003ffb7efb0 TRAP: c00000000120dd00   Tainted: G   M  D            (5.3.4-00012-g8fc24abb8c31)
[    5.819328] MSR:  0000000000000000 <>  CR: c000000000036f04  XER: 00000000
[    5.819347] CFAR: 0000000000000000 IRQMASK: c0000003ffb7f370
[    5.819347] GPR00: 0000000000000000 c0003d0000063d28 ffffffffffffffff 0000000006000000
[    5.819347] GPR04: 0500000002010101 000ed5d8325a5873 ffffffffffffffff c000000000036530
[    5.819347] GPR08: c0003d0000063d28 c0000003ffb7f510 0000000000000000 c0000003ffb7f0e0
[    5.819347] GPR12: c0000003ffb7f040 8804424200000000 c0000000000804ec c00000000120dd00
[    5.819347] GPR16: 0000000000000000 c0000003ffb7f3fc c0000003ffb7f070 0000000000000000
[    5.819347] GPR20: c0000000000367f4 c00000000120dd00 0000000000000000 c0000003ffb7f0e0
[    5.819347] GPR24: c0000003ffb7f370 0000000000000000 c000000000008380 0000000000000000
[    5.819347] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    5.819537] NIP [0000000000000000] 0x0
[    5.819554] LR [7265677368657265] 0x7265677368657265
[    5.819562] Call Trace:
[    5.819567] Instruction dump:
[    5.819573] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[    5.819603] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 60000000 60000000 60000000 60000000
[    5.819648] ---[ end trace 60912b64b73c973f ]---
[    6.311806]
[    6.311820] Oops: Machine check, sig: 7 [#3]
[    6.311829] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=256 NUMA PowerNV
[    6.311839] Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_hda_core snd_pcm tg3(+) snd_timer snd libphy ttm soundcore
[    6.311869] CPU: 0 PID: 734 Comm: udevd Tainted: G   M  D           5.3.4-00012-g8fc24abb8c31 #1
[    6.311882] NIP:  0000000000000000 LR: 7265677368657265 CTR: 0000000000000000
[    6.311903] REGS: c0000003ffbc6fb0 TRAP: c00000000120dd00   Tainted: G   M  D            (5.3.4-00012-g8fc24abb8c31)
[    6.311917] MSR:  0000000000000000 <>  CR: c000000000036f04  XER: 00000000
[    6.311937] CFAR: 0000000000000000 IRQMASK: c0000003ffbc7370
[    6.311937] GPR00: 0000000000000000 c0003d0000003d28 ffffffffffffffff 0000000006000000
[    6.311937] GPR04: 0500000002010101 00b492f4c8c2175c ffffffffffffffff c000000000036530
[    6.311937] GPR08: c0003d0000003d28 c0000003ffbc7510 0000000000000000 c0000003ffbc70e0
[    6.311937] GPR12: c0000003ffbc7040 8024428200000000 c0000000000804ec c00000000120dd00
[    6.311937] GPR16: 0000000000000000 c0000003ffbc73fc c0000003ffbc7070 0000000000000000
[    6.311937] GPR20: c0000000000367f4 c00000000120dd00 0000000000000000 c0000003ffbc70e0
[    6.311937] GPR24: c0000003ffbc7370 0000000000000000 c000000000008380 0000000000000000
[    6.311937] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    6.312109] NIP [0000000000000000] 0x0
[    6.312126] LR [7265677368657265] 0x7265677368657265
[    6.312143] Call Trace:
[    6.312148] Instruction dump:
[    6.312155] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[    6.312190] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 60000000 60000000 60000000 60000000
[    6.312226] ---[ end trace 60912b64b73c9740 ]---
[    6.819242] Kernel panic - not syncing: Fatal
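One detail in these dumps: the LR value, 0x7265677368657265, is printable ASCII. The sketch below (an observation added here, not from the thread) unpacks it most-significant byte first and yields "regshere". That appears to match STACK_FRAME_REGS_MARKER, the marker powerpc kernels write into exception stack frames (arch/powerpc/include/asm/ptrace.h), which hints that the printed "registers" may be read from a stack frame rather than reflecting real register state. The helper name `u64_to_be_bytes` is hypothetical.

```python
import struct


def u64_to_be_bytes(value: int) -> bytes:
    """Return the 8 bytes of a 64-bit register value, most significant byte first."""
    return struct.pack(">Q", value)


# LR value printed in every oops above.
lr = 0x7265677368657265
print(u64_to_be_bytes(lr))  # b'regshere'
```

The same check can be applied to any suspicious-looking register value: if every byte falls in the printable ASCII range, the value is more likely data or a marker than a code address.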

I have easy physical access to this machine, so I'd be able to try out patches
if needed.

* [Bug 204789] Boot failure with more than 256G of memory on Power9 with 4K pages & Hash MMU
From: bugzilla-daemon @ 2020-11-28 16:48 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

Cameron (cam@neo-zeon.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #13 from Cameron (cam@neo-zeon.de) ---
This was resolved some time ago by Aneesh, and the patches made it into mainline
a long time ago. Marking resolved.

* [Bug 204789] Boot failure with more than 256G of memory on Power9 with 4K pages & Hash MMU
From: bugzilla-daemon @ 2020-11-30  1:49 UTC
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=204789

--- Comment #14 from Michael Ellerman (michael@ellerman.id.au) ---
The fix is:
  7746406baa3b ("powerpc/book3s64/hash/4k: Support large linear mapping range with 4K")
