linux-kernel.vger.kernel.org archive mirror
* power9 NUMA crash while reading debugfs imc_cmd
@ 2019-06-27 21:21 Qian Cai
  2019-06-28  3:12 ` Michael Ellerman
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-06-27 21:21 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Michael Ellerman, Anju T Sudhakar
  Cc: linuxppc-dev, linux-kernel

Reading the debugfs imc_cmd file for a memory-less node triggers the crash below
on this power9 machine, which has the following NUMA layout. I don't understand
why I only saw it recently on linux-next, where it is tested every day. I can
reproduce it as far back as 4.20, whereas 4.18 seems to work fine.

# cat /sys/kernel/debug/powerpc/imc/imc_cmd_252 (On a 4.18-based kernel)
0x0000000000000000

# numactl -H
available: 6 nodes (0,8,252-255)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 130210 MB
node 0 free: 128406 MB
node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 8 size: 130784 MB
node 8 free: 130051 MB
node 252 cpus:
node 252 size: 0 MB
node 252 free: 0 MB
node 253 cpus:
node 253 size: 0 MB
node 253 free: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 254 free: 0 MB
node 255 cpus:
node 255 size: 0 MB
node 255 free: 0 MB
node distances:
node   0   8  252  253  254  255 
  0:  10  40  80  80  80  80 
  8:  40  10  80  80  80  80 
 252:  80  80  10  80  80  80 
 253:  80  80  80  10  80  80 
 254:  80  80  80  80  10  80 
 255:  80  80  80  80  80  10

# cat /sys/kernel/debug/powerpc/imc/imc_cmd_252

[ 1139.415461][ T5301] Faulting instruction address: 0xc0000000000d0d58
[ 1139.415492][ T5301] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1139.415509][ T5301] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
[ 1139.415542][ T5301] Modules linked in: i2c_opal i2c_core ip_tables x_tables xfs sd_mod bnx2x mdio ahci libahci tg3 libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 1139.415595][ T5301] CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19
[ 1139.415634][ T5301] NIP:  c0000000000d0d58 LR: c00000000049aa18 CTR: c0000000000d0d50
[ 1139.415675][ T5301] REGS: c00020194548f9e0 TRAP: 0300   Not tainted  (5.2.0-rc6-next-20190627+)
[ 1139.415705][ T5301] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28022822  XER: 00000000
[ 1139.415777][ T5301] CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR: 40000000 IRQMASK: 0
[ 1139.415777][ T5301] GPR00: c00000000049aa18 c00020194548fc70 c0000000016f8b00 000000000003fc08
[ 1139.415777][ T5301] GPR04: c00020194548fcd0 0000000000000000 0000000014884e73 ffffffff00011eaa
[ 1139.415777][ T5301] GPR08: 000000007eea5a52 c0000000000d0d50 0000000000000000 0000000000000000
[ 1139.415777][ T5301] GPR12: c0000000000d0d50 c000201fff7f8c00 0000000000000000 0000000000000000
[ 1139.415777][ T5301] GPR16: 000000000000000d 00007fffeb0c3368 ffffffffffffffff 0000000000000000
[ 1139.415777][ T5301] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000020000
[ 1139.415777][ T5301] GPR24: 0000000000000000 0000000000000000 0000000000020000 000000010ec90000
[ 1139.415777][ T5301] GPR28: c00020194548fdf0 c00020049a584ef8 0000000000000000 c00020049a584ea8
[ 1139.416116][ T5301] NIP [c0000000000d0d58] imc_mem_get+0x8/0x20
[ 1139.416143][ T5301] LR [c00000000049aa18] simple_attr_read+0x118/0x170
[ 1139.416158][ T5301] Call Trace:
[ 1139.416182][ T5301] [c00020194548fc70] [c00000000049a970] simple_attr_read+0x70/0x170 (unreliable)
[ 1139.416255][ T5301] [c00020194548fd10] [c00000000054385c] debugfs_attr_read+0x6c/0xb0
[ 1139.416305][ T5301] [c00020194548fd60] [c000000000454c1c] __vfs_read+0x3c/0x70
[ 1139.416363][ T5301] [c00020194548fd80] [c000000000454d0c] vfs_read+0xbc/0x1a0
[ 1139.416392][ T5301] [c00020194548fdd0] [c00000000045519c] ksys_read+0x7c/0x140
[ 1139.416434][ T5301] [c00020194548fe20] [c00000000000b108] system_call+0x5c/0x70
[ 1139.416473][ T5301] Instruction dump:
[ 1139.416511][ T5301] 4e800020 60000000 7c0802a6 60000000 7c801d28 38600000 4e800020 60000000
[ 1139.416572][ T5301] 60000000 60000000 7c0802a6 60000000 <7d201c28> 38600000 f9240000 4e800020
[ 1139.416636][ T5301] ---[ end trace c44d1fb4ace04784 ]---
[ 1139.520686][ T5301] 
[ 1140.520820][ T5301] Kernel panic - not syncing: Fatal exception
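
For anyone reading the oops above, here is a rough sketch of how the faulting
word in the instruction dump can be decoded by hand. It treats <7d201c28> as a
PowerPC X-form instruction and compares the address register with DAR. The
mnemonic table is a tiny hand-written one, not a full disassembler. The two
offsets near the end (0x3FC00 and 8) are my recollection of
IMC_CNTL_BLK_OFFSET and IMC_CNTL_BLK_CMD_OFFSET from the kernel's imc-pmu.h and
are an assumption to verify against the tree being debugged.

```python
# Values copied from the oops log above.
FAULTING_WORD = 0x7D201C28  # the <7d201c28> word in "Instruction dump"
DAR = 0x3FC08               # DAR: 000000000003fc08
GPR3 = 0x3FC08              # fourth value on the GPR00 line (r0, r1, r2, r3)

# Hand-written subset of primary-opcode-31 extended opcodes (illustrative only).
XO_31_MNEMONICS = {532: "ldbrx", 660: "stdbrx"}

# ASSUMPTION: recalled from arch/powerpc/include/asm/imc-pmu.h, not from this thread.
ASSUMED_CNTL_BLK_OFFSET = 0x3FC00
ASSUMED_CNTL_BLK_CMD_OFFSET = 8


def decode_x_form(word):
    """Split a 32-bit PowerPC instruction word into X-form fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rt": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        "rb": (word >> 11) & 0x1F,
        "xo": (word >> 1) & 0x3FF,
    }


def describe(word):
    """Return a rough mnemonic string for the few instructions in the table."""
    f = decode_x_form(word)
    if f["opcode"] != 31 or f["xo"] not in XO_31_MNEMONICS:
        return "unknown (opcode %d, xo %d)" % (f["opcode"], f["xo"])
    # In these indexed forms RA == 0 means a literal zero, not register r0.
    ra = "0" if f["ra"] == 0 else "r%d" % f["ra"]
    return "%s r%d,%s,r%d" % (XO_31_MNEMONICS[f["xo"]], f["rt"], ra, f["rb"])


def implied_base(dar):
    """Base address left over after subtracting the assumed block offsets."""
    return dar - ASSUMED_CNTL_BLK_OFFSET - ASSUMED_CNTL_BLK_CMD_OFFSET


if __name__ == "__main__":
    print(describe(FAULTING_WORD))
    print("r3 == DAR:", GPR3 == DAR)
    print("implied base: 0x%x" % implied_base(DAR))
```

If the assumed offsets are right, the load went through a pointer built from a
zero base address, which would fit a node that has no IMC memory behind it.
That is only a reading of the register dump, not a confirmed root cause.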


* Re: power9 NUMA crash while reading debugfs imc_cmd
  2019-06-27 21:21 power9 NUMA crash while reading debugfs imc_cmd Qian Cai
@ 2019-06-28  3:12 ` Michael Ellerman
  2019-06-28  3:34   ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2019-06-28  3:12 UTC (permalink / raw)
  To: Qian Cai, Aneesh Kumar K.V, Anju T Sudhakar; +Cc: linuxppc-dev, linux-kernel

Qian Cai <cai@lca.pw> writes:
> Read of debugfs imc_cmd file for a memory-less node will trigger a crash below
> on this power9 machine which has the following NUMA layout.

What type of machine is it?

cheers



* Re: power9 NUMA crash while reading debugfs imc_cmd
  2019-06-28  3:12 ` Michael Ellerman
@ 2019-06-28  3:34   ` Qian Cai
  2019-06-28 11:49     ` Anju T Sudhakar
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-06-28  3:34 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Aneesh Kumar K.V, Anju T Sudhakar, linuxppc-dev, linux-kernel



> On Jun 27, 2019, at 11:12 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> Qian Cai <cai@lca.pw> writes:
>> Read of debugfs imc_cmd file for a memory-less node will trigger a crash below
>> on this power9 machine which has the following NUMA layout.
> 
> What type of machine is it?

description: PowerNV
product: 8335-GTH (ibm,witherspoon)
vendor: IBM
width: 64 bits
capabilities: smp powernv opal




* Re: power9 NUMA crash while reading debugfs imc_cmd
  2019-06-28  3:34   ` Qian Cai
@ 2019-06-28 11:49     ` Anju T Sudhakar
  2019-06-28 13:00       ` Qian Cai
  0 siblings, 1 reply; 6+ messages in thread
From: Anju T Sudhakar @ 2019-06-28 11:49 UTC (permalink / raw)
  To: Qian Cai; +Cc: Michael Ellerman, Aneesh Kumar K.V, linuxppc-dev, linux-kernel


On 6/28/19 9:04 AM, Qian Cai wrote:
>
>> On Jun 27, 2019, at 11:12 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> Qian Cai <cai@lca.pw> writes:
>>> Read of debugfs imc_cmd file for a memory-less node will trigger a crash below
>>> on this power9 machine which has the following NUMA layout.
>> What type of machine is it?
> description: PowerNV
> product: 8335-GTH (ibm,witherspoon)
> vendor: IBM
> width: 64 bits
> capabilities: smp powernv opal


Hi Qian Cai,

Could you please try with this patch: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192803.html

and see if the issue is resolved?


Thanks,

Anju




* Re: power9 NUMA crash while reading debugfs imc_cmd
  2019-06-28 11:49     ` Anju T Sudhakar
@ 2019-06-28 13:00       ` Qian Cai
  2019-06-29 11:22         ` Michael Ellerman
  0 siblings, 1 reply; 6+ messages in thread
From: Qian Cai @ 2019-06-28 13:00 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: Michael Ellerman, Aneesh Kumar K.V, linuxppc-dev, linux-kernel

On Fri, 2019-06-28 at 17:19 +0530, Anju T Sudhakar wrote:
> On 6/28/19 9:04 AM, Qian Cai wrote:
> > 
> > > On Jun 27, 2019, at 11:12 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> > > 
> > > Qian Cai <cai@lca.pw> writes:
> > > > Read of debugfs imc_cmd file for a memory-less node will trigger a crash
> > > > below
> > > > on this power9 machine which has the following NUMA layout.
> > > 
> > > What type of machine is it?
> > 
> > description: PowerNV
> > product: 8335-GTH (ibm,witherspoon)
> > vendor: IBM
> > width: 64 bits
> > capabilities: smp powernv opal
> 
> 
> Hi Qian Cai,
> 
> Could you please try with this patch: 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192803.html
> 
> and see if the issue is resolved?

It works fine.

It just seems a bit silly that a node without CPUs or memory is online by
default at boot on powerpc in the first place, but that is probably a
different issue. For example:

# numactl -H
available: 6 nodes (0,8,252-255)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63
node 0 size: 126801 MB
node 0 free: 123199 MB
node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 8 size: 130811 MB
node 8 free: 128436 MB
node 252 cpus:
node 252 size: 0 MB
node 252 free: 0 MB
node 253 cpus:
node 253 size: 0 MB
node 253 free: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 254 free: 0 MB
node 255 cpus:
node 255 size: 0 MB
node 255 free: 0 MB
node distances:
node   0   8  252  253  254  255 
  0:  10  40  80  80  80  80 
  8:  40  10  80  80  80  80 
 252:  80  80  10  80  80  80 
 253:  80  80  80  10  80  80 
 254:  80  80  80  80  10  80 
 255:  80  80  80  80  80  10 

# cat /sys/devices/system/node/online 
0,8,252-255
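
As a small illustration of the observation above, the sketch below expands the
node list string from /sys/devices/system/node/online and picks out the nodes
that numactl -H reports with no CPUs and 0 MB. The function names and the
abbreviated sample text are made up for this example. The parser only handles
the "a,b,c-d" form shown in this thread.

```python
import re


def parse_node_list(text):
    """Expand a list string such as '0,8,252-255' into a list of ints."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def empty_nodes(numactl_output):
    """Return node ids that 'numactl -H' shows with no CPUs and size 0 MB."""
    cpus = {}
    size = {}
    for line in numactl_output.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line)
        if m:
            cpus[int(m.group(1))] = m.group(2).split()
            continue
        m = re.match(r"node (\d+) size: (\d+) MB", line)
        if m:
            size[int(m.group(1))] = int(m.group(2))
    return sorted(n for n in size if size[n] == 0 and not cpus.get(n))


# Abbreviated from the numactl -H output above (CPU lists cut short).
SAMPLE = """\
node 0 cpus: 0 1 2 3
node 0 size: 126801 MB
node 8 cpus: 64 65 66 67
node 8 size: 130811 MB
node 252 cpus:
node 252 size: 0 MB
node 253 cpus:
node 253 size: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 255 cpus:
node 255 size: 0 MB
"""

if __name__ == "__main__":
    online = parse_node_list("0,8,252-255")
    empty = empty_nodes(SAMPLE)
    print("online but empty:", [n for n in online if n in empty])
```

On a live system the two inputs would come from reading
/sys/devices/system/node/online and running numactl -H rather than from the
hard-coded strings used here.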





* Re: power9 NUMA crash while reading debugfs imc_cmd
  2019-06-28 13:00       ` Qian Cai
@ 2019-06-29 11:22         ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2019-06-29 11:22 UTC (permalink / raw)
  To: Qian Cai, Anju T Sudhakar
  Cc: Aneesh Kumar K.V, linuxppc-dev, linux-kernel, Reza Arbab

Qian Cai <cai@lca.pw> writes:
> On Fri, 2019-06-28 at 17:19 +0530, Anju T Sudhakar wrote:
>> On 6/28/19 9:04 AM, Qian Cai wrote:
>> > 
>> > > On Jun 27, 2019, at 11:12 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>> > > 
>> > > Qian Cai <cai@lca.pw> writes:
>> > > > Read of debugfs imc_cmd file for a memory-less node will trigger a crash
>> > > > below
>> > > > on this power9 machine which has the following NUMA layout.
>> > > 
>> > > What type of machine is it?
>> > 
>> > description: PowerNV
>> > product: 8335-GTH (ibm,witherspoon)
>> > vendor: IBM
>> > width: 64 bits
>> > capabilities: smp powernv opal
>> 
>> 
>> Hi Qian Cai,
>> 
>> Could you please try with this patch: 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192803.html
>> 
>> and see if the issue is resolved?
>
> It works fine.
>
> Just feel a bit silly that a node without CPU and memory is still online by
> default during boot at the first place on powerpc, but that is probably a
> different issue. For example,

Those are there to represent the memory on your attached GPUs. It's not
onlined by default.

I don't really love that they show up like that, but I think that's
working as expected.

cheers


