linux-kernel.vger.kernel.org archive mirror
* x86 boot broken on -rc1?
@ 2017-12-02  0:39 Jakub Kicinski
  2017-12-04  1:28 ` [bisected] x86 boot still broken on -rc2 Jakub Kicinski
  2017-12-13 19:37 ` x86 boot broken on -rc1? Björn Töpel
  0 siblings, 2 replies; 10+ messages in thread
From: Jakub Kicinski @ 2017-12-02  0:39 UTC
  To: LKML; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 7908 bytes --]

Hi!

I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
E5-2630 v4 box.  It also happens on linux-next.  Did anyone else
experience it?  (.config attached)

[    5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
[    5.007544] Modules linked in:
[    5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
[    5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[    5.007544] task: 000000009e842725 task.stack: 000000008a63fd2d
[    5.007544] RIP: 0010:uncore_pci_probe+0x285/0x2b0
[    5.007544] RSP: 0000:ffffad8580163d10 EFLAGS: 00010286
[    5.007544] RAX: ffff98576cc3df30 RBX: ffffffffb08037e0 RCX: ffffffffb0c1a120
[    5.007544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb0c1a960
[    5.007544] RBP: ffff985b6c00ac00 R08: fffffffffffffffe R09: 00000000000fffff
[    5.007544] R10: ffff98576f1b6018 R11: 0000000000000022 R12: ffff985b6c641000
[    5.007544] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
[    5.007544] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
[    5.007544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.007544] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
[    5.007544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.007544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    5.007544] Call Trace:
[    5.007544]  local_pci_probe+0x3d/0x90
[    5.007544]  ? pci_match_device+0xd9/0x100
[    5.007544]  pci_device_probe+0x122/0x180
[    5.007544]  driver_probe_device+0x246/0x330
[    5.007544]  ? set_debug_rodata+0x11/0x11
[    5.007544]  __driver_attach+0x8a/0x90
[    5.007544]  ? driver_probe_device+0x330/0x330
[    5.007544]  bus_for_each_dev+0x5c/0x90
[    5.007544]  bus_add_driver+0x196/0x220
[    5.007544]  driver_register+0x57/0xc0
[    5.007544]  intel_uncore_init+0x1e3/0x249
[    5.007544]  ? uncore_type_init+0x193/0x193
[    5.007544]  ? set_debug_rodata+0x11/0x11
[    5.007544]  do_one_initcall+0x4b/0x190
[    5.007544]  kernel_init_freeable+0x16e/0x1f5
[    5.007544]  ? rest_init+0xd0/0xd0
[    5.007544]  kernel_init+0xa/0x100
[    5.007544]  ret_from_fork+0x1f/0x30
[    5.007544] Code: 48 8b 52 08 48 85 d2 74 0d 89 44 24 04 48 89 df ff d2 8b 44 24 04 48 89 df 89 44 24 04 e8 54 0a 1c 00 8b 44 24 0 
[    5.007544] ---[ end trace 4dc4c3d5f5afcd2f ]---
[    5.244504] bdx_uncore: probe of 0000:ff:08.2 failed with error -22
[    5.251604] bdx_uncore: probe of 0000:ff:0b.1 failed with error -22
[    5.258711] bdx_uncore: probe of 0000:ff:10.1 failed with error -22
[    5.265819] bdx_uncore: probe of 0000:ff:14.0 failed with error -22
[    5.272919] bdx_uncore: probe of 0000:ff:14.1 failed with error -22
[    5.280019] bdx_uncore: probe of 0000:ff:15.0 failed with error -22
[    5.287112] bdx_uncore: probe of 0000:ff:15.1 failed with error -22
[    5.294376] WARNING: CPU: 1 PID: 15 at ../arch/x86/events/intel/uncore.c:1065 uncore_change_type_ctx.isra.5+0xe6/0xf0
[    5.298362] Modules linked in:
[    5.298362] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
[    5.298362] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[    5.298362] task: 00000000ae78bc8f task.stack: 00000000f79660c1
[    5.298362] RIP: 0010:uncore_change_type_ctx.isra.5+0xe6/0xf0
[    5.298362] RSP: 0000:ffffad85833b3db8 EFLAGS: 00010213
[    5.298362] RAX: 0000000000000000 RBX: ffff9857669b0200 RCX: 0000000000000001
[    5.298362] RDX: ffff985b6f000000 RSI: ffff985b66580400 RDI: ffffffffb0c1ae8c
[    5.298362] RBP: ffff985b66580400 R08: ffffffffb0c1ae8c R09: 0000000000000001
[    5.298362] R10: 0000000000000000 R11: 00000000003d0900 R12: 0000000000000000
[    5.298362] R13: ffffffffffffffff R14: 0000000000000001 R15: 0000000000000008
[    5.298362] FS:  0000000000000000(0000) GS:ffff985b6f000000(0000) knlGS:0000000000000000
[    5.298362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.298362] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
[    5.298362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.298362] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    5.298362] Call Trace:
[    5.298362]  uncore_event_cpu_online+0x283/0x340
[    5.298362]  ? uncore_event_cpu_offline+0x180/0x180
[    5.298362]  cpuhp_invoke_callback+0x8c/0x620
[    5.298362]  ? __schedule+0x1ad/0x6c0
[    5.298362]  ? sort_range+0x20/0x20
[    5.298362]  cpuhp_thread_fun+0xbc/0x140
[    5.298362]  smpboot_thread_fn+0x114/0x1d0
[    5.298362]  kthread+0x111/0x130
[    5.298362]  ? kthread_create_on_node+0x40/0x40
[    5.298362]  ret_from_fork+0x1f/0x30
[    5.298362] Code: 2a 44 89 73 10 41 83 c4 01 48 81 c5 40 01 00 00 45 3b 20 7c cf 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f f 
[    5.298362] ---[ end trace 4dc4c3d5f5afcd30 ]---
[    5.504808] Scanning for low memory corruption every 60 seconds
[    5.512347] Initialise system trusted keyrings
[    5.517470] workingset: timestamp_bits=40 max_order=23 bucket_order=0
[    5.524840] BUG: unable to handle kernel paging request at 0000000023314bf4
[    5.528761] IP: __kmalloc_track_caller+0xa8/0x210
[    5.528761] PGD 185c0a067 P4D 185c0a067 PUD 185c0c067 PMD 0 
[    5.528761] Oops: 0000 [#1] PREEMPT SMP
[    5.528761] Modules linked in:
[    5.528761] CPU: 14 PID: 1 Comm: swapper/0 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
[    5.528761] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[    5.528761] task: 000000009e842725 task.stack: 000000008a63fd2d
[    5.528761] RIP: 0010:__kmalloc_track_caller+0xa8/0x210
[    5.528761] RSP: 0000:ffffad8580163d58 EFLAGS: 00010286
[    5.528761] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000012ce0e
[    5.528761] RDX: 000000000012cd0e RSI: 000000000012cd0e RDI: 000000000001dde0
[    5.528761] RBP: ffff985700000001 R08: ffff98576f407c00 R09: ffffffffb071edbf
[    5.528761] R10: ffffd54de1995600 R11: ffff985b6655915f R12: 0000000000000004
[    5.528761] R13: 00000000014000c0 R14: ffffffffb026c239 R15: ffff98576f407c00
[    5.528761] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
[    5.528761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.528761] CR2: ffffffffffffffff CR3: 0000000185c09001 CR4: 00000000003606e0
[    5.528761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.528761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    5.528761] Call Trace:
[    5.528761]  kstrdup+0x2d/0x60
[    5.528761]  __kernfs_new_node+0x29/0x130
[    5.528761]  kernfs_new_node+0x24/0x50
[    5.528761]  kernfs_create_link+0x29/0x90
[    5.528761]  sysfs_do_create_link_sd.isra.0+0x5d/0xc0
[    5.528761]  sysfs_slab_add+0x1f5/0x270
[    5.528761]  ? set_debug_rodata+0x11/0x11
[    5.528761]  slab_sysfs_init+0x8b/0xfa
[    5.528761]  ? kmem_cache_init+0xf9/0xf9
[    5.528761]  do_one_initcall+0x4b/0x190
[    5.528761]  kernel_init_freeable+0x16e/0x1f5
[    5.528761]  ? rest_init+0xd0/0xd0
[    5.528761]  kernel_init+0xa/0x100
[    5.528761]  ret_from_fork+0x1f/0x30
[    5.528761] Code: 49 63 47 20 49 8b 3f 48 8d 8a 00 01 00 00 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 48 85 db 7 
[    5.528761] RIP: __kmalloc_track_caller+0xa8/0x210 RSP: ffffad8580163d58
[    5.528761] CR2: ffffffffffffffff
[    5.528761] ---[ end trace 4dc4c3d5f5afcd31 ]---
[    5.773089] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    5.773089] 
[    5.777076] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

[-- Attachment #2: .config.xz --]
[-- Type: application/x-xz, Size: 24284 bytes --]


* [bisected] x86 boot still broken on -rc2
  2017-12-02  0:39 x86 boot broken on -rc1? Jakub Kicinski
@ 2017-12-04  1:28 ` Jakub Kicinski
  2017-12-04 12:28   ` Prarit Bhargava
  2017-12-13 19:37 ` x86 boot broken on -rc1? Björn Töpel
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2017-12-04  1:28 UTC
  To: LKML; +Cc: netdev, Prarit Bhargava, Thomas Gleixner

Same thing on rc2, bisected down to:

commit b4c0a7326f5dc0ef7a64128b0ae7d081f4b2cbd1 (refs/bisect/bad)
Author: Prarit Bhargava <prarit@redhat.com>
Date:   Tue Nov 14 07:42:57 2017 -0500

    x86/smpboot: Fix __max_logical_packages estimate
    
    A system booted with a small number of cores enabled per package
    panics because the estimate of __max_logical_packages is too low.
    
    This occurs when the total number of active cores across all packages is
    less than the maximum core count for a single package. e.g.:
    
      On a 4 package system with 20 cores/package where only 4 cores are
      enabled on each package, the value of __max_logical_packages is
      calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
    
    Calculate __max_logical_packages after the cpu enumeration has completed.
    Use the boot cpu's data to extrapolate the number of packages.
    
    Signed-off-by: Prarit Bhargava <prarit@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Kan Liang <kan.liang@intel.com>
    Cc: He Chen <he.chen@linux.intel.com>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Piotr Luc <piotr.luc@intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
    Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Mathias Krause <minipli@googlemail.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com




* Re: [bisected] x86 boot still broken on -rc2
  2017-12-04  1:28 ` [bisected] x86 boot still broken on -rc2 Jakub Kicinski
@ 2017-12-04 12:28   ` Prarit Bhargava
  2017-12-04 13:13     ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2017-12-04 12:28 UTC
  To: Jakub Kicinski, LKML; +Cc: netdev, Thomas Gleixner



On 12/03/2017 08:28 PM, Jakub Kicinski wrote:
> On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
>> Hi!
>>
>> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
>> E5-2630 v4 box.  It also happens on linux-next.  Did anyone else
>> experience it?  (.config attached)
>>
>> [    5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
>> [    5.007544] Modules linked in:
>> [    5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [    5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016

I have a Dell R730 available for use.  OOC are you booting with the default
BIOS options?

P.




* Re: [bisected] x86 boot still broken on -rc2
  2017-12-04 12:28   ` Prarit Bhargava
@ 2017-12-04 13:13     ` Prarit Bhargava
  2017-12-04 16:45       ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2017-12-04 13:13 UTC
  To: Jakub Kicinski, LKML; +Cc: netdev, Thomas Gleixner



On 12/04/2017 07:28 AM, Prarit Bhargava wrote:
> 
> 
> On 12/03/2017 08:28 PM, Jakub Kicinski wrote:
>> On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
>>> Hi!
>>>
>>> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
>>> E5-2630 v4 box.  It also happens on linux-next.  Did anyone else
>>> experience it?  (.config attached)
>>>
>>> [    5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
>>> [    5.007544] Modules linked in:
>>> [    5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>>> [    5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> 
> I have a Dell R730 available for use.  OOC are you booting with the default
> BIOS options?
>

Jakub, I was able to reproduce this on a similar system by DISABLING
hyperthreading in the BIOS.  Doing this on other systems seems to have no
impact.  What is odd about this system when booting is that the
kernel claims that hyperthreading is ENABLED:

x86: Booting SMP configuration:
.... node  #0, CPUs:        #1  #2  #3  #4
.... node  #1, CPUs:    #5  #6  #7  #8  #9
.... node  #0, CPUs:   #10 #11 #12 #13 #14
.... node  #1, CPUs:   #15 #16 #17 #18 #19
smp: Brought up 2 nodes, 20 CPUs
smpboot: Max logical packages: 1

which means that the calculation of logical packages is wrong because

	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
	ncpus = 10 * 2;
	ncpus = 20;

smp_num_siblings is defined as "The number of threads in a core" which
should be 1 if HT/SMT is disabled.
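
For illustration only, the arithmetic above can be reproduced in a small
userspace sketch.  DIV_ROUND_UP mirrors the kernel macro of the same name;
estimate_packages() and its parameter names are hypothetical and are not
kernel API.  The inputs are the values visible in the boot log above
(20 CPUs, 10 booted cores on the boot CPU's package).

```c
/* Same rounding-up integer division as the kernel's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper (not kernel code): estimate the number of packages
 * the way the calculation quoted above does, by extrapolating from the
 * boot CPU's core count and a threads-per-core value.
 */
unsigned int estimate_packages(unsigned int nr_cpu_ids,
			       unsigned int booted_cores,
			       unsigned int threads_per_core)
{
	/* Logical CPUs assumed to live in one package. */
	unsigned int ncpus = booted_cores * threads_per_core;

	return DIV_ROUND_UP(nr_cpu_ids, ncpus);
}
```

With threads_per_core = 2 the call estimate_packages(20, 10, 2) yields 1,
matching the "Max logical packages: 1" line.  With threads_per_core = 1,
estimate_packages(20, 10, 1) yields 2, which is what a two-node box with
HT disabled should report.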

It looks like my patch has exposed a bug in the
smp_num_siblings calculation.   I'm still debugging ...

FWIW, I did test this code by disabling HT/SMT in the BIOS on several
systems.  I have tested those systems again and don't see a problem.
The issue appears to be peculiar to a few systems.

P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bisected] x86 boot still broken on -rc2
  2017-12-04 13:13     ` Prarit Bhargava
@ 2017-12-04 16:45       ` Prarit Bhargava
  2017-12-04 19:48         ` Jakub Kicinski
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Prarit Bhargava @ 2017-12-04 16:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Prarit, Jakub Kicinski, netdev, Thomas Gleixner,
	Clark Williams

On 12/04/2017 08:13 AM, Prarit Bhargava wrote:
> 
> 
> x86: Booting SMP configuration:
> .... node  #0, CPUs:        #1  #2  #3  #4
> .... node  #1, CPUs:    #5  #6  #7  #8  #9
> .... node  #0, CPUs:   #10 #11 #12 #13 #14
> .... node  #1, CPUs:   #15 #16 #17 #18 #19
> smp: Brought up 2 nodes, 20 CPUs
> smpboot: Max logical packages: 1
> 
> which means that the calculation of logical packages is wrong because
> 
>       ncpus = cpu_data(0).booted_cores * smp_num_siblings;
>       ncpus = 10 * 2;
>       ncpus = 20;
> 
> smp_num_siblings is defined as "The number of threads in a core" which
> should be 1 if HT/SMT is disabled.
> 
> It looks like my patch has exposed a bug in the
> smp_num_siblings calculation.   I'm still debugging ...

The bug is that smp_num_siblings has been incorrectly calculated as the
*maximum* number of threads in a core, and not the actual number of threads in
a core on systems which have a CPUID level greater than 0xb.  (see
arch/x86/kernel/cpu/topology.c:59)

That will take some time to investigate and come up with a proper solution and
fix.  In the meantime, the patch below will fix the problem in the short-term.
I've tested the patch using SMT enabled, SMT disabled, maxcpus=1 and nr_cpus=1.
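
As a rough sketch of the difference between the two inputs (a userspace
model, not kernel code): struct boot_topology and both helper functions are
hypothetical names, and the field values used in the note below are
assumptions based on the two-package, 10-cores-per-package box in this
thread.

```c
/* Same rounding-up integer division as the kernel's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical model of one boot; this is not a kernel structure. */
struct boot_topology {
	unsigned int nr_cpu_ids;          /* logical CPUs the kernel sized for */
	unsigned int booted_cores;        /* cores booted on the boot CPU's package */
	unsigned int max_threads_per_core;    /* models smp_num_siblings */
	unsigned int online_threads_per_core; /* models topology_max_smt_threads() */
};

/* Estimate before the patch: uses the maximum threads per core. */
unsigned int packages_before(const struct boot_topology *t)
{
	return DIV_ROUND_UP(t->nr_cpu_ids,
			    t->booted_cores * t->max_threads_per_core);
}

/* Estimate after the patch: uses the threads that are actually online. */
unsigned int packages_after(const struct boot_topology *t)
{
	return DIV_ROUND_UP(t->nr_cpu_ids,
			    t->booted_cores * t->online_threads_per_core);
}
```

For an SMT-disabled boot modelled as { 20, 10, 2, 1 }, packages_before()
gives 1 and packages_after() gives 2.  For an SMT-enabled boot modelled as
{ 40, 10, 2, 2 }, both give 2, so the change only matters when the maximum
and the online thread counts differ.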

tglx, Please revert b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages
estimate") if you think that is a better option.  The problem with
smp_num_siblings has been around for almost a decade.

P.

---8<---

Subject: [PATCH] arch/x86: Do not use smp_num_siblings in
 __max_logical_packages calculation

Documentation/x86/topology.txt defines smp_num_siblings as "The number of
threads in a core".  Since commit bbb65d2d365e ("x86: use cpuid vector 0xb
when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core.  If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.

Use topology_max_smt_threads() in the __max_logical_packages calculation.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clark Williams <williams@redhat.com>
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3d01df7d7cf6..eaee15fb7d8b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1304,7 +1304,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	 * Today neither Intel nor AMD support heterogenous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
-	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [bisected] x86 boot still broken on -rc2
  2017-12-04 16:45       ` Prarit Bhargava
@ 2017-12-04 19:48         ` Jakub Kicinski
  2017-12-04 22:10         ` [tip:x86/urgent] x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation tip-bot for Prarit Bhargava
  2017-12-07  9:40         ` tip-bot for Prarit Bhargava
  2 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2017-12-04 19:48 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: linux-kernel, netdev, Thomas Gleixner, Clark Williams

On Mon,  4 Dec 2017 11:45:21 -0500, Prarit Bhargava wrote:
> On 12/04/2017 08:13 AM, Prarit Bhargava wrote:
> > x86: Booting SMP configuration:
> > .... node  #0, CPUs:        #1  #2  #3  #4
> > .... node  #1, CPUs:    #5  #6  #7  #8  #9
> > .... node  #0, CPUs:   #10 #11 #12 #13 #14
> > .... node  #1, CPUs:   #15 #16 #17 #18 #19
> > smp: Brought up 2 nodes, 20 CPUs
> > smpboot: Max logical packages: 1
> > 
> > which means that the calculation of logical packages is wrong because
> > 
> >       ncpus = cpu_data(0).booted_cores * smp_num_siblings;
> >       ncpus = 10 * 2;
> >       ncpus = 20;
> > 
> > smp_num_siblings is defined as "The number of threads in a core" which
> > should be 1 if HT/SMT is disabled.
> > 
> > It looks like my patch has exposed a bug in the
> > smp_num_siblings calculation.   I'm still debugging ...  
> 
> The bug is that smp_num_siblings has been incorrectly calculated as the
> *maximum* number of threads in a core, and not the actual number of threads in
> a core on systems which have a CPUID level greater than 0xb.  (see
> arch/x86/kernel/cpu/topology.c:59)
> 
> That will take some time to investigate and come up with a proper solution and
> fix.  In the meantime, the patch below will fix the problem in the short-term.
> I've tested the patch using SMT enabled, SMT disabled, maxcpus=1 and nr_cpus=1.

Thanks Prarit, the work around does the job!  Indeed, I have SMT
disabled.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:x86/urgent] x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
  2017-12-04 16:45       ` Prarit Bhargava
  2017-12-04 19:48         ` Jakub Kicinski
@ 2017-12-04 22:10         ` tip-bot for Prarit Bhargava
  2017-12-07  9:40         ` tip-bot for Prarit Bhargava
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Prarit Bhargava @ 2017-12-04 22:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: prarit, kubakici, mingo, hpa, linux-kernel, tglx, williams

Commit-ID:  b1cbacc8663a4dce62e4ae501e859c82f4aeb1ca
Gitweb:     https://git.kernel.org/tip/b1cbacc8663a4dce62e4ae501e859c82f4aeb1ca
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Mon, 4 Dec 2017 11:45:21 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 4 Dec 2017 23:03:48 +0100

x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation

Documentation/x86/topology.txt defines smp_num_siblings as "The number of
threads in a core".  Since commit bbb65d2d365e ("x86: use cpuid vector 0xb
when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core.  If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.

Use topology_max_smt_threads(), which contains the active number of threads,
in the __max_logical_packages calculation.

Fixes: b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages estimate")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jakub Kicinski <kubakici@wp.pl>
Cc: netdev@vger.kernel.org
Cc: "netdev@vger.kernel.org"
Cc: Clark Williams <williams@redhat.com>
Link: https://lkml.kernel.org/r/20171204164521.17870-1-prarit@redhat.com
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 05a97d5..7de0aa2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1304,7 +1304,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	 * Today neither Intel nor AMD support heterogenous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
-	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/urgent] x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
  2017-12-04 16:45       ` Prarit Bhargava
  2017-12-04 19:48         ` Jakub Kicinski
  2017-12-04 22:10         ` [tip:x86/urgent] x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation tip-bot for Prarit Bhargava
@ 2017-12-07  9:40         ` tip-bot for Prarit Bhargava
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Prarit Bhargava @ 2017-12-07  9:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, williams, hpa, mingo, tglx, prarit, kubakici

Commit-ID:  947134d9b00f342415af7eddd42a5fce7262a1b9
Gitweb:     https://git.kernel.org/tip/947134d9b00f342415af7eddd42a5fce7262a1b9
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Mon, 4 Dec 2017 11:45:21 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 7 Dec 2017 10:28:22 +0100

x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation

Documentation/x86/topology.txt defines smp_num_siblings as "The number of
threads in a core".  Since commit bbb65d2d365e ("x86: use cpuid vector 0xb
when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core.  If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.

Use topology_max_smt_threads(), which contains the active number of threads,
in the __max_logical_packages calculation.

On a single socket, single core, single thread system __max_smt_threads has
not been updated when the __max_logical_packages calculation happens, so it's
zero which makes the package estimate fail. Initialize it to one, which is
the minimum number of threads on a core.
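
A minimal userspace sketch of why a zero value breaks the estimate: with
zero threads per core the divisor becomes zero, and integer division by
zero is undefined in C.  The function name and the explicit guard are
hypothetical and exist only to make the failure visible; the actual fix is
the initialization to one described above.

```c
/* Same rounding-up integer division as the kernel's DIV_ROUND_UP macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper (not kernel code).  Returns 0 when no estimate can
 * be computed because the divisor would be zero.
 */
unsigned int estimate_packages_checked(unsigned int nr_cpu_ids,
				       unsigned int booted_cores,
				       unsigned int max_smt_threads)
{
	unsigned int ncpus = booted_cores * max_smt_threads;

	/* max_smt_threads == 0 makes ncpus 0; dividing by it is undefined. */
	if (ncpus == 0)
		return 0;

	return DIV_ROUND_UP(nr_cpu_ids, ncpus);
}
```

For a single socket, single core, single thread system,
estimate_packages_checked(1, 1, 0) cannot produce an estimate and returns
0, while estimate_packages_checked(1, 1, 1) returns 1.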

[ tglx: Folded the __max_smt_threads fix in ]

Fixes: b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages estimate")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jakub Kicinski <kubakici@wp.pl>
Cc: netdev@vger.kernel.org
Cc: "netdev@vger.kernel.org"
Cc: Clark Williams <williams@redhat.com>
Link: https://lkml.kernel.org/r/20171204164521.17870-1-prarit@redhat.com
---
 arch/x86/kernel/smpboot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 05a97d5..35cb20994 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -106,7 +106,7 @@ EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
 
 /* Maximum number of SMT threads on any online core */
-int __max_smt_threads __read_mostly;
+int __read_mostly __max_smt_threads = 1;
 
 /* Flag to indicate if a complete sched domain rebuild is required */
 bool x86_topology_update;
@@ -1304,7 +1304,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	 * Today neither Intel nor AMD support heterogenous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
-	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: x86 boot broken on -rc1?
  2017-12-02  0:39 x86 boot broken on -rc1? Jakub Kicinski
  2017-12-04  1:28 ` [bisected] x86 boot still broken on -rc2 Jakub Kicinski
@ 2017-12-13 19:37 ` Björn Töpel
  2017-12-13 19:58   ` Jakub Kicinski
  1 sibling, 1 reply; 10+ messages in thread
From: Björn Töpel @ 2017-12-13 19:37 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: LKML, netdev

2017-12-02 1:39 GMT+01:00 Jakub Kicinski <jakub.kicinski@netronome.com>:
> Hi!
>
> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
> E5-2630 v4 box.  It also happens on linux-next.  Did anyone else
> experience it?  (.config attached)
>
> [    5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
> [    5.007544] Modules linked in:
> [    5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [    5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [    5.007544] task: 000000009e842725 task.stack: 000000008a63fd2d
> [    5.007544] RIP: 0010:uncore_pci_probe+0x285/0x2b0
> [    5.007544] RSP: 0000:ffffad8580163d10 EFLAGS: 00010286
> [    5.007544] RAX: ffff98576cc3df30 RBX: ffffffffb08037e0 RCX: ffffffffb0c1a120
> [    5.007544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb0c1a960
> [    5.007544] RBP: ffff985b6c00ac00 R08: fffffffffffffffe R09: 00000000000fffff
> [    5.007544] R10: ffff98576f1b6018 R11: 0000000000000022 R12: ffff985b6c641000
> [    5.007544] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> [    5.007544] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
> [    5.007544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.007544] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
> [    5.007544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    5.007544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    5.007544] Call Trace:
> [    5.007544]  local_pci_probe+0x3d/0x90
> [    5.007544]  ? pci_match_device+0xd9/0x100
> [    5.007544]  pci_device_probe+0x122/0x180
> [    5.007544]  driver_probe_device+0x246/0x330
> [    5.007544]  ? set_debug_rodata+0x11/0x11
> [    5.007544]  __driver_attach+0x8a/0x90
> [    5.007544]  ? driver_probe_device+0x330/0x330
> [    5.007544]  bus_for_each_dev+0x5c/0x90
> [    5.007544]  bus_add_driver+0x196/0x220
> [    5.007544]  driver_register+0x57/0xc0
> [    5.007544]  intel_uncore_init+0x1e3/0x249
> [    5.007544]  ? uncore_type_init+0x193/0x193
> [    5.007544]  ? set_debug_rodata+0x11/0x11
> [    5.007544]  do_one_initcall+0x4b/0x190
> [    5.007544]  kernel_init_freeable+0x16e/0x1f5
> [    5.007544]  ? rest_init+0xd0/0xd0
> [    5.007544]  kernel_init+0xa/0x100
> [    5.007544]  ret_from_fork+0x1f/0x30
> [    5.007544] Code: 48 8b 52 08 48 85 d2 74 0d 89 44 24 04 48 89 df ff d2 8b 44 24 04 48 89 df 89 44 24 04 e8 54 0a 1c 00 8b 44 24 0
> [    5.007544] ---[ end trace 4dc4c3d5f5afcd2f ]---
> [    5.244504] bdx_uncore: probe of 0000:ff:08.2 failed with error -22
> [    5.251604] bdx_uncore: probe of 0000:ff:0b.1 failed with error -22
> [    5.258711] bdx_uncore: probe of 0000:ff:10.1 failed with error -22
> [    5.265819] bdx_uncore: probe of 0000:ff:14.0 failed with error -22
> [    5.272919] bdx_uncore: probe of 0000:ff:14.1 failed with error -22
> [    5.280019] bdx_uncore: probe of 0000:ff:15.0 failed with error -22
> [    5.287112] bdx_uncore: probe of 0000:ff:15.1 failed with error -22
> [    5.294376] WARNING: CPU: 1 PID: 15 at ../arch/x86/events/intel/uncore.c:1065 uncore_change_type_ctx.isra.5+0xe6/0xf0
> [    5.298362] Modules linked in:
> [    5.298362] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [    5.298362] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [    5.298362] task: 00000000ae78bc8f task.stack: 00000000f79660c1
> [    5.298362] RIP: 0010:uncore_change_type_ctx.isra.5+0xe6/0xf0
> [    5.298362] RSP: 0000:ffffad85833b3db8 EFLAGS: 00010213
> [    5.298362] RAX: 0000000000000000 RBX: ffff9857669b0200 RCX: 0000000000000001
> [    5.298362] RDX: ffff985b6f000000 RSI: ffff985b66580400 RDI: ffffffffb0c1ae8c
> [    5.298362] RBP: ffff985b66580400 R08: ffffffffb0c1ae8c R09: 0000000000000001
> [    5.298362] R10: 0000000000000000 R11: 00000000003d0900 R12: 0000000000000000
> [    5.298362] R13: ffffffffffffffff R14: 0000000000000001 R15: 0000000000000008
> [    5.298362] FS:  0000000000000000(0000) GS:ffff985b6f000000(0000) knlGS:0000000000000000
> [    5.298362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.298362] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
> [    5.298362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    5.298362] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    5.298362] Call Trace:
> [    5.298362]  uncore_event_cpu_online+0x283/0x340
> [    5.298362]  ? uncore_event_cpu_offline+0x180/0x180
> [    5.298362]  cpuhp_invoke_callback+0x8c/0x620
> [    5.298362]  ? __schedule+0x1ad/0x6c0
> [    5.298362]  ? sort_range+0x20/0x20
> [    5.298362]  cpuhp_thread_fun+0xbc/0x140
> [    5.298362]  smpboot_thread_fn+0x114/0x1d0
> [    5.298362]  kthread+0x111/0x130
> [    5.298362]  ? kthread_create_on_node+0x40/0x40
> [    5.298362]  ret_from_fork+0x1f/0x30
> [    5.298362] Code: 2a 44 89 73 10 41 83 c4 01 48 81 c5 40 01 00 00 45 3b 20 7c cf 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f f
> [    5.298362] ---[ end trace 4dc4c3d5f5afcd30 ]---
> [    5.504808] Scanning for low memory corruption every 60 seconds
> [    5.512347] Initialise system trusted keyrings
> [    5.517470] workingset: timestamp_bits=40 max_order=23 bucket_order=0
> [    5.524840] BUG: unable to handle kernel paging request at 0000000023314bf4
> [    5.528761] IP: __kmalloc_track_caller+0xa8/0x210
> [    5.528761] PGD 185c0a067 P4D 185c0a067 PUD 185c0c067 PMD 0
> [    5.528761] Oops: 0000 [#1] PREEMPT SMP
> [    5.528761] Modules linked in:
> [    5.528761] CPU: 14 PID: 1 Comm: swapper/0 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
> [    5.528761] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
> [    5.528761] task: 000000009e842725 task.stack: 000000008a63fd2d
> [    5.528761] RIP: 0010:__kmalloc_track_caller+0xa8/0x210
> [    5.528761] RSP: 0000:ffffad8580163d58 EFLAGS: 00010286
> [    5.528761] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000012ce0e
> [    5.528761] RDX: 000000000012cd0e RSI: 000000000012cd0e RDI: 000000000001dde0
> [    5.528761] RBP: ffff985700000001 R08: ffff98576f407c00 R09: ffffffffb071edbf
> [    5.528761] R10: ffffd54de1995600 R11: ffff985b6655915f R12: 0000000000000004
> [    5.528761] R13: 00000000014000c0 R14: ffffffffb026c239 R15: ffff98576f407c00
> [    5.528761] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
> [    5.528761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.528761] CR2: ffffffffffffffff CR3: 0000000185c09001 CR4: 00000000003606e0
> [    5.528761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    5.528761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    5.528761] Call Trace:
> [    5.528761]  kstrdup+0x2d/0x60
> [    5.528761]  __kernfs_new_node+0x29/0x130
> [    5.528761]  kernfs_new_node+0x24/0x50
> [    5.528761]  kernfs_create_link+0x29/0x90
> [    5.528761]  sysfs_do_create_link_sd.isra.0+0x5d/0xc0
> [    5.528761]  sysfs_slab_add+0x1f5/0x270
> [    5.528761]  ? set_debug_rodata+0x11/0x11
> [    5.528761]  slab_sysfs_init+0x8b/0xfa
> [    5.528761]  ? kmem_cache_init+0xf9/0xf9
> [    5.528761]  do_one_initcall+0x4b/0x190
> [    5.528761]  kernel_init_freeable+0x16e/0x1f5
> [    5.528761]  ? rest_init+0xd0/0xd0
> [    5.528761]  kernel_init+0xa/0x100
> [    5.528761]  ret_from_fork+0x1f/0x30
> [    5.528761] Code: 49 63 47 20 49 8b 3f 48 8d 8a 00 01 00 00 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 48 85 db 7
> [    5.528761] RIP: __kmalloc_track_caller+0xa8/0x210 RSP: ffffad8580163d58
> [    5.528761] CR2: ffffffffffffffff
> [    5.528761] ---[ end trace 4dc4c3d5f5afcd31 ]---
> [    5.773089] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> [    5.773089]
> [    5.777076] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Yes, I'm getting that as well (v4.15-rc2-772-gcdc0974f10cf).

Did you bisect it? I haven't gotten around to it yet.


Björn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: x86 boot broken on -rc1?
  2017-12-13 19:37 ` x86 boot broken on -rc1? Björn Töpel
@ 2017-12-13 19:58   ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2017-12-13 19:58 UTC (permalink / raw)
  To: Björn Töpel; +Cc: LKML, netdev

On Wed, 13 Dec 2017 20:37:02 +0100, Björn Töpel wrote:
> 2017-12-02 1:39 GMT+01:00 Jakub Kicinski:
> > [    5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009  
> 
> Yes, I'm getting that as well (v4.15-rc2-772-gcdc0974f10cf).
> 
> Did you bisect it? I haven't gotten around to it yet.

Yup, it's fixed but I'm not sure how far it has trickled down the various
trees:

947134d9b00f ("x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation")

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-12-13 19:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-02  0:39 x86 boot broken on -rc1? Jakub Kicinski
2017-12-04  1:28 ` [bisected] x86 boot still broken on -rc2 Jakub Kicinski
2017-12-04 12:28   ` Prarit Bhargava
2017-12-04 13:13     ` Prarit Bhargava
2017-12-04 16:45       ` Prarit Bhargava
2017-12-04 19:48         ` Jakub Kicinski
2017-12-04 22:10         ` [tip:x86/urgent] x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation tip-bot for Prarit Bhargava
2017-12-07  9:40         ` tip-bot for Prarit Bhargava
2017-12-13 19:37 ` x86 boot broken on -rc1? Björn Töpel
2017-12-13 19:58   ` Jakub Kicinski
