From: Guenter Roeck <linux@roeck-us.net>
To: Sudeep Holla <sudeep.holla@arm.com>
Cc: linux-kernel@vger.kernel.org, conor.dooley@microchip.com,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Pierre Gondois <pierre.gondois@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-riscv@lists.infradead.org
Subject: Re: [PATCH -next] arch_topology: Fix cache attributes detection in the CPU hotplug path
Date: Mon, 18 Jul 2022 10:41:51 -0700	[thread overview]
Message-ID: <20220718174151.GA462603@roeck-us.net> (raw)
In-Reply-To: <20220713133344.1201247-1-sudeep.holla@arm.com>

On Wed, Jul 13, 2022 at 02:33:44PM +0100, Sudeep Holla wrote:
> init_cpu_topology() is called only once at boot, and all the cache
> attributes are detected early for all possible CPUs. However, when a
> CPU is hotplugged out, its cacheinfo is removed. While the attributes
> are added back when the CPU is hotplugged back in as part of the CPU
> hotplug state machine, that happens quite late, after
> update_siblings_masks() has already been called from
> secondary_start_kernel(), resulting in wrong llc_sibling masks.
> 
> Move the call to detect_cache_attributes() inside update_siblings_masks()
> to ensure the cacheinfo is updated before the LLC sibling masks are
> derived from it. This fixes the incorrect LLC sibling masks generated
> when CPUs are hotplugged out and back in again.
> 
> Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>  drivers/base/arch_topology.c | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> Hi Conor,
> 
> Ionela reported an issue with CPU hotplug, and as a fix I need to move
> the call to detect_cache_attributes(), which I had originally intended
> to keep there but, for no particular reason, had moved to
> init_cpu_topology().
> 
> I wonder if this also fixes the -ENOMEM on RISC-V, since
> detect_cache_attributes() is now called on the CPU itself in the
> secondary CPU init path, whereas init_cpu_topology() executed it for
> all possible CPUs much earlier. I think this might help, as the percpu
> memory may already be initialised in this case.
> 
> Anyway, please give this a try, and also test CPU hotplug to check that
> nothing is broken on RISC-V. We noticed this bug only on one platform while
> 

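For reference, the change described in the quoted commit message amounts to
calling detect_cache_attributes() at the top of update_siblings_masks(). The
diff itself is not quoted here, so the following is only a minimal sketch
based on the commit message and the mainline arch_topology code; the exact
placement and the error message are assumptions:

/*
 * Sketch only -- not the actual diff. cacheinfo is refreshed for the
 * incoming CPU before the LLC sibling masks are derived from it.
 */
void update_siblings_masks(unsigned int cpuid)
{
	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
	int cpu, ret;

	/* Populate cacheinfo (and hence LLC information) first ... */
	ret = detect_cache_attributes(cpuid);
	if (ret)
		pr_info("Early cacheinfo failed, ret = %d\n", ret);

	/* ... so the existing sibling-mask updates below see valid data. */
	for_each_online_cpu(cpu) {
		cpu_topo = &cpu_topology[cpu];

		if (cpu_topo->llc_id != -1 && cpuid_topo->llc_id == cpu_topo->llc_id) {
			cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling);
			cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling);
		}
		/* core/thread sibling handling unchanged and elided */
	}
}
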
On arm64, with next-20220718, I see the following:

...
[    0.823405] Detected PIPT I-cache on CPU1
[    0.824456] BUG: sleeping function called from invalid context at kernel/locking/semaphore.c:164
[    0.824550] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
[    0.824600] preempt_count: 1, expected: 0
[    0.824633] RCU nest depth: 0, expected: 0
[    0.824899] no locks held by swapper/1/0.
[    0.825035] irq event stamp: 0
[    0.825072] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.826017] hardirqs last disabled at (0): [<ffff800008158870>] copy_process+0x5e0/0x18e4
[    0.826123] softirqs last  enabled at (0): [<ffff800008158870>] copy_process+0x5e0/0x18e4
[    0.826191] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.826764] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-rc7-next-20220718 #1
[    0.827397] Call trace:
[    0.827456]  dump_backtrace.part.0+0xd4/0xe0
[    0.827574]  show_stack+0x18/0x50
[    0.827625]  dump_stack_lvl+0x9c/0xd8
[    0.827678]  dump_stack+0x18/0x34
[    0.827722]  __might_resched+0x178/0x220
[    0.827778]  __might_sleep+0x48/0x80
[    0.827833]  down_timeout+0x2c/0xa0
[    0.827896]  acpi_os_wait_semaphore+0x68/0x9c
[    0.827952]  acpi_ut_acquire_mutex+0x4c/0xb8
[    0.828008]  acpi_get_table+0x38/0xbc
[    0.828059]  acpi_find_last_cache_level+0x44/0x130
[    0.828112]  init_cache_level+0xb8/0xcc
[    0.828165]  detect_cache_attributes+0x240/0x580
[    0.828217]  update_siblings_masks+0x28/0x270
[    0.828270]  store_cpu_topology+0x64/0x74
[    0.828326]  secondary_start_kernel+0xd0/0x150
[    0.828386]  __secondary_switched+0xb0/0xb4

I know the problem has already been reported, but I think the backtrace
above is slightly different.
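
For what it's worth, the trace is consistent with detect_cache_attributes()
now running in the secondary bring-up path: init_cache_level() ends up in
acpi_find_last_cache_level() -> acpi_get_table(), which takes an ACPI mutex
via down_timeout(), a call that may sleep, while secondary_start_kernel()
still runs with preemption disabled and interrupts off (preempt_count: 1,
irqs_disabled(): 128 above). A minimal, self-contained illustration of the
pattern the checker complains about (not code from the patch; the names here
are made up):

#include <linux/jiffies.h>
#include <linux/preempt.h>
#include <linux/semaphore.h>

static DEFINE_SEMAPHORE(demo_sem);	/* binary semaphore, count 1 */

static void demo_sleep_in_atomic(void)
{
	preempt_disable();
	/*
	 * down_timeout() may sleep; with preemption disabled this trips
	 * __might_sleep() and prints the same
	 * "BUG: sleeping function called from invalid context" splat.
	 */
	down_timeout(&demo_sem, msecs_to_jiffies(10));
	preempt_enable();
}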

Guenter
