* PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
@ 2022-07-12  5:38 Nick Bowler
  2023-01-20  3:15 ` Nick Bowler
  2023-01-21 13:24 ` Linux kernel regression tracking (#adding)
  0 siblings, 2 replies; 12+ messages in thread
From: Nick Bowler @ 2022-07-12  5:38 UTC (permalink / raw)
  To: sparclinux; +Cc: Atish Patra

Hi,

When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
CPUs, I noticed that only CPU 0 comes up, while older kernels (including
4.7) are working fine with both CPUs.

I bisected the failure to this commit:

  9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit
  commit 9b2f753ec23710aa32c0d837d2499db92fe9115b
  Author: Atish Patra <atish.patra@oracle.com>
  Date:   Thu Sep 15 14:54:40 2016 -0600

      sparc64: Fix cpu_possible_mask if nr_cpus is set

This is a small change that reverts very easily on top of 5.18: there is
just one trivial conflict.  Once reverted, both CPUs work again.

Maybe this is related to the fact that the CPUs on this system are
numbered CPU0 and CPU2 (there is no CPU1)?

Here is /proc/cpuinfo on a working kernel:

    % cat /proc/cpuinfo
    cpu             : TI UltraSparc II  (BlackBird)
    fpu             : UltraSparc II integrated FPU
    pmu             : ultra12
    prom            : OBP 3.23.1 1999/07/16 12:08
    type            : sun4u
    ncpus probed    : 2
    ncpus active    : 2
    D$ parity tl1   : 0
    I$ parity tl1   : 0
    cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
    Cpu0ClkTck      : 000000001ad31b4f
    Cpu2ClkTck      : 000000001ad31b4f
    MMU Type        : Spitfire
    MMU PGSZs       : 8K,64K,512K,4MB
    State:
    CPU0:           online
    CPU2:           online

And on a broken kernel:

    % cat /proc/cpuinfo
    cpu             : TI UltraSparc II  (BlackBird)
    fpu             : UltraSparc II integrated FPU
    pmu             : ultra12
    prom            : OBP 3.23.1 1999/07/16 12:08
    type            : sun4u
    ncpus probed    : 2
    ncpus active    : 1
    D$ parity tl1   : 0
    I$ parity tl1   : 0
    cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
    Cpu0ClkTck      : 000000001ad31861
    MMU Type        : Spitfire
    MMU PGSZs       : 8K,64K,512K,4MB
    State:
    CPU0:           online

Let me know if you need any more info.

Thanks,
  Nick

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2022-07-12  5:38 PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression) Nick Bowler
@ 2023-01-20  3:15 ` Nick Bowler
  2023-01-21 13:31   ` Linux kernel regression tracking (Thorsten Leemhuis)
  2023-01-21 13:24 ` Linux kernel regression tracking (#adding)
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Bowler @ 2023-01-20  3:15 UTC (permalink / raw)
  To: sparclinux; +Cc: linux-kernel

Hi,

I'm resending this report CC'd to linux-kernel as there was no response
on the sparclinux list.

I tried 6.2-rc4 and there is no change in behaviour.  Reverting the
indicated commit still works to fix the problem.

On 2022-07-12, Nick Bowler <nbowler@draconx.ca> wrote:
> When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
> CPUs, I noticed that only CPU 0 comes up, while older kernels (including
> 4.7) are working fine with both CPUs.
> [...]

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2022-07-12  5:38 PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression) Nick Bowler
  2023-01-20  3:15 ` Nick Bowler
@ 2023-01-21 13:24 ` Linux kernel regression tracking (#adding)
  1 sibling, 0 replies; 12+ messages in thread
From: Linux kernel regression tracking (#adding) @ 2023-01-21 13:24 UTC (permalink / raw)
  To: Nick Bowler, sparclinux; +Cc: Atish Patra

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 12.07.22 07:38, Nick Bowler wrote:
> Hi,
> 
> When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
> CPUs, I noticed that only CPU 0 comes up, while older kernels (including
> 4.7) are working fine with both CPUs.
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 9b2f753ec23710aa32c0d837
#regzbot title sparc: only one CPU active on Ultra 60
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2023-01-20  3:15 ` Nick Bowler
@ 2023-01-21 13:31   ` Linux kernel regression tracking (Thorsten Leemhuis)
  2024-03-22  4:57     ` Nick Bowler
  0 siblings, 1 reply; 12+ messages in thread
From: Linux kernel regression tracking (Thorsten Leemhuis) @ 2023-01-21 13:31 UTC (permalink / raw)
  To: Nick Bowler, sparclinux
  Cc: linux-kernel, David S. Miller, Linux kernel regressions list

CCing the sparc maintainer. Also CCing the regression list, as it should
be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html

The mail address of the culprit's author bounces. There is another
Atish Patra still active; does anyone know if those two are the same
person?

Anyway, that's it from my side.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

On 20.01.23 04:15, Nick Bowler wrote:
> Hi,
> 
> I'm resending this report CC'd to linux-kernel as there was no response
> on the sparclinux list.
> 
> I tried 6.2-rc4 and there is no change in behaviour.  Reverting the
> indicated commit still works to fix the problem.
> [...]

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2023-01-21 13:31   ` Linux kernel regression tracking (Thorsten Leemhuis)
@ 2024-03-22  4:57     ` Nick Bowler
  2024-03-28 19:36       ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Bowler @ 2024-03-22  4:57 UTC (permalink / raw)
  To: Linux regressions mailing list; +Cc: linux-kernel, David S. Miller, sparclinux

Hi,

Just a friendly reminder that this issue still happens on Linux 6.8 and
reverting commit 9b2f753ec237 as indicated below is still sufficient to
resolve the problem.

On 2023-01-21 08:31, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> CCing the sparc maintainer. Also CCing the regression list, as it should
> be in the loop for regressions:
> https://docs.kernel.org/admin-guide/reporting-regressions.html
> 
> The mail address of the culprit's author bounces. There is another
> Atish Patra still active; does anyone know if those two are the same
> person?
> 
> Anyway, that's it from my side.
[...]
> On 20.01.23 04:15, Nick Bowler wrote:
>> Hi,
>> 
>> I'm resending this report CC'd to linux-kernel as there was no response
>> on the sparclinux list.
>> 
>> I tried 6.2-rc4 and there is no change in behaviour.  Reverting the
>> indicated commit still works to fix the problem.
>> [...]

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-22  4:57     ` Nick Bowler
@ 2024-03-28 19:36       ` Linux regression tracking (Thorsten Leemhuis)
  2024-03-28 20:09         ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-03-28 19:36 UTC (permalink / raw)
  To: Nick Bowler, Linux regressions mailing list
  Cc: linux-kernel, David S. Miller, sparclinux, Linus Torvalds

[CCing Linus, in case I say something not to his liking]

On 22.03.24 05:57, Nick Bowler wrote:
> 
> Just a friendly reminder that this issue still happens on Linux 6.8 and
> reverting commit 9b2f753ec237 as indicated below is still sufficient to
> resolve the problem.

FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
nr_cpus is set") is from v4.8. Reverting it after all that time might
easily lead to even bigger trouble. That's why it might be better to
handle this like a bug and not like a regression, at least unless we
find someone who can judge how likely such an outcome is. But it seems
nobody has really cared so far, so unless this mail makes someone act
you might be out of luck. :-/

I wish it were different, but in the end we (including the maintainers)
are all just volunteers here, whom you can only motivate or compel (up
to a point) to look into an issue, but cannot force to do so.

Ciao, Thorsten

> On 2023-01-21 08:31, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
>> CCing the sparc maintainer. Also CCing the regression list, as it should
>> be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html
>>
>> The mail address of the culprit's author bounces. There is another
>> Atish Patra still active; does anyone know if those two are the same
>> person?
>>
>> Anyway, that's it from my side.
> [...]
>> On 20.01.23 04:15, Nick Bowler wrote:
>>> Hi,
>>>
>>> I'm resending this report CC'd to linux-kernel as there was no response
>>> on the sparclinux list.
>>>
>>> I tried 6.2-rc4 and there is no change in behaviour.  Reverting the
>>> indicated commit still works to fix the problem.
>>> [...]

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-28 19:36       ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-03-28 20:09         ` Linus Torvalds
  2024-03-28 21:08           ` Nick Bowler
  2024-04-05 15:05           ` Andreas Larsson
  0 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2024-03-28 20:09 UTC (permalink / raw)
  To: Linux regressions mailing list, Andreas Larsson
  Cc: Nick Bowler, linux-kernel, David S. Miller, sparclinux

On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> [CCing Linus, in case I say something not to his liking]
>
> On 22.03.24 05:57, Nick Bowler wrote:
> >
> > Just a friendly reminder that this issue still happens on Linux 6.8 and
> > reverting commit 9b2f753ec237 as indicated below is still sufficient to
> > resolve the problem.
>
> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
> nr_cpus is set") is from v4.8. Reverting it after all that time might
> easily lead to even bigger trouble.

I'm definitely not reverting a patch from almost a decade ago as a regression.

If it took that long to find, it can't be that critical of a regression.

So yes, let's treat it as a regular bug. And let's bring in Andreas to
the discussion too (although presumably he has seen it on the
sparclinux mailing list).

Andreas, if not, here's the link to lore for the beginning of the thread:

  https://lore.kernel.org/all/CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com/

And from a quick look I do think that commit is buggy, and yes, the
fix probably is just to revert it.

As the original report makes clear, that commit 9b2f753ec23710 is
clearly confused about the difference between "number of CPUs", and
"index of CPU numbers".

When that smp_fill_in_cpu_possible_map() does

        int possible_cpus = num_possible_cpus();

and then uses that to fill in &__cpu_possible_mask, that's completely
nonsensical. Because we literally have

  #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
  #define num_possible_cpus()     cpumask_weight(cpu_possible_mask)

so it's reading cpu_possible_mask to figure out how many cpus it might
have, and then using that number to set possibly *different* bits in
the same bitmap that is just used to judge what the max number is.

So I do think a revert is called for, but I'm not going to treat this
as a regression, I'm going to just treat it as "sparc bug" and hope
that the sparc people try to figure out why that crazy code was
written.

And maybe it made more sense back a decade ago than it does now.

Andreas?

                Linus

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-28 20:09         ` Linus Torvalds
@ 2024-03-28 21:08           ` Nick Bowler
  2024-03-29  9:44             ` Sam Ravnborg
  2024-04-05 15:05           ` Andreas Larsson
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Bowler @ 2024-03-28 21:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, David S. Miller, sparclinux, Andreas Larsson,
	Linux regressions mailing list

On 2024-03-28 16:09, Linus Torvalds wrote:
> On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> [CCing Linus, in case I say something not to his liking]
>>
>> On 22.03.24 05:57, Nick Bowler wrote:
>>>
>>> Just a friendly reminder that this issue still happens on Linux 6.8 and
>>> reverting commit 9b2f753ec237 as indicated below is still sufficient to
>>> resolve the problem.
>>
>> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
>> nr_cpus is set") is from v4.8. Reverting it after all that time might
>> easily lead to even bigger trouble.
> 
> I'm definitely not reverting a patch from almost a decade ago as a regression.
> 
> If it took that long to find, it can't be that critical of a regression.

FWIW I'm not the first person to notice this problem.  Searching the sparclinux
archive for "ultra 60" turns up this very similar report[1] from two years
prior to mine, which also went nowhere (sadly, this reporter did not perform a
bisection to find the problematic commit -- perhaps because nobody asked).

[1] https://lore.kernel.org/sparclinux/20201009161924.c8f031c079dd852941307870@gmx.de/

Cheers,
  Nick

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-28 21:08           ` Nick Bowler
@ 2024-03-29  9:44             ` Sam Ravnborg
  2024-03-29 20:11               ` Nick Bowler
  0 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2024-03-29  9:44 UTC (permalink / raw)
  To: Nick Bowler
  Cc: Linus Torvalds, linux-kernel, David S. Miller, sparclinux,
	Andreas Larsson, Linux regressions mailing list

Hi Nick,

On Thu, Mar 28, 2024 at 05:08:50PM -0400, Nick Bowler wrote:
> On 2024-03-28 16:09, Linus Torvalds wrote:
> > On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
> > Leemhuis) <regressions@leemhuis.info> wrote:
> >>
> >> [CCing Linus, in case I say something not to his liking]
> >>
> >> On 22.03.24 05:57, Nick Bowler wrote:
> >>>
> >>> Just a friendly reminder that this issue still happens on Linux 6.8 and
> >>> reverting commit 9b2f753ec237 as indicated below is still sufficient to
> >>> resolve the problem.
> >>
> >> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
> >> nr_cpus is set") is from v4.8. Reverting it after all that time might
> >> easily lead to even bigger trouble.
> > 
> > I'm definitely not reverting a patch from almost a decade ago as a regression.
> > 
> > If it took that long to find, it can't be that critical of a regression.
> 
> FWIW I'm not the first person to notice this problem.  Searching the sparclinux
> archive for "ultra 60" turns up this very similar report[1] from two years
> prior to mine, which also went nowhere (sadly, this reporter did not perform a
> bisection to find the problematic commit -- perhaps because nobody asked).
> 
> [1] https://lore.kernel.org/sparclinux/20201009161924.c8f031c079dd852941307870@gmx.de/

I took a look at this and may have a fix. Could you try the following
patch? It builds, but I have not tested it.

	Sam


From a0fb7c6e6817849550d07b4c5a354ccc58382bc1 Mon Sep 17 00:00:00 2001
From: Sam Ravnborg <sam@ravnborg.org>
Date: Fri, 29 Mar 2024 10:34:07 +0100
Subject: [PATCH] sparc64: Fix number of online CPUs

Nick Bowler reported:
    When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
    CPUs, I noticed that only CPU 0 comes up, while older kernels (including
    4.7) are working fine with both CPUs.

    I bisected the failure to this commit:

      9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit
      commit 9b2f753ec23710aa32c0d837d2499db92fe9115b
      Author: Atish Patra <atish.patra@oracle.com>
      Date:   Thu Sep 15 14:54:40 2016 -0600

          sparc64: Fix cpu_possible_mask if nr_cpus is set

    This is a small change that reverts very easily on top of 5.18: there is
    just one trivial conflict.  Once reverted, both CPUs work again.

    Maybe this is related to the fact that the CPUs on this system are
    numbered CPU0 and CPU2 (there is no CPU1)?

The current code that adjusts cpu_possible based on nr_cpu_ids does not
take into account that CPUs may not be numbered consecutively.
Move the check to the function that sets up the cpu_possible mask
so there is no need to adjust it later.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reported-by: Nick Bowler <nbowler@draconx.ca>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "David S. Miller" <davem@davemloft.net>
---
 arch/sparc/include/asm/smp_64.h |  2 --
 arch/sparc/kernel/prom_64.c     |  4 +++-
 arch/sparc/kernel/setup_64.c    |  1 -
 arch/sparc/kernel/smp_64.c      | 14 --------------
 4 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/arch/sparc/include/asm/smp_64.h b/arch/sparc/include/asm/smp_64.h
index 505b6700805d..0964fede0b2c 100644
--- a/arch/sparc/include/asm/smp_64.h
+++ b/arch/sparc/include/asm/smp_64.h
@@ -47,7 +47,6 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask);
 int hard_smp_processor_id(void);
 #define raw_smp_processor_id() (current_thread_info()->cpu)
 
-void smp_fill_in_cpu_possible_map(void);
 void smp_fill_in_sib_core_maps(void);
 void __noreturn cpu_play_dead(void);
 
@@ -77,7 +76,6 @@ void __cpu_die(unsigned int cpu);
 #define smp_fill_in_sib_core_maps() do { } while (0)
 #define smp_fetch_global_regs() do { } while (0)
 #define smp_fetch_global_pmu() do { } while (0)
-#define smp_fill_in_cpu_possible_map() do { } while (0)
 #define smp_init_cpu_poke() do { } while (0)
 #define scheduler_poke() do { } while (0)
 
diff --git a/arch/sparc/kernel/prom_64.c b/arch/sparc/kernel/prom_64.c
index 998aa693d491..ba82884cb92a 100644
--- a/arch/sparc/kernel/prom_64.c
+++ b/arch/sparc/kernel/prom_64.c
@@ -483,7 +483,9 @@ static void *record_one_cpu(struct device_node *dp, int cpuid, int arg)
 	ncpus_probed++;
 #ifdef CONFIG_SMP
 	set_cpu_present(cpuid, true);
-	set_cpu_possible(cpuid, true);
+
+	if (num_possible_cpus() < nr_cpu_ids)
+		set_cpu_possible(cpuid, true);
 #endif
 	return NULL;
 }
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 6a4797dec34b..6bbe8e394ad3 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -671,7 +671,6 @@ void __init setup_arch(char **cmdline_p)
 
 	paging_init();
 	init_sparc64_elf_hwcap();
-	smp_fill_in_cpu_possible_map();
 	/*
 	 * Once the OF device tree and MDESC have been setup and nr_cpus has
 	 * been parsed, we know the list of possible cpus.  Therefore we can
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index f3969a3600db..e50c38eba2b8 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1220,20 +1220,6 @@ void __init smp_setup_processor_id(void)
 		xcall_deliver_impl = hypervisor_xcall_deliver;
 }
 
-void __init smp_fill_in_cpu_possible_map(void)
-{
-	int possible_cpus = num_possible_cpus();
-	int i;
-
-	if (possible_cpus > nr_cpu_ids)
-		possible_cpus = nr_cpu_ids;
-
-	for (i = 0; i < possible_cpus; i++)
-		set_cpu_possible(i, true);
-	for (; i < NR_CPUS; i++)
-		set_cpu_possible(i, false);
-}
-
 void smp_fill_in_sib_core_maps(void)
 {
 	unsigned int i;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-29  9:44             ` Sam Ravnborg
@ 2024-03-29 20:11               ` Nick Bowler
  2024-03-30  9:16                 ` Sam Ravnborg
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Bowler @ 2024-03-29 20:11 UTC
  To: Sam Ravnborg
  Cc: Linus Torvalds, linux-kernel, David S. Miller, sparclinux,
	Andreas Larsson, Linux regressions mailing list

Hi Sam,

On 2024-03-29 05:44, Sam Ravnborg wrote:
> I took a look at this and may have a fix. Could you try the following
> patch. It builds - but I have not tested it.

With this patch applied on top of 6.9-rc1, both CPUs appear to come up:

  % cat /proc/cpuinfo 
  [...]
  ncpus probed	: 2
  ncpus active	: 2
  [...]
  State:
  CPU0:		online
  CPU2:		online

Thanks,
  Nick


* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-29 20:11               ` Nick Bowler
@ 2024-03-30  9:16                 ` Sam Ravnborg
  0 siblings, 0 replies; 12+ messages in thread
From: Sam Ravnborg @ 2024-03-30  9:16 UTC
  To: Nick Bowler
  Cc: Linus Torvalds, linux-kernel, David S. Miller, sparclinux,
	Andreas Larsson, Linux regressions mailing list

On Fri, Mar 29, 2024 at 04:11:06PM -0400, Nick Bowler wrote:
> Hi Sam,
> 
> On 2024-03-29 05:44, Sam Ravnborg wrote:
> > I took a look at this and may have a fix. Could you try the following
> > patch. It builds - but I have not tested it.
> 
> With this patch applied on top of 6.9-rc1, both CPUs appear to come up:
> 
>   % cat /proc/cpuinfo 
>   [...]
>   ncpus probed	: 2
>   ncpus active	: 2
>   [...]
>   State:
>   CPU0:		online
>   CPU2:		online

Thanks, I will add a Tested-by: Nick Bowler <nbowler@draconx.ca>
and submit the patch properly along with a few other sparc64 related
fixes.

	Sam


* Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)
  2024-03-28 20:09         ` Linus Torvalds
  2024-03-28 21:08           ` Nick Bowler
@ 2024-04-05 15:05           ` Andreas Larsson
  1 sibling, 0 replies; 12+ messages in thread
From: Andreas Larsson @ 2024-04-05 15:05 UTC
  To: Linus Torvalds, Linux regressions mailing list
  Cc: Nick Bowler, linux-kernel, David S. Miller, sparclinux, Sam Ravnborg



On 2024-03-28 21:09, Linus Torvalds wrote:
> On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> [CCing Linus, in case I say something to his disliking]
>>
>> On 22.03.24 05:57, Nick Bowler wrote:
>>>
>>> Just a friendly reminder that this issue still happens on Linux 6.8 and
>>> reverting commit 9b2f753ec237 as indicated below is still sufficient to
>>> resolve the problem.
>>
>> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
>> nr_cpus is set") is from v4.8. Reverting it after all that time might
>> easily lead to even bigger trouble.
> 
> I'm definitely not reverting a patch from almost a decade ago as a regression.
> 
> If it took that long to find, it can't be that critical of a regression.
> 
> So yes, let's treat it as a regular bug. And let's bring in Andreas to
> the discussion too (although presumably he has seen it on the
> sparclinux mailing list).

Yes, I am aware and I agree we should treat it as a regular bug.

Reverting it as a regression fix would lead to followup issues like
canceling the effect of commit ebb99a4c12e4 ("sparc64: Fix irq stack
bootmem allocation.") but with misleading comments left in place. 

Sam's fix looks like a good solution for me to pick up into my
for-next branch.

Thanks,
Andreas



Thread overview: 12+ messages
2022-07-12  5:38 PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression) Nick Bowler
2023-01-20  3:15 ` Nick Bowler
2023-01-21 13:31   ` Linux kernel regression tracking (Thorsten Leemhuis)
2024-03-22  4:57     ` Nick Bowler
2024-03-28 19:36       ` Linux regression tracking (Thorsten Leemhuis)
2024-03-28 20:09         ` Linus Torvalds
2024-03-28 21:08           ` Nick Bowler
2024-03-29  9:44             ` Sam Ravnborg
2024-03-29 20:11               ` Nick Bowler
2024-03-30  9:16                 ` Sam Ravnborg
2024-04-05 15:05           ` Andreas Larsson
2023-01-21 13:24 ` Linux kernel regression tracking (#adding)
