* Re: Latest PA8800/PA8900 cache flush patch
From: John David Anglin @ 2022-05-06 21:30 UTC (permalink / raw)
  To: Helge Deller; +Cc: Deller, linux-parisc

On 2022-04-27 5:21 p.m., Helge Deller wrote:
> On 4/27/22 23:08, Helge Deller wrote:
>> On 4/27/22 23:04, Helge Deller wrote:
>>> On 4/27/22 22:50, John David Anglin wrote:
>>>> On 2022-04-27 4:44 p.m., John David Anglin wrote:
>>>>> On 2022-04-26 4:43 p.m., Helge Deller wrote:
>>>>>>> I have removed the flush_cache_dup_mm code as it improves performance
>>>>>>> and hopefully it will fix issues on B160L.
>>>>>> I applied your patch on top of for-next tree.
>>>>>> I still see the same issue on the C3700 (PA8700 (PCX-W2)) with a 32-bit kernel...
>>>>>> Maybe it's not related to your cache flush patches but to mine?
>>>>>>
>>>>> Boot fails with your 32-bit config on c3750 with the latest cache patch.  So, the problem appears
>>>> Bah, I meant "without".
>>>>> to have been introduced earlier.
>>> Yes, it also happens without the for-next tree and your patches.
>>>
>>>> I still have an install problem with 5.17.0-1-parisc64.
>>> So, you run 5.17 (Debian) and it is unstable? I'll try, but my time is limited again at the moment.
>> Debian 5.17 is based on stable-5.17-3.
>> This version includes this patch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.17.y&id=e115f5a44360c4a2f158074ecb3feea88c45fdc0
>> Greg backported it from 5.18-rc...
>> Maybe that's the issue?
> I've built 5.17.5 (32bit). Boots ok on c3000. No segfaults.
> But I do see the stalls as well:
> ...
> Starting Avahi mDNS/DNS-SD Daemon: avahi-daemon.
> Starting periodic command scheduler: cron.
> [   31.472708] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [   31.543577]  (detected by 0, t=2102 jiffies, g=7361, q=10)
> [   31.609191] rcu: All QSes seen, last rcu_sched kthread activity 2102 (-22271--24373), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   31.747614] rcu: rcu_sched kthread starved for 2102 jiffies! g7361 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [   31.867313] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [   31.974535] rcu: RCU grace-period kthread stack dump:
> [   32.034962] task:rcu_sched       state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000000
> [   32.153733] Backtrace:
> [   32.181916]  [<1094c21c>] __schedule+0x2dc/0x964
> [   32.237240]  [<1094c90c>] schedule+0x68/0x138
> [   32.289340]  [<10953068>] schedule_timeout+0x84/0x178
> [   32.349762]  [<102472b4>] rcu_gp_fqs_loop+0x32c/0x428
> [   32.410186]  [<10249660>] rcu_gp_kthread+0x10c/0x1e8
> [   32.469569]  [<101ebc98>] kthread+0x100/0x108
> [   32.521674]  [<1019b01c>] ret_from_kernel_thread+0x1c/0x24
>
> ARGH!!!
This was introduced by the following commit:

commit d97180ad68bdb7ee10f327205a649bc2f558741d
Author: Helge Deller <deller@gmx.de>
Date:   Wed Sep 8 23:27:00 2021 +0200

     parisc: Mark sched_clock unstable only if clocks are not syncronized

     We check at runtime if the cr16 clocks are stable across CPUs. Only mark
     the sched_clock unstable by calling clear_sched_clock_stable() if we
     know that we run on a system which isn't syncronized across CPUs.

     Signed-off-by: Helge Deller <deller@gmx.de>
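
To make the commit message's logic concrete, here is a minimal Python sketch of the decision it describes: check whether the per-CPU cr16 counters agree, and mark sched_clock unstable only when they do not. This is a hypothetical model, not the parisc kernel code. The function name `should_clear_sched_clock_stable`, the idea of comparing per-CPU offsets against a `max_skew` tolerance, and the sample values are all illustrative assumptions. The real runtime check may measure synchronization quite differently.

```python
from typing import Sequence


def should_clear_sched_clock_stable(offsets: Sequence[int], max_skew: int) -> bool:
    """Return True if sched_clock should be marked unstable.

    offsets:  hypothetical per-CPU cr16 offsets relative to CPU 0, in cycles
              (for example, each CPU's counter sampled at a common instant).
    max_skew: hypothetical tolerance; a larger spread means "not synchronized".
    """
    if len(offsets) <= 1:
        # A single CPU has no other counter to drift away from.
        return False
    spread = max(offsets) - min(offsets)
    # The commit message implies that, before the change, sched_clock was
    # marked unstable without such a check; afterwards, only a measured
    # spread beyond the tolerance triggers clear_sched_clock_stable().
    return spread > max_skew


print(should_clear_sched_clock_stable([0], 100))          # False
print(should_clear_sched_clock_stable([0, 12, -7], 100))  # False
print(should_clear_sched_clock_stable([0, 5000], 100))    # True
```

In this model, the result hinges entirely on the measurement and the tolerance. If either is wrong, sched_clock stays marked stable on hardware where it should not be. The thread identifies this commit as the trigger for the stalls but does not pin down the exact mechanism.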

In searching for the cause, I also noticed this commit:

commit e4f2006f1287e7ea17660490569cff323772dac4
Author: Helge Deller <deller@gmx.de>
Date:   Tue Sep 7 05:03:29 2021 +0200

     parisc: Reduce sigreturn trampoline to 3 instructions

     We can move the INSN_LDI_R20 instruction into the branch delay slot.

     Signed-off-by: Helge Deller <deller@gmx.de>

Changing the sigreturn trampoline breaks gdb's detection of signal frames.
I suspect the INSN_LDI_R20 instruction was intentionally put before the
branch to make the sequence more unique.
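
Why would reordering break gdb? A debugger can recognize a signal trampoline by comparing the instructions around the PC against a known sequence. The Python sketch below models only that idea, under several assumptions:

- The opcode strings are symbolic stand-ins for the kernel's INSN_* constants. No real PA-RISC encodings are used.
- The 4-instruction "old" layout (LDI_R25, LDI_R20, BLE, NOP) is inferred from the commit message.
- `looks_like_sigreturn` is a hypothetical helper, not gdb's actual code.

```python
from typing import Sequence

# Symbolic stand-ins for the trampoline instructions (NOT real encodings).
LDI_R25 = "ldi 0/1,%r25"                  # assumed first instruction
LDI_R20 = "ldi __NR_rt_sigreturn,%r20"
BLE = "ble (syscall gateway)"             # hypothetical operand text
NOP = "nop"

# Pattern a debugger might search for: the older layout, assumed to put
# LDI_R20 *before* the branch, with a NOP in the delay slot.
SIGTRAMP_PATTERN = (LDI_R25, LDI_R20, BLE, NOP)


def looks_like_sigreturn(code: Sequence[str], pc: int) -> bool:
    """Return True if index `pc` lies inside a copy of SIGTRAMP_PATTERN.

    Every alignment that covers `pc` is tried, because the PC may point
    at any instruction of the trampoline.
    """
    n = len(SIGTRAMP_PATTERN)
    for start in range(pc - n + 1, pc + 1):
        if start < 0 or start + n > len(code):
            continue
        if tuple(code[start:start + n]) == SIGTRAMP_PATTERN:
            return True
    return False


old_tramp = [LDI_R25, LDI_R20, BLE, NOP]
# "Reduce sigreturn trampoline to 3 instructions": LDI_R20 moves into the
# branch delay slot (after BLE) and the NOP disappears.
new_tramp = [LDI_R25, BLE, LDI_R20]

print(looks_like_sigreturn(old_tramp, 2))  # True
print(looks_like_sigreturn(new_tramp, 1))  # False: pattern no longer matches
```

A matcher written for the old layout stops recognizing the frame once the load moves into the delay slot. This is consistent with the gdb breakage described above.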

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Latest PA8800/PA8900 cache flush patch
From: John David Anglin @ 2022-05-06 22:34 UTC (permalink / raw)
  To: Helge Deller; +Cc: Deller, linux-parisc

On 2022-05-06 5:30 p.m., John David Anglin wrote:
>> I've built 5.17.5 (32bit). Boots ok on c3000. No segfaults.
>> But I do see the stalls as well:
>> [... RCU stall log snipped ...]
>>
>> ARGH!!!
> This was introduced by the following commit:
>
> commit d97180ad68bdb7ee10f327205a649bc2f558741d
> Author: Helge Deller <deller@gmx.de>
> Date:   Wed Sep 8 23:27:00 2021 +0200
>
>     parisc: Mark sched_clock unstable only if clocks are not syncronized
>
>     We check at runtime if the cr16 clocks are stable across CPUs. Only mark
>     the sched_clock unstable by calling clear_sched_clock_stable() if we
>     know that we run on a system which isn't syncronized across CPUs.
>
>     Signed-off-by: Helge Deller <deller@gmx.de>
>
> In searching for the cause, I also noticed this commit:
>
> commit e4f2006f1287e7ea17660490569cff323772dac4
> Author: Helge Deller <deller@gmx.de>
> Date:   Tue Sep 7 05:03:29 2021 +0200
>
>     parisc: Reduce sigreturn trampoline to 3 instructions
>
>     We can move the INSN_LDI_R20 instruction into the branch delay slot.
>
>     Signed-off-by: Helge Deller <deller@gmx.de>
>
> Changing the sigreturn trampoline breaks gdb's detection of signal frames.
> I suspect the INSN_LDI_R20 instruction was intentionally put before the
> branch to make the sequence more unique.

It appears the latter commit (the sigreturn trampoline change) has been reverted, and the former (the sched_clock change) has been modified.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Latest PA8800/PA8900 cache flush patch
From: John David Anglin @ 2022-05-07  1:55 UTC (permalink / raw)
  To: Helge Deller; +Cc: Deller, linux-parisc

On 2022-05-06 6:34 p.m., John David Anglin wrote:
> On 2022-05-06 5:30 p.m., John David Anglin wrote:
>>> I've built 5.17.5 (32bit). Boots ok on c3000. No segfaults.
>>> But I do see the stalls as well:
>>> [... RCU stall log snipped ...]
>>>
>>> ARGH!!!
>> This was introduced by the following commit:
>>
>> commit d97180ad68bdb7ee10f327205a649bc2f558741d
>> Author: Helge Deller <deller@gmx.de>
>> Date:   Wed Sep 8 23:27:00 2021 +0200
>>
>>     parisc: Mark sched_clock unstable only if clocks are not syncronized
>>
>>     We check at runtime if the cr16 clocks are stable across CPUs. Only mark
>>     the sched_clock unstable by calling clear_sched_clock_stable() if we
>>     know that we run on a system which isn't syncronized across CPUs.
>>
>>     Signed-off-by: Helge Deller <deller@gmx.de>
>>
>> In searching for the cause, I also noticed this commit:
>>
>> commit e4f2006f1287e7ea17660490569cff323772dac4
>> Author: Helge Deller <deller@gmx.de>
>> Date:   Tue Sep 7 05:03:29 2021 +0200
>>
>>     parisc: Reduce sigreturn trampoline to 3 instructions
>>
>>     We can move the INSN_LDI_R20 instruction into the branch delay slot.
>>
>>     Signed-off-by: Helge Deller <deller@gmx.de>
>>
>> Changing the sigreturn trampoline breaks gdb's detection of signal frames.
>> I suspect the INSN_LDI_R20 instruction was intentionally put before the
>> branch to make the sequence more unique.
>
> It appears the latter commit has been reverted.  The former commit has been modified.
32-bit v5.15.37 boots successfully if setup.c and time.c are reverted to their v5.14 versions.  Otherwise,
the boot stalls as above.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Latest PA8800/PA8900 cache flush patch
From: Helge Deller @ 2022-05-07 18:49 UTC (permalink / raw)
  To: John David Anglin; +Cc: Deller, linux-parisc

Hi Dave,

On 5/7/22 03:55, John David Anglin wrote:
> On 2022-05-06 6:34 p.m., John David Anglin wrote:
>> On 2022-05-06 5:30 p.m., John David Anglin wrote:
>>>> I've built 5.17.5 (32bit). Boots ok on c3000. No segfaults.
>>>> But I do see the stalls as well:
>>>> [... RCU stall log snipped ...]
>>>>
>>>> ARGH!!!
>>> This was introduced by the following commit:
>>>
>>> commit d97180ad68bdb7ee10f327205a649bc2f558741d
>>> Author: Helge Deller <deller@gmx.de>
>>> Date:   Wed Sep 8 23:27:00 2021 +0200
>>>
>>>     parisc: Mark sched_clock unstable only if clocks are not syncronized
>>>
>>>     We check at runtime if the cr16 clocks are stable across CPUs. Only mark
>>>     the sched_clock unstable by calling clear_sched_clock_stable() if we
>>>     know that we run on a system which isn't syncronized across CPUs.
>>>
>>>     Signed-off-by: Helge Deller <deller@gmx.de>
>>>
>>> In searching for the cause, I also noticed this commit:
>>>
>>> commit e4f2006f1287e7ea17660490569cff323772dac4
>>> Author: Helge Deller <deller@gmx.de>
>>> Date:   Tue Sep 7 05:03:29 2021 +0200
>>>
>>>     parisc: Reduce sigreturn trampoline to 3 instructions
>>>
>>>     We can move the INSN_LDI_R20 instruction into the branch delay slot.
>>>
>>>     Signed-off-by: Helge Deller <deller@gmx.de>
>>>
>>> Changing the sigreturn trampoline breaks gdb's detection of signal frames.
>>> I suspect the INSN_LDI_R20 instruction was intentionally put before the
>>> branch to make the sequence more unique.
>>
>> It appears the latter commit has been reverted.  The former commit has been modified.
> 32bit v5.15.37 boots successfully if setup.c and time.c are reverted to v5.14.  Otherwise,
> boot stalls as above.

Thank you for investing your time to find the problem!
The patches you mentioned can easily be reverted; I have queued up the revert patches now.
It seems commit d97180ad68bdb7ee10f327205a649bc2f558741d was wrong, and the follow-up patch
made it even worse.

Ok, so now we need to find out why v5.18-rc crashes... :-(

Helge


* Re: Latest PA8800/PA8900 cache flush patch
From: John David Anglin @ 2022-05-07 19:18 UTC (permalink / raw)
  To: Helge Deller; +Cc: Deller, linux-parisc

On 2022-05-07 2:49 p.m., Helge Deller wrote:
> Ok, so now we need to find out why v5.18-rc crashes... :-(
Working on it.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: Latest PA8800/PA8900 cache flush patch
From: John David Anglin @ 2022-05-08  1:02 UTC (permalink / raw)
  To: Helge Deller; +Cc: Deller, linux-parisc

On 2022-05-07 3:18 p.m., John David Anglin wrote:
> On 2022-05-07 2:49 p.m., Helge Deller wrote:
>> Ok, so now we need to find out why v5.18-rc crashes... :-(
> Working on it.
Mainline 32bit v5.17 boots okay.  v5.18-rc1 fails.  Looks like another extended regression hunt.
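
A regression hunt like this is what `git bisect` automates. With v5.17 known good and v5.18-rc1 known bad, each test boot roughly halves the range of candidate commits. Below is a minimal Python sketch of that binary search over a *linear* list of commits. The commit labels, the `boots_ok` predicate, and the culprit position are hypothetical. Real kernel history is a merge-heavy DAG, which `git bisect` handles and this sketch does not.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], boots_ok: Callable[[str], bool]) -> str:
    """Return the first commit for which boots_ok() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    result flips exactly once in between (good ... good, bad ... bad).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(commits[mid]):  # one build + test boot per step
            good = mid
        else:
            bad = mid
    return commits[bad]


# Hypothetical history: c0 plays v5.17 (good), c9 plays v5.18-rc1 (bad),
# and the regression is assumed to enter at c6.
history: List[str] = [f"c{i}" for i in range(10)]
culprit = "c6"
tested: List[str] = []


def boots_ok(commit: str) -> bool:
    tested.append(commit)
    return history.index(commit) < history.index(culprit)


print(first_bad(history, boots_ok))  # c6
print(len(tested), "test boots")     # 3 test boots
```

The number of probes grows roughly with log2 of the number of candidate commits. However, each probe here stands for a full kernel build and a boot on the affected machine.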

Dave

-- 
John David Anglin  dave.anglin@bell.net


