* pagefaults and hang with 5.15.11
@ 2021-12-26 16:21 Rolf Eike Beer
  2021-12-26 17:22 ` John David Anglin
  0 siblings, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2021-12-26 16:21 UTC (permalink / raw)
  To: linux-parisc


[127262.486834] do_page_fault() command='ld' type=15 address=0x00000000 in libbfd-2.37.gentoo-sys-devel-binutils-st.so[f77a6000+d7000]
127262.486834] trap #15: Data TLB miss fault
[127262.734918] CPU: 0 PID: 4840 Comm: ld Not tainted 5.15.11-gentoo-parisc64 #1
[127262.734948] Hardware name: 9000/785/C8000
[127262.734954]
[127262.734958]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[127262.734962] PSW: 00000000000001101111110000001111 Not tainted
[127262.734967] r00-03  000000ff0006fc0f 00000000f7b358a0 00000000f7814f87 00000000f7b35688
[127262.734978] r04-07  00000000f78872ac 0000000000000000 0000000042b7f5f8 0000000000000001
[127262.734988] r08-11  0000000000000000 00000000000042b2 00000000f7882ad8 00000000f7882938
[127262.734998] r12-15  0000000042887328 00000000f7b35698 0000000000000000 0000000000208000
[127262.735008] r16-19  00000000f7b356b4 000000004221c5c4 0000000042cd2b18 00000000f78872ac
[127262.735018] r20-23  0000000000000000 00000000f77fab10 00000000f77fab10 00000000f7b356b8
[127262.735028] r24-27  00000000f7b35688 00000000f7b35688 0000000000000000 000000004221a000
[127262.735038] r28-31  0000000000000000 000000000004288c 00000000f7b35880 00000000f77a711c
[127262.735048] sr00-03  0000000005156c00 0000000000000000 0000000000000000 0000000005156c00
[127262.735057] sr04-07  0000000005156c00 0000000005156c00 0000000005156c00 0000000005156c00
[127262.735067]
[127262.735071]       VZOUICununcqcqcqcqcqcrmunTDVZOUI
[127262.735076] FPSR: 00000000000000000000000000000000
[127262.735080] FPER1: 00000000
[127262.735085] fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[127262.735095] fr04-07  0000000700000000 401c000000000000 bf800000f8300000 0000000000000000
[127262.735105] fr08-11  000000063ffffff6 0000000000000000 0000000000000000 0000000000000000
[127262.735115] fr12-15  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[127262.735125] fr16-19  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[127262.735135] fr20-23  0000000000000000 0000000000000000 f77fab10f78872ac 0000000000000004
[127262.735146] fr24-27  0000000000000000 0000000000000000 0000000000000010 0000000000000000
[127262.735156] fr28-31  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[127262.735165]
[127262.735169] IASQ: 0000000005156c00 0000000005156c00 IAOQ: 00000000f77fab13 00000000f77fab17
[127262.735177]  IIR: 0f401015    ISR: 0000000005156c00  IOR: 0000000000000000
[127262.735185]  CPU:        0   CR30: 00000040b544c000 CR31: fffffffffffdffff
[127262.735192]  ORIG_R28: 0000000000000000
[127262.735197]  IAOQ[0]: 00000000f77fab13
[127262.735203]  IAOQ[1]: 00000000f77fab17
[127262.735208]  RP(r2): 00000000f7814f87
[128494.230858] do_page_fault() command='cmake' type=15 address=0x00000000 in cmake[41026000+1f3d000]
128494.230858] trap #15: Data TLB miss fault
[128494.418911] CPU: 0 PID: 2700 Comm: cmake Not tainted 5.15.11-gentoo-parisc64 #1
[128494.534852] Hardware name: 9000/785/C8000
[128494.626854]
[128494.646859]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[128494.646870] PSW: 00000000000001000000000000001111 Not tainted
[128494.646876] r00-03  000000000004000f 0000000042fd0000 000000004167e16f 00000000f8a17700
[128494.646888] r04-07  0000000042fb2000 0000000042fb2000 00000000f8a163b0 00000000f99265f0
[128494.646898] r08-11  0000000000000000 0000000000000004 00000000f7af10d0 0000000000000001
[128494.646908] r12-15  00000000f9926218 00000000f9926558 0000000000000000 0000000000000000
[128494.646918] r16-19  000000004268dea1 0000000000000000 000000004268deaf 0000000042fb2000
[128494.646928] r20-23  00000000f8a16ac8 000000004167e10c 0000000042fc1642 00000000f8a17a8c
[128494.646939] r24-27  0000000000000000 00000000f8a16ac8 00000000f8a18fb8 0000000042fb2000
[128494.646949] r28-31  0000000000000000 000000004337600a 00000000f8a191c0 000000004167e16f
[128494.646961] sr00-03  0000000000399000 0000000000000000 0000000000000000 0000000000399000
[128494.646971] sr04-07  0000000000399000 0000000000399000 0000000000399000 0000000000399000
[128494.646981]
[128494.646985]       VZOUICununcqcqcqcqcqcrmunTDVZOUI
[128494.646991] FPSR: 00000000000000000000000000000000
[128494.646997] FPER1: 00000000
[128496.194873] fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[128496.194894] fr04-07  0000000d00000000 402a000000000000 bf80000014b40000 0000000000000000
[128496.194904] fr08-11  0000000140000000 0000000000000000 0000000000000000 0000000000000000
[128496.194914] fr12-15  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[128496.194923] fr16-19  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[128496.194933] fr20-23  0000000000000000 0000000000000000 33818d188f2104d3 8f9a2ac700000000
[128496.194944] fr24-27  0000000500000001 0000000000000000 0000000000000026 0000000000000000
[128496.194955] fr28-31  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[128496.194965]
[128496.194970] IASQ: 0000000000399000 0000000000399000 IAOQ: 0000000041660677 000000004166067b
[128496.194979]  IIR: 0f80109c    ISR: 0000000000399000  IOR: 0000000000000000
[128496.194987]  CPU:        0   CR30: 00000040a56a8000 CR31: fffffffffffdffff
[128496.194996]  ORIG_R28: 0000000000000000
[128496.195001]  IAOQ[0]: 0000000041660677
[128496.195008]  IAOQ[1]: 000000004166067b
[128496.195014]  RP(r2): 000000004167e16f
[139181.966856] do_page_fault() command='make' type=6 address=0x00000003 in gmake[42454000+3d000]
139181.966856] trap #6: Instruction TLB miss fault
[139181.970814] CPU: 2 PID: 3108 Comm: make Not tainted 5.15.11-gentoo-parisc64 #1
[139181.970814] Hardware name: 9000/785/C8000
[139181.970814]
[139181.970814]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[139181.970814] PSW: 00000000000001101111111100001111 Not tainted
[139181.970814] r00-03  000000ff0006ff0f 00000000f72386d3 00000000f71eeb3f 0000000000000004
[139181.970814] r04-07  00000000f7273c00 0000000000009000 00000000f7a7d000 00000000f988db88
[139181.970814] r08-11  00000000f988dcd8 00000000f988e098 0000000042c273c8 0000000042c5c658
[139181.970814] r12-15  00000000f7275cd4 00000000f988e138 00000000f7271bf4 0000000042493000
[139181.970814] r16-19  0000000042c081f8 0000000000000000 0000000042c08206 00000000f7273c00
[139181.970814] r20-23  0000000000000078 0000000000000000 0000000000000000 0000000000000000
[139181.970814] r24-27  00000000ffffffff 00000000f7a7d040 0000000000000000 0000000042492000
[139181.970814] r28-31  0000000000000000 0000000042bc2008 00000000f7a7d040 00000000f71eeb3f
[139181.970814] sr00-03  00000000021a6c00 0000000000000000 0000000000000000 0000000007743400
[139181.970814] sr04-07  0000000007743400 0000000007743400 0000000007743400 0000000007743400
[139181.970814]
[139181.970814]       VZOUICununcqcqcqcqcqcrmunTDVZOUI
[139181.970814] FPSR: 00001100001010101011100000000000
[139181.970814] FPER1: 00000000
[139181.970814] fr00-03  0c2ab80000000000 0000000000000000 0000000000000000 0000000000000000
[139181.970814] fr04-07  400f99999999999a 000000009e0a33ce 0000000000000000 0000000000000000
[139181.970814] fr08-11  0000000000000000 00217fbcc7b4099a 000b7b3fb542a59c 000f9ab8407f439a
[139181.970814] fr12-15  400f99999999999a 0000000000000000 0000000000000000 0000000000000000
[139181.970814] fr16-19  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[139181.970814] fr20-23  0000000000000000 0000000000000000 0000000266666667 000000039999999a
[139181.970814] fr24-27  0000000000000000 3fc999b324f10111 0000000000000000 40026bb4485fa06f
[139181.970814] fr28-31  8000000000000000 3fe0000000000006 3d2ef35793c76730 000080620e036646
[139181.970814]
[139181.970814] IASQ: 0000000007743400 0000000007743400 IAOQ: 0000000000000003 0000000000000007
[139181.970814]  IIR: 43ffff80    ISR: 0000000007743400  IOR: 00000000f7a7d000
[139181.970814]  CPU:        2   CR30: 00000040d95f8000 CR31: ffffffffffffffff
[139181.970814]  ORIG_R28: 0000000000000000
[139181.970814]  IAOQ[0]: 0000000000000003
[139181.970814]  IAOQ[1]: 0000000000000007
[139181.970814]  RP(r2): 00000000f71eeb3f
[139181.966855]
[139181.966855] do_page_fault() command='git' type=6 address=0xffffffffffffffff
139181.966855] trap #6: Instruction TLB miss fault, vm_start = 0xf90f2000, vm_end = 0xf911e000
[139181.966855] CPU: 1 PID: 3101 Comm: git Not tainted 5.15.11-gentoo-parisc64 #1
[139181.966855] Hardware name: 9000/785/C8000
[139181.966855]
[139181.966855]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[139181.966855] PSW: 00001111110001111111111111111111 Not tainted
[139181.966855] r00-03  000000ff0fc7ffff ffffffffffffffff 0000000040199080 0000000004085268
[139181.966855] r04-07  ffffffffffffffff 000000000000000f fffffff0f0d40c38 0000000000000000
[139181.966855] r08-11  0000000000000001 000000000407efe8 fffffff0f040be20 c480001000800000
[139181.966855] r12-15  0000000000000000 000000004da42468 00000000434c98a0 0000000000400000
[139181.966855] r16-19  0000000004085268 000000005b3c3ee0 0000000040c0dd00 0000000000000000
[139181.966855] r20-23  0000000000000001 ff00000000000000 0000000000faa718 00007ea9a5e66854
[139181.966855] r24-27  000000000407fad0 0000000004085268 3030303030203030 0000000040b6a500
[139181.966855] r28-31  0000000002f4c997 ffffffffffffffff ffffffffffffffff 400000000000aedc
[139181.966855] sr00-03  00000000ffffffff 00000000ffffffff 00000000ffffffff 00000000ffffffff
[139181.966855] sr04-07  00000000ffffffff 00000000ffffffff 00000000ffffffff 00000000ffffffff
[139181.966855]
[139181.966855]       VZOUICununcqcqcqcqcqcrmunTDVZOUI
[139181.966855] FPSR: 00000000000000000000000000000000
[139181.966855] FPER1: 00000000
[139181.966855] fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[139181.966855] fr04-07  400feb851eb851ec 000000009e0a33ce 0000000000000000 0000000000000000
[139181.966855] fr08-11  0000000000000000 00217fbcc7b4099a 000b7b3fb542a59c 000f9ab8407f439a
[139181.966855] fr12-15  400feb851eb851ec 0000000000000000 0000000000000000 0000000000000000
[139181.966855] fr16-19  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[139181.966855] fr20-23  0000000000000000 0000000000000000 f698ce70f6a73c00 32204a6800000000
[139181.966855] fr24-27  0000000000000000 0000000000000000 0000000000000000 40026bb4485fa06f
[139181.966855] fr28-31  8000000000000000 3fe0000000000006 3d2ef35793c76730 000080620e036646
[139181.966855]
[139181.966855] IASQ: 00000000ffffffff 00000000c0000000 IAOQ: ffffffffffffffff 0000000000000003
[139181.966855]  IIR: 43ffff80    ISR: 00000000021a6800  IOR: 00000000f6a592a8
[139181.966855]  CPU:        1   CR30: 00000040ae8b8000 CR31: fffa7fffff7f9fd7
[139181.966855]  ORIG_R28: 0000000000000000
[139181.966855]  IAOQ[0]: ffffffffffffffff
[139181.966855]  IAOQ[1]: 0000000000000003
[139181.966855]  RP(r2): 0000000040199080
[139181.966881] ------------[ cut here ]------------
[139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0x8c/0x98
[139181.966881] Modules linked in: 8021q ipmi_poweroff ipmi_si ipmi_devintf sata_via ipmi_msghandler cbc dm_zero dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_bufio pata_sil680 libata
[139181.966927] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.11-gentoo-parisc64 #1
[139181.966927] Hardware name: 9000/785/C8000
[139181.966927]
[139181.966927]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[139181.966927] PSW: 00001000000001000000000100001110 Not tainted
[139181.966927] r00-03  000000000804010e 0000000040c0d500 0000000040269a68 0000000040f846d0
[139181.966927] r04-07  0000000040b6a500 000000004a148260 0000000000000004 0000000000000001
[139181.966927] r08-11  0000000040daf5a8 0000000040daf620 0000000000000001 0000000040b8bd00
[139181.966927] r12-15  0000000000000000 0000000040c0a2a0 00000000495caed8 000000004d4805f0
[139181.966927] r16-19  fffffff0f0d00ae0 000000004d4803b8 00000000495caed8 400000000000aedc
[139181.966927] r20-23  0000000000000001 000000000800000e 00000000000002b5 0000000000200000
[139181.966927] r24-27  00000040ae8b8848 000000004a148000 000000014d283c00 0000000040b6a500
[139181.966927] r28-31  0000000041f6e090 000000004a148360 000000004a148410 4000000000000000
[139181.966927] sr00-03  00000000ffffffff 0000000000000000 0000000000000000 00000000ffffffff
[139181.966927] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[139181.966927]
[139181.966927] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040acf5ec 0000000040acf5f0
[139181.966927]  IIR: 03ffe01f    ISR: 0000000010340000  IOR: 0000002852148390
[139181.966927]  CPU:        1   CR30: 000000004a148000 CR31: fffa7fffff7f9fd7
[139181.966927]  ORIG_R28: 0000000040daf5a8
[139181.966927]  IAOQ[0]: rcu_eqs_enter.constprop.0+0x8c/0x98
[139181.966927]  IAOQ[1]: rcu_eqs_enter.constprop.0+0x90/0x98
[139181.966927]  RP(r2): rcu_idle_enter+0x20/0x30
[139181.966927] Backtrace:
[139181.966927]  [<0000000040269a68>] rcu_idle_enter+0x20/0x30
[139181.966927]  [<0000000040adb7f4>] cpu_idle_poll.isra.0+0x2c/0xa0
[139181.966927]  [<0000000040224274>] do_idle+0xdc/0x2f0
[139181.966927]  [<0000000040224728>] cpu_startup_entry+0x78/0x80
[139181.966927]  [<000000004010d654>] smp_callin+0x294/0x2c0
[139181.966927]
[139181.966927] ---[ end trace 6757e82f350830ac ]---
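Each register dump above prints a legend line (YZrvWESTHLNXBCVM...) directly over the PSW bit string. As a rough reading aid, the sketch below pairs every legend character with the bit underneath it and lists the characters whose bit is set. The function name is made up for illustration, and the sketch does not interpret what each flag means; it only lines the two strings up.

```python
# Legend and bit strings copied from the register dumps in this message.
PSW_LEGEND = "YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI"


def decode_psw(bits: str, legend: str = PSW_LEGEND) -> str:
    """Return the legend characters whose corresponding PSW bit is 1."""
    if len(bits) != len(legend):
        raise ValueError("bit string and legend must have the same length")
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may only contain 0 and 1")
    return "".join(flag for flag, bit in zip(legend, bits) if bit == "1")


ld_psw = "00000000000001101111110000001111"      # dump for command 'ld'
cmake_psw = "00000000000001000000000000001111"   # dump for command 'cmake'

print(decode_psw(ld_psw))
print(decode_psw(cmake_psw))
# The same bits as a number; compare with the low half of the first
# value on the r00-03 line of the 'ld' dump.
print(hex(int(ld_psw, 2)))
```

The numeric form of the 'ld' bits is the same value that ends the first column of its r00-03 line (0006fc0f), which is a quick consistency check when copying dumps by hand.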

There were some intermediate intentional crashes by the CMake testsuite that 
are not shown here. At the end of this the machine was locked up.
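
For anyone who wants to sort the intentional test-suite crashes from the unexpected ones, a minimal sketch of pulling the key fields out of the do_page_fault() lines follows. The regular expression is modelled only on the four lines in this log (the 'git' report has no "in mapping[base+size]" part, so that group is optional); other kernels may format the line differently, and the function name is hypothetical.

```python
import re

# Matches the first line of each fault report as it appears in this thread.
FAULT_RE = re.compile(
    r"do_page_fault\(\) command='(?P<command>[^']+)' "
    r"type=(?P<type>\d+) address=(?P<address>0x[0-9a-fA-F]+)"
    r"(?: in (?P<mapping>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_fault(line):
    """Return a dict of the fault fields, or None if the line does not match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "command": fields["command"],
        "type": int(fields["type"]),
        "address": int(fields["address"], 16),
        "mapping": fields["mapping"],  # None when no mapping was printed
        "base": int(fields["base"], 16) if fields["base"] else None,
        "size": int(fields["size"], 16) if fields["size"] else None,
    }


LINES = [
    "[127262.486834] do_page_fault() command='ld' type=15 address=0x00000000 "
    "in libbfd-2.37.gentoo-sys-devel-binutils-st.so[f77a6000+d7000]",
    "[128494.230858] do_page_fault() command='cmake' type=15 address=0x00000000 "
    "in cmake[41026000+1f3d000]",
    "[139181.966856] do_page_fault() command='make' type=6 address=0x00000003 "
    "in gmake[42454000+3d000]",
    "[139181.966855] do_page_fault() command='git' type=6 address=0xffffffffffffffff",
]

for line in LINES:
    print(parse_fault(line))
```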


* Re: pagefaults and hang with 5.15.11
  2021-12-26 16:21 pagefaults and hang with 5.15.11 Rolf Eike Beer
@ 2021-12-26 17:22 ` John David Anglin
  2021-12-27 14:30   ` Rolf Eike Beer
  2022-01-23 11:53   ` Rolf Eike Beer
  0 siblings, 2 replies; 21+ messages in thread
From: John David Anglin @ 2021-12-26 17:22 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter.constprop.0+0x8c/0x98
This is probably not reproducible. You might try this change from Sven:

diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index cf92ece20b75..0cd97fa004c5 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -228,11 +228,13 @@ static inline void
  send_IPI_allbutself(enum ipi_message_type op)
  {
         int i;
-
+
+       preempt_disable();
         for_each_online_cpu(i) {
                 if (i != smp_processor_id())
                         send_IPI_single(i, op);
         }
+       preempt_enable();
  }

  #ifdef CONFIG_KGDB

and my "[PATCH v3] parisc: Rewrite light-weight syscall and futex code" change. Page faults in the LWS code
can mess up scheduling.

I haven't found 5.15.11 to be stable.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: pagefaults and hang with 5.15.11
  2021-12-26 17:22 ` John David Anglin
@ 2021-12-27 14:30   ` Rolf Eike Beer
  2021-12-28 21:55     ` Rolf Eike Beer
  2022-01-23 11:53   ` Rolf Eike Beer
  1 sibling, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2021-12-27 14:30 UTC (permalink / raw)
  To: linux-parisc


On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
> On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
> > rcu_eqs_enter.constprop.0+0x8c/0x98
> This is probably not reproducible. You might try this change from Sven

At least this time the git testsuite has finished, but with some errors as 
usual.

> diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
> index cf92ece20b75..0cd97fa004c5 100644
> --- a/arch/parisc/kernel/smp.c
> +++ b/arch/parisc/kernel/smp.c
> @@ -228,11 +228,13 @@ static inline void
>   send_IPI_allbutself(enum ipi_message_type op)
>   {
>          int i;
> -
> +
> +       preempt_disable();
>          for_each_online_cpu(i) {
>                  if (i != smp_processor_id())
>                          send_IPI_single(i, op);
>          }
> +       preempt_enable();
>   }
> 
>   #ifdef CONFIG_KGDB

I'll add this and see what happens.

> and my "[PATCH v3] parisc: Rewrite light-weight syscall and futex code"
> change. Page faults in the LWS code can mess up scheduling.

But that would be nothing new. At least the machine has been quite stable 
lately.

> I haven't found 5.15.11 to be stable.

Ok, so I'll keep this in the queue for hppa and try again later.

Eike


* Re: pagefaults and hang with 5.15.11
  2021-12-27 14:30   ` Rolf Eike Beer
@ 2021-12-28 21:55     ` Rolf Eike Beer
  2022-01-01 22:12       ` Sven Schnelle
  0 siblings, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2021-12-28 21:55 UTC (permalink / raw)
  To: linux-parisc


On Monday, 27 December 2021, 15:30:10 CET, Rolf Eike Beer wrote:
> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
> > On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> > > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
> > > rcu_eqs_enter.constprop.0+0x8c/0x98
> > 
> > This is probably not reproducible. You might try this change from Sven
> 
> At least this time the git testsuite has finished, but with some errors as
> usual.
> 
> > diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
> > index cf92ece20b75..0cd97fa004c5 100644
> > --- a/arch/parisc/kernel/smp.c
> > +++ b/arch/parisc/kernel/smp.c
> > @@ -228,11 +228,13 @@ static inline void
> > 
> >   send_IPI_allbutself(enum ipi_message_type op)
> >   {
> >   
> >          int i;
> > 
> > -
> > +
> > +       preempt_disable();
> > 
> >          for_each_online_cpu(i) {
> >          
> >                  if (i != smp_processor_id())
> >                  
> >                          send_IPI_single(i, op);
> >          
> >          }
> > 
> > +       preempt_enable();
> > 
> >   }
> >   
> >   #ifdef CONFIG_KGDB
> 
> I'll add this and see what happens.

The machine locked up again, but without much output:

[13093.642353] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
[13094.122900] INEQUIVALENT ALIASES 0x110000 and 0xf5a63000 in file find
[13260.968430] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
[16995.351108] ttyS ttyS1:[17649.655079] t[17650.739194] t[17658.174951] 
t[17659.307044] t[24039.432030] INEQUIVALENT ALIASES 0x113000 and 0xf5a66000 
in file find
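
A small sketch for reading the INEQUIVALENT ALIASES lines: it reduces both addresses modulo a cache-colour size so the mismatch is visible at a glance. The 4 MiB value is an assumption (it is what I believe the parisc kernel uses as SHM_COLOUR), so adjust it if your kernel differs. The function name is made up for illustration.

```python
import re

# Assumption: 4 MiB cache-colour granularity (believed to match the parisc
# kernel's SHM_COLOUR). This value is not taken from the log itself.
COLOUR = 0x00400000

ALIAS_RE = re.compile(
    r"INEQUIVALENT ALIASES (0x[0-9a-f]+) and (0x[0-9a-f]+) in file (\S+)"
)


def colour_offsets(line, colour=COLOUR):
    """Return (file, first % colour, second % colour), or None if no match."""
    match = ALIAS_RE.search(line)
    if match is None:
        return None
    first = int(match.group(1), 16)
    second = int(match.group(2), 16)
    return match.group(3), first % colour, second % colour


LINES = [
    "[13093.642353] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs",
    "[13094.122900] INEQUIVALENT ALIASES 0x110000 and 0xf5a63000 in file find",
    "[13260.968430] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs",
]

for line in LINES:
    name, first, second = colour_offsets(line)
    print(f"{name}: {first:#x} vs {second:#x}")
```

Under that assumption, the two mappings in every line above land on different offsets within a colour, which is consistent with the kernel complaining about them.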

And after the reset it ran into trouble during boot:

  Configuration setting "allocation/zero_metadata" unknown.
[   76.490814] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [lvm:2612]

Luckily it worked on the next attempt.

> > and my "[PATCH v3] parisc: Rewrite light-weight syscall and futex code"
> > change. Page faults in the LWS code can mess up scheduling.
> 
> But that would be nothing new. At least the machine has been quite stable in
> the last time.
> 
> > I haven't found 5.15.11 to be stable.

Neither have I.

I assume the problem came in with some backport, since 5.15.0 has been quite stable:

reboot   system boot  5.15.11-gentoo-p Sat Dec 25 00:18   still running
reboot   system boot  5.15.0-gentoo-pa Sun Dec 19 11:46 - 00:13 (5+12:27)
reboot   system boot  5.15.0-gentoo-pa Thu Nov 25 14:40 - 00:13 (29+09:33)
reboot   system boot  5.15.0-gentoo-pa Thu Nov  4 10:23 - 14:35 (21+04:11)
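
To put a number on "quite stable", here is a minimal sketch that adds up the session lengths from the listing above. It assumes the "(days+HH:MM)" duration format shown there and that the kernel version is the fourth whitespace-separated field; the function names are made up for illustration.

```python
import re
from datetime import timedelta

# Duration at the end of a line, e.g. "(5+12:27)" or "(12:27)".
DURATION_RE = re.compile(
    r"\((?:(?P<days>\d+)\+)?(?P<hours>\d+):(?P<minutes>\d+)\)\s*$"
)


def session_length(line):
    """Return the session duration, or None (e.g. for 'still running')."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
    )


def total_uptime(lines, kernel_prefix):
    """Sum the durations of all sessions whose kernel field starts with kernel_prefix."""
    total = timedelta()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[3].startswith(kernel_prefix):
            continue
        length = session_length(line)
        if length is not None:
            total += length
    return total


LISTING = [
    "reboot   system boot  5.15.11-gentoo-p Sat Dec 25 00:18   still running",
    "reboot   system boot  5.15.0-gentoo-pa Sun Dec 19 11:46 - 00:13 (5+12:27)",
    "reboot   system boot  5.15.0-gentoo-pa Thu Nov 25 14:40 - 00:13 (29+09:33)",
    "reboot   system boot  5.15.0-gentoo-pa Thu Nov  4 10:23 - 14:35 (21+04:11)",
]

# The trailing "-" keeps 5.15.11 out of the 5.15.0 total.
print(total_uptime(LISTING, "5.15.0-"))
```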

Eike


* Re: pagefaults and hang with 5.15.11
  2021-12-28 21:55     ` Rolf Eike Beer
@ 2022-01-01 22:12       ` Sven Schnelle
  2022-01-01 22:28         ` Rolf Eike Beer
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Sven Schnelle @ 2022-01-01 22:12 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-parisc

Hi Eike,

Rolf Eike Beer <eike-kernel@sf-tec.de> writes:

> On Monday, 27 December 2021, 15:30:10 CET, Rolf Eike Beer wrote:
>> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
>> > On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
>> > > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
>> > > rcu_eqs_enter.constprop.0+0x8c/0x98
>> > 
>> > This is probably not reproducible. You might try this change from Sven
>> 
>> At least this time the git testsuite has finished, but with some errors as
>> usual.
>> 
>> > diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
>> > index cf92ece20b75..0cd97fa004c5 100644
>> > --- a/arch/parisc/kernel/smp.c
>> > +++ b/arch/parisc/kernel/smp.c
>> > @@ -228,11 +228,13 @@ static inline void
>> > 
>> >   send_IPI_allbutself(enum ipi_message_type op)
>> >   {
>> >   
>> >          int i;
>> > 
>> > -
>> > +
>> > +       preempt_disable();
>> > 
>> >          for_each_online_cpu(i) {
>> >          
>> >                  if (i != smp_processor_id())
>> >                  
>> >                          send_IPI_single(i, op);
>> >          
>> >          }
>> > 
>> > +       preempt_enable();
>> > 
>> >   }
>> >   
>> >   #ifdef CONFIG_KGDB
>> 
>> I'll add this and see what happens.
>
> The machine locked up again, but without many output:
>
> [13093.642353] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
> [13094.122900] INEQUIVALENT ALIASES 0x110000 and 0xf5a63000 in file find
> [13260.968430] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
> [16995.351108] ttyS ttyS1:[17649.655079] t[17650.739194] t[17658.174951] 
> t[17659.307044] t[24039.432030] INEQUIVALENT ALIASES 0x113000 and 0xf5a66000 
> in file find

Looks like you have a serial console connected? If yes, could you trigger a
'TOC s' from the BMC and post the output from 'ser x 0 toc', where x is the
processor number? This could help with debugging.

Thanks
Sven


* Re: pagefaults and hang with 5.15.11
  2022-01-01 22:12       ` Sven Schnelle
@ 2022-01-01 22:28         ` Rolf Eike Beer
  2022-01-02 10:24           ` Sven Schnelle
  2022-01-05  7:42         ` Rolf Eike Beer
  2022-01-06  0:40         ` Rolf Eike Beer
  2 siblings, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-01 22:28 UTC (permalink / raw)
  To: linux-parisc


On Saturday, 1 January 2022, 23:12:16 CET, Sven Schnelle wrote:
> Hi Eike,
> 
> Rolf Eike Beer <eike-kernel@sf-tec.de> writes:
> > On Monday, 27 December 2021, 15:30:10 CET, Rolf Eike Beer wrote:
> >> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
> >> > On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> >> > > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
> >> > > rcu_eqs_enter.constprop.0+0x8c/0x98
> >> > 
> >> > This is probably not reproducible. You might try this change from Sven
> >> 
> >> At least this time the git testsuite has finished, but with some errors
> >> as
> >> usual.
> > The machine locked up again, but without many output:
> > 
> > [13093.642353] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
> > [13094.122900] INEQUIVALENT ALIASES 0x110000 and 0xf5a63000 in file find
> > [13260.968430] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
> > [16995.351108] ttyS ttyS1:[17649.655079] t[17650.739194] t[17658.174951]
> > t[17659.307044] t[24039.432030] INEQUIVALENT ALIASES 0x113000 and
> > 0xf5a66000 in file find
> 
> Looks like you have a serial console connected? If yes, could you trigger a
> 'TOC s' from the BMC, and post the output from 'ser x 0 toc', where x is
> the processer number? This could help debugging this.

Yes, this is all from serial. I guess this only works during the hang? That 
means I have to boot into the bad kernel again and wait until it breaks. 

Eike


* Re: pagefaults and hang with 5.15.11
  2022-01-01 22:28         ` Rolf Eike Beer
@ 2022-01-02 10:24           ` Sven Schnelle
  2022-01-02 22:42             ` John David Anglin
  0 siblings, 1 reply; 21+ messages in thread
From: Sven Schnelle @ 2022-01-02 10:24 UTC (permalink / raw)
  To: Rolf Eike Beer; +Cc: linux-parisc

Rolf Eike Beer <eike-kernel@sf-tec.de> writes:

> On Saturday, 1 January 2022, 23:12:16 CET, Sven Schnelle wrote:
>> Hi Eike,
>> 
>> Rolf Eike Beer <eike-kernel@sf-tec.de> writes:
>> > On Monday, 27 December 2021, 15:30:10 CET, Rolf Eike Beer wrote:
>> >> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
>> >> > On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
>> >> > > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
>> >> > > rcu_eqs_enter.constprop.0+0x8c/0x98
>> >> > 
>> >> > This is probably not reproducible. You might try this change from Sven
>> >> 
>> >> At least this time the git testsuite has finished, but with some errors
>> >> as
>> >> usual.
>> > The machine locked up again, but without many output:
>> > 
>> > [13093.642353] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
>> > [13094.122900] INEQUIVALENT ALIASES 0x110000 and 0xf5a63000 in file find
>> > [13260.968430] INEQUIVALENT ALIASES 0x96000 and 0xf5bba000 in file xargs
>> > [16995.351108] ttyS ttyS1:[17649.655079] t[17650.739194] t[17658.174951]
>> > t[17659.307044] t[24039.432030] INEQUIVALENT ALIASES 0x113000 and
>> > 0xf5a66000 in file find
>> 
>> Looks like you have a serial console connected? If yes, could you trigger a
>> 'TOC s' from the BMC, and post the output from 'ser x 0 toc', where x is
>> the processer number? This could help debugging this.
>
> Yes, this is all from serial. I guess this only works during the hang? That 
> means I have to boot into the bad kernel again and wait until it breaks. 

Yes, when it hangs, press ESC followed by '('. This should give you the
BMC prompt:

CLI>

Enter 'TOC s'. This will take around 10 s and reboot the box. In the boot
menu, you can then take a look at the TOC data with the service command
mentioned earlier.

/Sven


* Re: pagefaults and hang with 5.15.11
  2022-01-02 10:24           ` Sven Schnelle
@ 2022-01-02 22:42             ` John David Anglin
  2022-01-02 22:53               ` Helge Deller
  0 siblings, 1 reply; 21+ messages in thread
From: John David Anglin @ 2022-01-02 22:42 UTC (permalink / raw)
  To: Sven Schnelle, Rolf Eike Beer; +Cc: linux-parisc

On 2022-01-02 5:24 a.m., Sven Schnelle wrote:
> Yes, when it hangs, press ESC followed by '('. This should give you the
> BMC prompt:
>
> CLI>
>
> Enter TOC s - this will take around 10s, and reboot the box. in the boot
> menu, you can than take a look at the TOC data with the mentioned
> service command.
It might be helpful to have this in the wiki if it's not there already.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: pagefaults and hang with 5.15.11
  2022-01-02 22:42             ` John David Anglin
@ 2022-01-02 22:53               ` Helge Deller
  2022-01-02 23:14                 ` John David Anglin
  0 siblings, 1 reply; 21+ messages in thread
From: Helge Deller @ 2022-01-02 22:53 UTC (permalink / raw)
  To: John David Anglin, Sven Schnelle, Rolf Eike Beer; +Cc: linux-parisc

On 1/2/22 23:42, John David Anglin wrote:
> On 2022-01-02 5:24 a.m., Sven Schnelle wrote:
>> Yes, when it hangs, press ESC followed by '('. This should give you the
>> BMC prompt:
>>
>> CLI>
>>
>> Enter TOC s - this will take around 10s, and reboot the box. in the boot
>> menu, you can than take a look at the TOC data with the mentioned
>> service command.
> It might be helpful to have this in wiki if it's not there.

I partly added it a few hours ago to:
https://parisc.wiki.kernel.org/index.php/How_to_report_a_parisc-linux_kernel_problem
-> Hung kernels.

Any other ideas?

Helge


* Re: pagefaults and hang with 5.15.11
  2022-01-02 22:53               ` Helge Deller
@ 2022-01-02 23:14                 ` John David Anglin
  0 siblings, 0 replies; 21+ messages in thread
From: John David Anglin @ 2022-01-02 23:14 UTC (permalink / raw)
  To: Helge Deller, Sven Schnelle, Rolf Eike Beer; +Cc: linux-parisc

On 2022-01-02 5:53 p.m., Helge Deller wrote:
> On 1/2/22 23:42, John David Anglin wrote:
>> On 2022-01-02 5:24 a.m., Sven Schnelle wrote:
>>> Yes, when it hangs, press ESC followed by '('. This should give you the
>>> BMC prompt:
>>>
>>> CLI>
>>>
>>> Enter TOC s - this will take around 10s, and reboot the box. in the boot
>>> menu, you can than take a look at the TOC data with the mentioned
>>> service command.
>> It might be helpful to have this in wiki if it's not there.
> I partly added it a few hours ago to:
> https://parisc.wiki.kernel.org/index.php/How_to_report_a_parisc-linux_kernel_problem
> -> Hung kernels.
Looks good.

Dave

-- 
John David Anglin  dave.anglin@bell.net



* Re: pagefaults and hang with 5.15.11
  2022-01-01 22:12       ` Sven Schnelle
  2022-01-01 22:28         ` Rolf Eike Beer
@ 2022-01-05  7:42         ` Rolf Eike Beer
  2022-01-05 12:08           ` Helge Deller
  2022-01-06  0:40         ` Rolf Eike Beer
  2 siblings, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-05  7:42 UTC (permalink / raw)
  To: linux-parisc


On Saturday, 1 January 2022, 23:12:16 CET, Sven Schnelle wrote:

> Looks like you have a serial console connected? If yes, could you trigger a
> 'TOC s' from the BMC, and post the output from 'ser x 0 toc', where x is
> the processer number? This could help debugging this.

It locked up again, but the important part is not in the mail or the wiki: 
clear the error log beforehand :/ And even worse, the C8000 does not seem to 
support "ser clearpim". You can see the firmware commands of a C8000 here: 
https://parisc.wiki.kernel.org/index.php/BMC.

Fun fact: running a firmware command that prints a lot of output, like 
"se", and then switching back to the normal system console via ESC-) does not 
seem to stop the firmware from printing the rest of its output, so you get 
junk on the serial line until the firmware eventually finishes. Is there a 
way to switch the C8000 firmware to more than 9600 baud?

Sorry for the bad mood, but if that is the first thing you find after waking up…

Eike


* Re: pagefaults and hang with 5.15.11
  2022-01-05  7:42         ` Rolf Eike Beer
@ 2022-01-05 12:08           ` Helge Deller
  0 siblings, 0 replies; 21+ messages in thread
From: Helge Deller @ 2022-01-05 12:08 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 1/5/22 08:42, Rolf Eike Beer wrote:
> On Saturday, 1 January 2022, 23:12:16 CET, Sven Schnelle wrote:
>
>> Looks like you have a serial console connected? If yes, could you trigger a
>> 'TOC s' from the BMC, and post the output from 'ser x 0 toc', where x is
>> the processer number? This could help debugging this.
>
> It locked up again, but the important part is not in the mail or the wiki:
> clear the error log before :/ And even worse, the C8000 does not seem to
> support "ser clearpim". You can see the firmware commands of a C8000 here:
> https://parisc.wiki.kernel.org/index.php/BMC.

I think you are mixing up the BMC console (for remote management, always available)
and the BCH (boot console handler, only available on machine reboot before starting
the OS).
I'm sure the C8000 has a "ser clearpim" in the BCH.

> Fun fact: when running a firmware command that prints a lot of output, like
> "se", switching back to the normal system console via ESC-) does not seem to
> stop the firmware from printing the rest of it, so you get junk on the serial
> line until the firmware is eventually finished.

Yes, I saw this once too.

> Is there a way to
> switch the C8000 firmware to more than 9600 baud?

In the BCH I assume.

> Sorry for the bad mood, but if that is what you first find after wakeup…

:-)

Helge

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-01 22:12       ` Sven Schnelle
  2022-01-01 22:28         ` Rolf Eike Beer
  2022-01-05  7:42         ` Rolf Eike Beer
@ 2022-01-06  0:40         ` Rolf Eike Beer
  2 siblings, 0 replies; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-06  0:40 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: text/plain, Size: 5035 bytes --]

On Saturday, 1 January 2022, 23:12:16 CET, Sven Schnelle wrote:

> Looks like you have a serial console connected? If yes, could you trigger a
> 'TOC s' from the BMC, and post the output from 'ser x 0 toc', where x is
> the processor number? This could help debugging this.

That command does not exist; I guess you meant this?

Service Menu: Enter command > ser 0 0 toc

ERROR: Unknown command

Service Menu: Enter command > pim 0 toc

FIRMWARE INFORMATION

   Firmware Version:           2.13
        BMC Version:          02.32


PROCESSOR PIM INFORMATION

-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03  0000000000000000  0000000040c0d500  00000000402970f0  00000040b6e3cad0
04-07  0000000040b6a500  0000000041f50270  0000000040daf6c0  0000000040b22a00
08-11  0000000041f50278  0000000040c0d500  0000000040daf778  0000000000000004
12-15  0000000000000001  0000000041f50278  0000000040b22a00  00000000401902e0
16-19  0000000000000004  0000000040b92d00  0000000040b92d00  000000000000000e
20-23  0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27  0000000000000001  0000000041f50278  0000000000000002  0000000040b6a500
28-31  0000000000000001  00000040b6e3cad0  00000040b6e3cb00  0000000041f8f2e0


                                                                 
Control Registers 0 - 31
00-03  0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000037c9a  0000000000000000  00000000000000c0  000000000000003d
12-15  0000000000000000  0000000000000000  0000000000197000  fff8000000000000
16-19  000037f770d420a2  0000000000000000  0000000040297124  000000000ff0109c
20-23  0000000000000000  0000000000000000  000000ff0804ff0f  8000000000000000
24-27  0000000000f87000  0000004076032000  fffffdfeffffdfff  00000000f7afde80
28-31  0000004076e6e374  fffffd7effffffff  00000040b6e3c000  fffffffffffdffff

Space Registers 0 - 7
00-03  0000000006f93400  0000000000000000  0000000000000000  0000000006f93400
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000

IIA Space (back entry)       = 0x0000000000000000
IIA Offset (back entry)      = 0x0000000040297128
CPU State                    = 0x9e000000

And this is the kernel bug:

[61412.598820] watchdog: BUG: soft lockup - CPU#0 stuck for 2998s! [cc1:7634]
[61412.598820] Modules linked in: 8021q ipmi_poweroff ipmi_si sata_via ipmi_devintf ipmi_msghandler cbc dm_zero dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_bufio pata_sil680 libata
[61412.598820] CPU: 0 PID: 7634 Comm: cc1 Tainted: G             L    5.15.11-gentoo-parisc64 #2
[61412.598820] Hardware name: 9000/785/C8000
[61412.598820]
[61412.598820]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[61412.598820] PSW: 00001000000001001111111100001111 Tainted: G             L
[61412.598820] r00-03  000000ff0804ff0f 0000000040c0d500 00000000402970f0 00000040b6e3cad0
[61412.598820] r04-07  0000000040b6a500 0000000041f50270 0000000040daf6c0 0000000040b22a00
[61412.598820] r08-11  0000000041f50278 0000000040c0d500 0000000040daf778 0000000000000004
[61412.598820] r12-15  0000000000000001 0000000041f50278 0000000040b22a00 00000000401902e0
[61412.598820] r16-19  0000000000000004 0000000040b92d00 0000000040b92d00 000000000000000e
[61412.598820] r20-23  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[61412.598820] r24-27  0000000000000001 0000000041f50278 0000000000000002 0000000040b6a500
[61412.598820] r28-31  0000000000000001 00000040b6e3cad0 00000040b6e3cb00 0000000041f8f2e0
[61412.598820] sr00-03  0000000006f93400 0000000000000000 0000000000000000 0000000006f93400
[61412.598820] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[61412.598820]
[61412.598820] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040297124 0000000040297128
[61412.598820]  IIR: 0ff0109c    ISR: 00000040b6e3c8f0  IOR: 000000000000000f
[61412.598820]  CPU:        0   CR30: 00000040b6e3c000 CR31: fffffffffffdffff
[61412.598820]  ORIG_R28: 00000000401db218
[61412.598820]  IAOQ[0]: smp_call_function_many_cond+0x20c/0x508
[61412.598820]  IAOQ[1]: smp_call_function_many_cond+0x210/0x508
[61412.598820]  RP(r2): smp_call_function_many_cond+0x1d8/0x508
[61412.598820] Backtrace:
[61412.598820]  [<00000000402974ec>] on_each_cpu_cond_mask+0x3c/0x48
[61412.598820]  [<000000004019c3b8>] flush_tlb_all+0x188/0x270
[61412.598820]  [<000000004019e224>] __flush_tlb_range+0x16c/0x178
[61412.598820]  [<000000004019ecb4>] flush_cache_range+0x384/0x410
[61412.598820]  [<00000000403144c0>] unmap_page_range+0xb8/0xc58
[61412.598820]  [<00000000403154dc>] unmap_vmas+0x9c/0xe0
[61412.598820]  [<000000004031fa78>] unmap_region+0x108/0x1b8
[61412.598820]  [<00000000403238d4>] __do_munmap+0x284/0x728
[61412.598820]  [<0000000040324754>] __vm_munmap+0xb4/0x148
[61412.598820]  [<0000000040325784>] sys_munmap+0x24/0x30
[61412.598820]  [<0000000040199f68>] syscall_exit+0x0/0x14

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2021-12-26 17:22 ` John David Anglin
  2021-12-27 14:30   ` Rolf Eike Beer
@ 2022-01-23 11:53   ` Rolf Eike Beer
  2022-01-23 13:51     ` John David Anglin
  2022-01-24  6:41     ` Rolf Eike Beer
  1 sibling, 2 replies; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-23 11:53 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
> On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
> > rcu_eqs_enter.constprop.0+0x8c/0x98
> This is probably not reproducible. You might try this change from Sven

> I haven't found 5.15.11 to be stable.

When I was running 5.15.0 I had uptimes of 21 and 29 days before crashes, and 
then 5 days before I rebooted into 5.15.11 to test that.

With 5.15.11 my longest uptime was 5 days.

I switched to 5.15.4 afterwards, which has now been up for 2 weeks. I see
regular userspace crashes with it, usually in gcc or ld as the machine is
mainly building things, and they seem to happen far more often than they did
with 5.15.0 for me.

So much for the moment.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-23 11:53   ` Rolf Eike Beer
@ 2022-01-23 13:51     ` John David Anglin
  2022-01-23 14:36       ` Helge Deller
  2022-01-24  6:41     ` Rolf Eike Beer
  1 sibling, 1 reply; 21+ messages in thread
From: John David Anglin @ 2022-01-23 13:51 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 2022-01-23 6:53 a.m., Rolf Eike Beer wrote:
> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
>> On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
>>> [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
>>> rcu_eqs_enter.constprop.0+0x8c/0x98
>> This is probably not reproducible. You might try this change from Sven
>> I haven't found 5.15.11 to be stable.
> When I was running 5.15.0 I had uptimes of 21 and 29 days before crashes, and
> then 5 days before I rebooted into 5.15.11 to test that.
>
> With 5.15.11 my longest uptime was 5 days.
>
> I switched to 5.15.4 afterwards, which has now been up for 2 weeks. I see
> regular userspace crashes with it, usually in gcc or ld as the machine is
> mainly building things, and they seem to happen far more often than they did
> with 5.15.0 for me.
The problem is how to find the changes responsible for this instability.  I'm sure they aren't
caused by parisc-specific changes.  It would take a very long time to bisect and test, and it would
be very easy to make a mistake bisecting because the issues are close to random.

So the best we can do is to analyze specific problems and try to fix them.

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-23 13:51     ` John David Anglin
@ 2022-01-23 14:36       ` Helge Deller
  0 siblings, 0 replies; 21+ messages in thread
From: Helge Deller @ 2022-01-23 14:36 UTC (permalink / raw)
  To: John David Anglin, Rolf Eike Beer, linux-parisc

On 1/23/22 14:51, John David Anglin wrote:
> On 2022-01-23 6:53 a.m., Rolf Eike Beer wrote:
>> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
>>> On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
>>>> [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
>>>> rcu_eqs_enter.constprop.0+0x8c/0x98
>>> This is probably not reproducible. You might try this change from Sven
>>> I haven't found 5.15.11 to be stable.
>> When I was running 5.15.0 I had uptimes of 21 and 29 days before crashes, and
>> then 5 days before I rebooted into 5.15.11 to test that.
>>
>> With 5.15.11 my longest uptime was 5 days.
>>
>> I switched to 5.15.4 afterwards, which has now been up for 2 weeks. I see
>> regular userspace crashes with it, usually in gcc or ld as the machine is
>> mainly building things, and they seem to happen far more often than they did
>> with 5.15.0 for me.
> The problem is how to find the changes responsible for this instability.  I'm sure they aren't
> caused by parisc-specific changes.  It would take a very long time to bisect and test, and it would
> be very easy to make a mistake bisecting because the issues are close to random.
>
> So the best we can do is to analyze specific problems and try to fix them.

I also faced problems with 5.15.
So I switched all hppa Debian buildd servers to 5.10.90 and they have been running stably for 14 days now.
I think 5.15 isn't used widely yet, so there are probably still many non-parisc related issues in it.

Helge

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-23 11:53   ` Rolf Eike Beer
  2022-01-23 13:51     ` John David Anglin
@ 2022-01-24  6:41     ` Rolf Eike Beer
  2022-01-24 17:24       ` John David Anglin
  1 sibling, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-24  6:41 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: text/plain, Size: 3480 bytes --]

On Sunday, 23 January 2022, 12:53:22 CET, Rolf Eike Beer wrote:
> On Sunday, 26 December 2021, 18:22:12 CET, John David Anglin wrote:
> > On 2021-12-26 11:21 a.m., Rolf Eike Beer wrote:
> > > [139181.966881] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613
> > > rcu_eqs_enter.constprop.0+0x8c/0x98
> > 
> > This is probably not reproducible. You might try this change from Sven
> > 
> > I haven't found 5.15.11 to be stable.
> 
> When I was running 5.15.0 I had uptimes of 21 and 29 days before crashes,
> and then 5 days before I rebooted into 5.15.11 to test that.
> 
> With 5.15.11 my longest uptime was 5 days.
> 
> I switched to 5.15.4 afterwards, which has now been up for 2 weeks. I see
> regular userspace crashes with it, usually in gcc or ld as the machine is
> mainly building things, and they seem to happen far more often than they did
> with 5.15.0 for me.
> 
> So much for the moment.

That was yesterday. And now I just got this:

[1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
[1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
[1274934.746891] Hardware name: 9000/785/C8000
[1274934.746891]
[1274934.746891]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
[1274934.746891] r00-03  000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
[1274934.746891] r04-07  0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
[1274934.746891] r08-11  0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
[1274934.746891] r12-15  0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
[1274934.746891] r16-19  0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
[1274934.746891] r20-23  0000000000000006 000000004a368950 0000000000000000 0000000000000001
[1274934.746891] r24-27  0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
[1274934.746891] r28-31  0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
[1274934.746891] sr00-03  00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
[1274934.746891] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1274934.746891]
[1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
[1274934.746891]  IIR: 48780030    ISR: 0000000000000000  IOR: 0000004140000018
[1274934.746891]  CPU:        3   CR30: 00000040e3a9c000 CR31: ffffffffffffffff
[1274934.746891]  ORIG_R28: 0000000040acdd58
[1274934.746891]  IAOQ[0]: sba_unmap_sg+0xb0/0x118
[1274934.746891]  IAOQ[1]: sba_unmap_sg+0xb4/0x118
[1274934.746891]  RP(r2): sba_unmap_sg+0xac/0x118
[1274934.746891] Backtrace:
[1274934.746891]  [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
[1274934.746891]  [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
[1274934.746891]  [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
[1274934.746891]  [<0000000040798600>] mpt_interrupt+0x168/0xa68
[1274934.746891]  [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
[1274934.746891]  [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
[1274934.746891]  [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
[1274934.746891]  [<00000000402548e0>] generic_handle_irq+0x50/0x70
[1274934.746891]  [<000000004019a254>] call_on_stack+0x18/0x24
[1274934.746891]
[1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?)


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-24  6:41     ` Rolf Eike Beer
@ 2022-01-24 17:24       ` John David Anglin
  2022-01-24 17:41         ` John David Anglin
  0 siblings, 1 reply; 21+ messages in thread
From: John David Anglin @ 2022-01-24 17:24 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 2022-01-24 1:41 a.m., Rolf Eike Beer wrote:
>> So much for the moment.
> That was yesterday. And now I just got this:
>
> [1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
> [1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
> [1274934.746891] Hardware name: 9000/785/C8000
> [1274934.746891]
> [1274934.746891]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
> [1274934.746891] r00-03  000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
> [1274934.746891] r04-07  0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
> [1274934.746891] r08-11  0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
> [1274934.746891] r12-15  0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
> [1274934.746891] r16-19  0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
> [1274934.746891] r20-23  0000000000000006 000000004a368950 0000000000000000 0000000000000001
> [1274934.746891] r24-27  0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
> [1274934.746891] r28-31  0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
> [1274934.746891] sr00-03  00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
> [1274934.746891] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [1274934.746891]
> [1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
> [1274934.746891]  IIR: 48780030    ISR: 0000000000000000  IOR: 0000004140000018
> [1274934.746891]  CPU:        3   CR30: 00000040e3a9c000 CR31: ffffffffffffffff
> [1274934.746891]  ORIG_R28: 0000000040acdd58
> [1274934.746891]  IAOQ[0]: sba_unmap_sg+0xb0/0x118
> [1274934.746891]  IAOQ[1]: sba_unmap_sg+0xb4/0x118
> [1274934.746891]  RP(r2): sba_unmap_sg+0xac/0x118
> [1274934.746891] Backtrace:
> [1274934.746891]  [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
> [1274934.746891]  [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
> [1274934.746891]  [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
> [1274934.746891]  [<0000000040798600>] mpt_interrupt+0x168/0xa68
> [1274934.746891]  [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
> [1274934.746891]  [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
> [1274934.746891]  [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
> [1274934.746891]  [<00000000402548e0>] generic_handle_irq+0x50/0x70
> [1274934.746891]  [<000000004019a254>] call_on_stack+0x18/0x24
Faulting instruction is "ldw 18(r3),r24".  Address in $r3 (and $r5) seems bad.  Think the sglist argument
to sba_unmap_sg() is bad.  Don't have a clue as to why this might be.
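
As a cross-check of the decode above, here is a minimal sketch that pulls the
fields out of the IIR value from the trace (0x48780030). It assumes the classic
PA-RISC load-word layout (6-bit opcode, 5-bit base, 5-bit target, 2-bit space
selector, 14-bit displacement with the sign in the lowest bit) and ignores the
wide-mode use of the space bits, which are zero here. The struct and function
names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical holder for the fields of a PA-RISC "ldw disp(base),target". */
struct ldw_fields {
    unsigned int opcode;  /* 6 bits; 0x12 is assumed to be LDW */
    unsigned int base;    /* 5 bits; base register number */
    unsigned int target;  /* 5 bits; target register number */
    unsigned int space;   /* 2 bits; space register selector */
    int32_t disp;         /* sign-extended byte displacement */
};

/* Split a 32-bit instruction word into the fields listed above. */
static struct ldw_fields decode_ldw(uint32_t iir)
{
    struct ldw_fields f;
    uint32_t im14 = iir & 0x3fff;

    f.opcode = (iir >> 26) & 0x3f;
    f.base   = (iir >> 21) & 0x1f;
    f.target = (iir >> 16) & 0x1f;
    f.space  = (iir >> 14) & 0x3;

    /* Low sign extension: bit 0 is the sign, the upper 13 bits the value. */
    f.disp = (int32_t)(im14 >> 1);
    if (im14 & 1)
        f.disp -= 0x2000;

    return f;
}
```

For 0x48780030 this gives opcode 0x12, base r3, target r24 and displacement
0x18, and r3 + 0x18 = 0x4140000000 + 0x18 matches the IOR value 0x4140000018
in the trace.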

There are a number of debug flags in the code in drivers/parisc/sba_iommu.c:

/*
** The number of debug flags is a clue - this code is fragile.
** Don't even think about messing with it unless you have
** plenty of 710's to sacrifice to the computer gods. :^)
*/

Grant was the expert in this code.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-24 17:24       ` John David Anglin
@ 2022-01-24 17:41         ` John David Anglin
  2022-01-25 16:54           ` Rolf Eike Beer
  0 siblings, 1 reply; 21+ messages in thread
From: John David Anglin @ 2022-01-24 17:41 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 2022-01-24 12:24 p.m., John David Anglin wrote:
> Faulting instruction is "ldw 18(r3),r24".  Address in $r3 (and $r5) seems bad.  Think the sglist argument
> to sba_unmap_sg() is bad.  Don't have a clue as to why this might be.
Maybe try interchanging operands of following &&

         while (sg_dma_len(sglist) && nents--) {

so nents is checked first.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-24 17:41         ` John David Anglin
@ 2022-01-25 16:54           ` Rolf Eike Beer
  2022-01-25 17:26             ` John David Anglin
  0 siblings, 1 reply; 21+ messages in thread
From: Rolf Eike Beer @ 2022-01-25 16:54 UTC (permalink / raw)
  To: linux-parisc, John David Anglin

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

On Monday, 24 January 2022, 18:41:27 CET, John David Anglin wrote:
> On 2022-01-24 12:24 p.m., John David Anglin wrote:
> > Faulting instruction is "ldw 18(r3),r24".  Address in $r3 (and $r5) seems 
> > bad.  Think the sglist argument
> > to sba_unmap_sg() is bad.  Don't have a clue as to why this might be.
> Maybe try interchanging operands of following &&
> 
>          while (sg_dma_len(sglist) && nents--) {
> 
> so nents is checked first.

But then nents would be decremented even when sg_dma_len() returns zero, which
may or may not be wanted.
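
To make the side effect concrete, here is a small userspace model of the two
operand orders. It is only a sketch: a plain array of lengths stands in for
sg_dma_len(sglist), and both function names are hypothetical.

```c
/* Original order: the length is read first, nents is only touched if it is non-zero. */
static int left_len_first(const unsigned int *len, int nents)
{
    while (*len && nents--)
        ++len;
    return nents;   /* value of nents when the loop exits */
}

/* Swapped order: nents-- runs on every test, even when the length is zero. */
static int left_count_first(const unsigned int *len, int nents)
{
    while (nents-- && *len)
        ++len;
    return nents;   /* value of nents when the loop exits */
}
```

With lengths {4096, 4096, 0} and nents = 3, the length-first loop stops with
nents = 1, while the count-first loop stops with nents = 0, because nents-- has
already run by the time the zero length is seen. With two non-zero lengths and
nents = 2, the count-first loop ends with nents = -1.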

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: pagefaults and hang with 5.15.11
  2022-01-25 16:54           ` Rolf Eike Beer
@ 2022-01-25 17:26             ` John David Anglin
  0 siblings, 0 replies; 21+ messages in thread
From: John David Anglin @ 2022-01-25 17:26 UTC (permalink / raw)
  To: Rolf Eike Beer, linux-parisc

On 2022-01-25 11:54 a.m., Rolf Eike Beer wrote:
> On Monday, 24 January 2022, 18:41:27 CET, John David Anglin wrote:
>> On 2022-01-24 12:24 p.m., John David Anglin wrote:
>>> Faulting instruction is "ldw 18(r3),r24".  Address in $r3 (and $r5) seems
>>> bad.  Think the sglist argument
>>> to sba_unmap_sg() is bad.  Don't have a clue as to why this might be.
>> Maybe try interchanging operands of following &&
>>
>>           while (sg_dma_len(sglist) && nents--) {
>>
>> so nents is checked first.
> But then nents would be decremented even when sg_dma_len() returns zero, which
> may or may not be wanted.
You are correct.  The decrement of nents needs to be in the loop so that the count in the following DBG_RUN_SG is correct:

         while (sg_dma_len(sglist) && nents--) {

                 sba_unmap_page(dev, sg_dma_address(sglist), sg_dma_len(sglist),
                                 direction, 0);
#ifdef SBA_COLLECT_STATS
                 ioc->usg_pages += ((sg_dma_address(sglist) & ~IOVP_MASK) + sg_dma_len(sglist) + IOVP_SIZE - 1) >> PAGE_SHIFT;
                 ioc->usingle_calls--;   /* kluge since call is unmap_sg() */
#endif
                 ++sglist;
         }

         DBG_RUN_SG("%s() DONE (nents %d)\n", __func__,  nents);

However, nents still needs to be checked first.

What has happened is that the sglist pointer has crossed a page boundary after the last entry, causing the TLB miss.
The offset of sg_dma_len is 0x18, so checking sg_dma_len(sglist) first reads past the end of the list and causes the fault.
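
One way to get both properties (nents tested first, and a correct count for
DBG_RUN_SG) is to test nents without modifying it and decrement it in the loop
body. The following is a userspace sketch of that loop shape, not the driver
code: struct sg_model and unmap_sg_model() are hypothetical stand-ins, and the
call to sba_unmap_page() is replaced by a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for struct scatterlist; only the DMA length matters here. */
struct sg_model {
    unsigned int dma_length;
};

/*
 * Model of the reordered loop. nents is tested before the entry is read,
 * and decremented in the body so the value left over stays meaningful.
 * Returns nents as it is when the loop exits; *unmapped counts the entries
 * that were processed.
 */
static int unmap_sg_model(const struct sg_model *sglist, int nents, int *unmapped)
{
    assert(nents >= 0);
    *unmapped = 0;

    while (nents && sglist->dma_length) {
        /* The real driver would call sba_unmap_page() for this entry. */
        ++*unmapped;
        ++sglist;
        nents--;
    }

    return nents;
}
```

With two mapped entries and nents = 2 the loop exits on the nents test and
never reads the element past the end of the array, which is the access that
faulted in the trace.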

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 21+ messages in thread

Thread overview: 21+ messages
2021-12-26 16:21 pagefaults and hang with 5.15.11 Rolf Eike Beer
2021-12-26 17:22 ` John David Anglin
2021-12-27 14:30   ` Rolf Eike Beer
2021-12-28 21:55     ` Rolf Eike Beer
2022-01-01 22:12       ` Sven Schnelle
2022-01-01 22:28         ` Rolf Eike Beer
2022-01-02 10:24           ` Sven Schnelle
2022-01-02 22:42             ` John David Anglin
2022-01-02 22:53               ` Helge Deller
2022-01-02 23:14                 ` John David Anglin
2022-01-05  7:42         ` Rolf Eike Beer
2022-01-05 12:08           ` Helge Deller
2022-01-06  0:40         ` Rolf Eike Beer
2022-01-23 11:53   ` Rolf Eike Beer
2022-01-23 13:51     ` John David Anglin
2022-01-23 14:36       ` Helge Deller
2022-01-24  6:41     ` Rolf Eike Beer
2022-01-24 17:24       ` John David Anglin
2022-01-24 17:41         ` John David Anglin
2022-01-25 16:54           ` Rolf Eike Beer
2022-01-25 17:26             ` John David Anglin
