All of lore.kernel.org
 help / color / mirror / Atom feed
From: bugzilla-daemon@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 217572] Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13
Date: Tue, 20 Jun 2023 15:10:49 +0000	[thread overview]
Message-ID: <bug-217572-201763-CDEPKlpRnd@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-217572-201763@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=217572

--- Comment #1 from Christian Theune (ct@flyingcircus.io) ---
Hi,

is there anything I can do to raise interest on this one? :)

> On 19. Jun 2023, at 10:29, bugzilla-daemon@kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=217572
> 
>            Bug ID: 217572
>           Summary: Initial blocked tasks causing deterioration over hours
>                    until (nearly) complete system lockup and data loss
>                    with PostgreSQL 13
>           Product: File System
>           Version: 2.5
>          Hardware: All
>                OS: Linux
>            Status: NEW
>          Severity: high
>          Priority: P3
>         Component: XFS
>          Assignee: filesystem_xfs@kernel-bugs.kernel.org
>          Reporter: ct@flyingcircus.io
>        Regression: No
> 
> Last Friday we experienced the following hung task messages with PostgreSQL
> while performing our nightly backup using pg_dump. Normally this takes at
> most
> a few minutes with IO being stressed. This time it caused high SYS CPU time,
> went on for almost 20 minutes and caused the PostgreSQL dump to fail with
> inconsistent data.
> 
> Around 3:50 AM we got this:
> 
> [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122
> seconds.
> [330289.821830]       Not tainted 6.1.31 #1-NixOS
> [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.823098] task:.postgres-wrapp state:D stack:0     pid:11884 ppid:11881 
> flags:0x00000002
> [330289.823911] Call Trace:
> [330289.824221]  <TASK>
> [330289.824451]  __schedule+0x35d/0x1370
> [330289.824858]  ? mntput_no_expire+0x4a/0x250
> [330289.825307]  schedule+0x5d/0xe0
> [330289.825630]  rwsem_down_write_slowpath+0x34e/0x730
> [330289.826128]  xfs_ilock+0xeb/0xf0 [xfs]
> [330289.826599]  xfs_file_buffered_write+0x119/0x300 [xfs]
> [330289.827212]  ? selinux_file_permission+0x10b/0x150
> [330289.827683]  vfs_write+0x244/0x400
> [330289.828049]  __x64_sys_pwrite64+0x94/0xc0
> [330289.828459]  do_syscall_64+0x3a/0x90
> [330289.828801]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.829302] RIP: 0033:0x7ff8de90e7f7
> [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000012
> [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX:
> 00007ff8de90e7f7
> [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI:
> 000000000000003b
> [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09:
> 0000000000000040
> [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12:
> 0000000000002000
> [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15:
> 0000562bb434af75
> [330289.833803]  </TASK>
> [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122
> seconds.
> [330289.834771]       Not tainted 6.1.31 #1-NixOS
> [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.835926] task:.postgres-wrapp state:D stack:0     pid:1245532
> ppid:11881
> flags:0x00000002
> [330289.836752] Call Trace:
> [330289.837010]  <TASK>
> [330289.837258]  __schedule+0x35d/0x1370
> [330289.837622]  ? page_add_file_rmap+0xba/0x2f0
> [330289.838047]  ? do_set_pte+0x174/0x1c0
> [330289.838420]  ? unix_stream_read_generic+0x223/0xa60
> [330289.838887]  schedule+0x5d/0xe0
> [330289.839265]  schedule_preempt_disabled+0x14/0x30
> [330289.839758]  rwsem_down_read_slowpath+0x29e/0x4f0
> [330289.840521]  down_read+0x47/0xb0
> [330289.840853]  xfs_ilock+0x79/0xf0 [xfs]
> [330289.841346]  xfs_file_buffered_read+0x44/0xd0 [xfs]
> [330289.841945]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330289.842496]  vfs_read+0x23c/0x310
> [330289.842845]  __x64_sys_pread64+0x94/0xc0
> [330289.843303]  do_syscall_64+0x3a/0x90
> [330289.843857]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.844420] RIP: 0033:0x7ff8de90e747
> [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI:
> 0000000000000010
> [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e89e70
> [330289.848631]  </TASK>
> 
> 
> This keeps going on for a few minutes until:
> 
> [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU
> [330584.619413] rcu:    1-....: (20999 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231
> [330584.620052]         (t=21000 jiffies g=142859597 q=21168 ncpus=3)
> [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.621588] RIP: 0010:xas_load+0xb/0x40
> [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08
> c3
> cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83
> fa
> 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f
> [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202
> [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.625591] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.626097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.626954] Call Trace:
> [330584.627125]  <IRQ>
> [330584.627281]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [330584.627567]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [330584.627882]  ? sched_slice+0x87/0x140
> [330584.628126]  ? timekeeping_update+0xdd/0x130
> [330584.628414]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [330584.628751]  ? update_process_times+0x77/0xb0
> [330584.629036]  ? update_wall_time+0xc/0x20
> [330584.629300]  ? tick_sched_handle+0x34/0x50
> [330584.629564]  ? tick_sched_timer+0x6f/0x80
> [330584.629823]  ? tick_sched_do_timer+0xa0/0xa0
> [330584.630103]  ? __hrtimer_run_queues+0x112/0x2b0
> [330584.630404]  ? hrtimer_interrupt+0xfe/0x220
> [330584.630678]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [330584.631016]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [330584.631339]  </IRQ>
> [330584.631485]  <TASK>
> [330584.631631]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [330584.631974]  ? xas_load+0xb/0x40
> [330584.632195]  ? xas_load+0x30/0x40
> [330584.632415]  filemap_get_read_batch+0x16e/0x250
> [330584.632710]  filemap_get_pages+0xa9/0x630
> [330584.632972]  ? memcg_check_events+0xda/0x210
> [330584.633259]  ? free_unref_page_commit+0x7c/0x170
> [330584.633560]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.633880]  ? free_unref_page+0x1ac/0x220
> [330584.634146]  filemap_read+0xd2/0x340
> [330584.634389]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.634778]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.635123]  vfs_read+0x23c/0x310
> [330584.635354]  __x64_sys_pread64+0x94/0xc0
> [330584.635609]  do_syscall_64+0x3a/0x90
> [330584.635846]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.636174] RIP: 0033:0x7ff8de90e747
> [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.640313]  </TASK>
> [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks:
> { 1-.... } 21015 jiffies s: 7297 root: 0x2/.
> [330584.722930] rcu: blocking rcu_node structures (internal RCU debug):
> [330584.723428] Sending NMI from CPU 0 to CPUs 1:
> [330584.723763] NMI backtrace for cpu 1
> [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.723775] RIP: 0010:xas_load+0x29/0x40
> [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08
> 48
> 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee
> e8
> 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00
> [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246
> [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX:
> 0000000000000000
> [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI:
> ffffb427c3387c00
> [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.723793] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.723794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.723798] Call Trace:
> [330584.723801]  <NMI>
> [330584.723805]  ? nmi_cpu_backtrace.cold+0x1b/0x76
> [330584.723810]  ? nmi_cpu_backtrace_handler+0xd/0x20
> [330584.723814]  ? nmi_handle+0x5d/0x120
> [330584.723817]  ? xas_load+0x29/0x40
> [330584.723818]  ? default_do_nmi+0x69/0x170
> [330584.723821]  ? exc_nmi+0x13c/0x170
> [330584.723823]  ? end_repeat_nmi+0x16/0x67
> [330584.723828]  ? xas_load+0x29/0x40
> [330584.723830]  ? xas_load+0x29/0x40
> [330584.723832]  ? xas_load+0x29/0x40
> [330584.723834]  </NMI>
> [330584.723834]  <TASK>
> [330584.723835]  filemap_get_read_batch+0x16e/0x250
> [330584.723840]  filemap_get_pages+0xa9/0x630
> [330584.723842]  ? memcg_check_events+0xda/0x210
> [330584.723845]  ? free_unref_page_commit+0x7c/0x170
> [330584.723849]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.723851]  ? free_unref_page+0x1ac/0x220
> [330584.723852]  filemap_read+0xd2/0x340
> [330584.723857]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.723935]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.723990]  vfs_read+0x23c/0x310
> [330584.723995]  __x64_sys_pread64+0x94/0xc0
> [330584.723997]  do_syscall_64+0x3a/0x90
> [330584.724000]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.724003] RIP: 0033:0x7ff8de90e747
> [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.724026]  </TASK>
> 
> 
> This keeps repeating until around 4:03 which ends in systemd-journal getting
> coredumped:
> 
> [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU
> [331277.847413] rcu:    1-....: (713858 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996
> [331277.848088]         (t=714253 jiffies g=142859597 q=1821602 ncpus=3)
> [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [331277.849667] RIP: 0010:xas_descend+0x22/0x70
> [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea
> 83
> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83
> f9
> 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc
> [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202
> [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [331277.853766] FS:  00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [331277.854278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [331277.855111] Call Trace:
> [331277.855284]  <IRQ>
> [331277.855433]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [331277.855730]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [331277.856049]  ? sched_slice+0x87/0x140
> [331277.856294]  ? timekeeping_update+0xdd/0x130
> [331277.856587]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [331277.856930]  ? update_process_times+0x77/0xb0
> [331277.857222]  ? update_wall_time+0xc/0x20
> [331277.857482]  ? tick_sched_handle+0x34/0x50
> [331277.857758]  ? tick_sched_timer+0x6f/0x80
> [331277.858024]  ? tick_sched_do_timer+0xa0/0xa0
> [331277.858306]  ? __hrtimer_run_queues+0x112/0x2b0
> [331277.858613]  ? hrtimer_interrupt+0xfe/0x220
> [331277.858891]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [331277.859235]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [331277.859558]  </IRQ>
> [331277.859710]  <TASK>
> [331277.859857]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [331277.860206]  ? xas_descend+0x22/0x70
> [331277.860446]  xas_load+0x30/0x40
> [331277.860665]  filemap_get_read_batch+0x16e/0x250
> [331277.860967]  filemap_get_pages+0xa9/0x630
> [331277.861233]  ? memcg_check_events+0xda/0x210
> [331277.861517]  ? free_unref_page_commit+0x7c/0x170
> [331277.861834]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [331277.862158]  ? free_unref_page+0x1ac/0x220
> [331277.862427]  filemap_read+0xd2/0x340
> [331277.862677]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [331277.863072]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [331277.863424]  vfs_read+0x23c/0x310
> [331277.863657]  __x64_sys_pread64+0x94/0xc0
> [331277.863917]  do_syscall_64+0x3a/0x90
> [331277.864159]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [331277.864488] RIP: 0033:0x7ff8de90e747
> [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [331277.868658]  </TASK>
> Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE --
> [331289.987247] systemd[1]: Starting Serial console liveness marker...
> [331289.992645] systemd[1]: nscd.service: Deactivated successfully.
> [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time,
> received 12.0M IP traffic, sent 8.2M IP traffic.
> [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated
> successfully.
> [331290.035818] systemd[1]: Finished Serial console liveness marker.
> [331290.068776] systemd[1]: Started Logrotate Service.
> [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit
> 3min)!
> [331290.070475] systemd[1]: systemd-journald.service: Killing process 528
> (systemd-journal) with signal SIGABRT.
> [331290.096246] systemd[1]: logrotate.service: Deactivated successfully.
> [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of
> user 0 dumped core.
> [331290.286002] systemd-coredump[1267441]: Coredump diverted to
>
> /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst
> [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without
> build-id.
> [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without
> build-id.
> [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without
> build-id.
> [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without
> build-id.
> [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without
> build-id.
> [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without
> build-id.
> [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without
> build-id.
> [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without
> build-id.
> [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without
> build-id.
> [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without
> build-id.
> [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without
> build-id.
> [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without
> build-id.
> [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without
> build-id.
> [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without
> build-id.
> [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without
> build-id.
> [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without
> build-id.
> [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so
> without build-id.
> [331290.297768] systemd-coredump[1267441]: Module systemd-journald without
> build-id.
> [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528:
> [331290.298714] systemd-coredump[1267441]: #0  0x00007f3e96431de8
> check_object_header (libsystemd-shared-253.so + 0x231de8)
> [331290.299435] systemd-coredump[1267441]: #1  0x00007f3e964330d2
> journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2)
> [331290.300199] systemd-coredump[1267441]: #2  0x00007f3e96434a71
> journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71)
> [331290.302207] systemd-coredump[1267441]: #3  0x00007f3e96434d3b
> journal_file_append_data (libsystemd-shared-253.so + 0x234d3b)
> [331290.302983] systemd-coredump[1267441]: #4  0x00007f3e96437243
> journal_file_append_entry (libsystemd-shared-253.so + 0x237243)
> [331290.303767] systemd-coredump[1267441]: #5  0x000056536a4a7cd5
> server_dispatch_message_real (systemd-journald + 0x10cd5)
> [331290.304504] systemd-coredump[1267441]: #6  0x000056536a4a134b
> dev_kmsg_record (systemd-journald + 0xa34b)
> [331290.305195] systemd-coredump[1267441]: #7  0x000056536a4a182b
> server_read_dev_kmsg (systemd-journald + 0xa82b)
> [331290.305922] systemd-coredump[1267441]: #8  0x00007f3e9645c140
> source_dispatch (libsystemd-shared-253.so + 0x25c140)
> [331290.306640] systemd-coredump[1267441]: #9  0x00007f3e9645c42d
> sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d)
> [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48
> sd_event_run
> (libsystemd-shared-253.so + 0x25cb48)
> [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main
> (systemd-journald + 0x9568)
> [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace
> __libc_start_call_main (libc.so.6 + 0x23ace)
> [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89
> __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
> [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start
> (systemd-journald + 0x9835)
> [331290.310692] systemd-coredump[1267441]: ELF object binary architecture:
> AMD
> x86-64
> 
> 
> This go back to normal until around 20:03 (around 14 hours later):
> 
> [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU
> [389370.873481] rcu:    0-....: (21000 ticks this GP)
> idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704
> [389370.874126]         (t=21002 jiffies g=167843445 q=13889 ncpus=3)
> [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [389370.875710] RIP: 0010:xas_descend+0x26/0x70
> [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89
> d0
> 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08
> 48
> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02
> [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246
> [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX:
> 0000000000000002
> [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI:
> ffffb427c4917c00
> [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09:
> ffff988846485d38
> [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12:
> 000000000000f177
> [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15:
> ffffb427c4917e90
> [389370.879798] FS:  00007ff8de817800(0000) GS:ffff98887ac00000(0000)
> knlGS:0000000000000000
> [389370.880308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4:
> 00000000000006f0
> [389370.881139] Call Trace:
> [389370.881309]  <IRQ>
> [389370.881463]  ? rcu_dump_cpu_stacks+0xc8/0x100
> [389370.881755]  ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [389370.882077]  ? sched_slice+0x87/0x140
> [389370.882325]  ? perf_event_task_tick+0x64/0x370
> [389370.882629]  ? __cgroup_account_cputime_field+0x5b/0xa0
> [389370.882968]  ? update_process_times+0x77/0xb0
> [389370.883258]  ? tick_sched_handle+0x34/0x50
> [389370.883538]  ? tick_sched_timer+0x6f/0x80
> [389370.883802]  ? tick_sched_do_timer+0xa0/0xa0
> [389370.884084]  ? __hrtimer_run_queues+0x112/0x2b0
> [389370.884388]  ? hrtimer_interrupt+0xfe/0x220
> [389370.884666]  ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [389370.885006]  ? sysvec_apic_timer_interrupt+0x99/0xc0
> [389370.885338]  </IRQ>
> [389370.885488]  <TASK>
> [389370.885636]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [389370.885983]  ? xas_descend+0x26/0x70
> [389370.886224]  xas_load+0x30/0x40
> [389370.886456]  filemap_get_read_batch+0x16e/0x250
> [389370.886758]  filemap_get_pages+0xa9/0x630
> [389370.887026]  ? atime_needs_update+0x104/0x180
> [389370.887317]  ? touch_atime+0x46/0x1f0
> [389370.887573]  filemap_read+0xd2/0x340
> [389370.887817]  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [389370.888210]  xfs_file_read_iter+0x6a/0xd0 [xfs]
> [389370.888571]  vfs_read+0x23c/0x310
> [389370.888797]  __x64_sys_pread64+0x94/0xc0
> [389370.889058]  do_syscall_64+0x3a/0x90
> [389370.889299]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [389370.889634] RIP: 0033:0x7ff8de90e747
> [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI:
> 00000000000001c7
> [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e55828
> [389370.893843]  </TASK>
> 
> 
> At this point we have to restart the VM externally because it didn't respond
> properly any longer.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.


Liebe Grüße,
Christian Theune

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

  parent reply	other threads:[~2023-06-20 15:10 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-19  8:29 [Bug 217572] New: Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13 bugzilla-daemon
2023-06-20 15:10 ` Christian Theune
2023-06-20 15:11   ` Christian Theune
2023-06-20 15:10 ` bugzilla-daemon [this message]
2023-06-20 15:13 ` [Bug 217572] " bugzilla-daemon
2023-06-20 15:21 ` bugzilla-daemon
2023-06-20 17:26 ` bugzilla-daemon
2023-07-03 14:10 ` bugzilla-daemon
2023-07-03 19:56 ` bugzilla-daemon
2023-07-03 22:30   ` Dave Chinner
2023-07-03 22:30 ` bugzilla-daemon
2023-07-04  4:22 ` bugzilla-daemon
2023-07-05 22:07 ` bugzilla-daemon
2023-09-28 12:39 ` bugzilla-daemon
2023-09-28 22:44   ` Dave Chinner
2023-09-28 13:06 ` bugzilla-daemon
2023-09-28 22:44 ` bugzilla-daemon
2023-09-29  4:54 ` bugzilla-daemon
2023-09-29  5:01 ` bugzilla-daemon
2023-10-05 14:31 ` bugzilla-daemon
2023-10-08 17:35 ` bugzilla-daemon
2023-10-08 22:13 ` bugzilla-daemon
2023-11-02 15:27 ` bugzilla-daemon
2023-11-02 20:58   ` Dave Chinner
2023-11-02 15:28 ` bugzilla-daemon
2023-11-02 15:29 ` bugzilla-daemon
2023-11-02 16:23 ` bugzilla-daemon
2023-11-02 20:59 ` bugzilla-daemon
2023-11-03 12:52 ` bugzilla-daemon
2023-11-07 10:11 ` bugzilla-daemon
2023-11-07 10:25 ` bugzilla-daemon
2023-11-07 14:12 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-217572-201763-CDEPKlpRnd@https.bugzilla.kernel.org/ \
    --to=bugzilla-daemon@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.