All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 17:02 ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 17:02 UTC (permalink / raw)
  To: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

5.12.9-300.fc34.x86_64

File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
workload is:
* bees doing dedup
* btrfs send
* digikam scanning files and writing image metadata to a sqlite database
* chrome doing some webby things

And then kaboom.

Call trace (easy to read, expires in 7 days):
https://pastebin.com/x5KRU23V

Same call trace (MUA mangled, for the archive):
[60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
[60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
k10temp rfkill i2c_piix4 fam15h_power
[60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
[60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
5.12.9-300.fc34.x86_64 #1
[60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
[60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
[60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
db 75 36 48
[60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
[60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
RCX: ffff9fb583453224
[60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
RDI: 0000000000000000
[60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
[60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000002
[60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
R15: 0000000000000001
[60015.902408] kernel: FS:  00007f158e62d8c0(0000)
GS:ffff93bd37580000(0000) knlGS:0000000000000000
[60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
CR4: 00000000000506e0
[60015.902414] kernel: Call Trace:
[60015.902419] kernel:  kfence_unprotect+0x13/0x30
[60015.902423] kernel:  page_fault_oops+0x89/0x270
[60015.902427] kernel:  ? search_module_extables+0xf/0x40
[60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
[60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
[60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
[60015.902440] kernel:  exc_page_fault+0x67/0x150
[60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
[60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
[60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
09 c2 0f 88
[60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
[60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
RCX: 0000000000000000
[60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
RDI: 0000000000000039
[60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
R09: 0000000000000000
[60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
R12: 0000000000000001
[60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
R15: ffff93bc0c92b000
[60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
[60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
[60015.902476] kernel:  evict+0xd1/0x180
[60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
[60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
[60015.902487] kernel:  ? iput+0x1a0/0x1a0
[60015.902489] kernel:  ? iput+0x1a0/0x1a0
[60015.902491] kernel:  list_lru_walk_one+0x47/0x70
[60015.902495] kernel:  prune_icache_sb+0x39/0x50
[60015.902497] kernel:  super_cache_scan+0x161/0x1f0
[60015.902501] kernel:  do_shrink_slab+0x142/0x240
[60015.902505] kernel:  shrink_slab+0x164/0x280
[60015.902509] kernel:  shrink_node+0x2c8/0x6e0
[60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
[60015.902514] kernel:  try_to_free_pages+0xda/0x190
[60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
[60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
[60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
[60015.902528] kernel:  pipe_write+0x30b/0x5c0
[60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
[60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
[60015.902538] kernel:  __kernel_write+0x13a/0x2b0
[60015.902541] kernel:  kernel_write+0x73/0x150
[60015.902543] kernel:  send_cmd+0x7b/0xd0
[60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
[60015.902549] kernel:  process_extent+0x19b/0xed0
[60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
[60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
[60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
[60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
[60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
[60015.902564] kernel:  ? psi_task_change+0x84/0xc0
[60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
[60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
[60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
[60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
[60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
[60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
[60015.902588] kernel:  do_syscall_64+0x33/0x40
[60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
[60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
[60015.902599] kernel: Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff
5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10
00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 bd 0c 00 f7 d8
64 89 01 48
[60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
[60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00
RCX: 00007f158e38f0ab
[60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426
RDI: 0000000000000004
[60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640
R09: 00007f158e297640
[60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246
R12: 0000000000000000
[60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0
R15: 0000558c1a83e2a0
[60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---


The full dmesg is attached to this downstream bug as "
journal-nobees.log (4.12 MB, text/plain)" which has all the beesd
entries removed (over 300M).
https://bugzilla.redhat.com/show_bug.cgi?id=1971327

The time between this oops and systemd-oomd starting to kill things
due to both swap and memory pressure is 81 minutes. I'm not sure the
two events are related. Or what the above stack trace is saying. Or
the consequences of it. But the system got wedged in totally
unresponsive and had to be forced off with the power button.

Thanks,


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 17:02 ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 17:02 UTC (permalink / raw)
  To: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

5.12.9-300.fc34.x86_64

File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
workload is:
* bees doing dedup
* btrfs send
* digikam scanning files and writing image metadata to a sqlite database
* chrome doing some webby things

And then kaboom.

Call trace (easy to read, expires in 7 days):
https://pastebin.com/x5KRU23V

Same call trace (MUA mangled, for the archive):
[60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
[60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
k10temp rfkill i2c_piix4 fam15h_power
[60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
[60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
5.12.9-300.fc34.x86_64 #1
[60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
[60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
[60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
db 75 36 48
[60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
[60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
RCX: ffff9fb583453224
[60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
RDI: 0000000000000000
[60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
[60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000002
[60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
R15: 0000000000000001
[60015.902408] kernel: FS:  00007f158e62d8c0(0000)
GS:ffff93bd37580000(0000) knlGS:0000000000000000
[60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
CR4: 00000000000506e0
[60015.902414] kernel: Call Trace:
[60015.902419] kernel:  kfence_unprotect+0x13/0x30
[60015.902423] kernel:  page_fault_oops+0x89/0x270
[60015.902427] kernel:  ? search_module_extables+0xf/0x40
[60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
[60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
[60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
[60015.902440] kernel:  exc_page_fault+0x67/0x150
[60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
[60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
[60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
09 c2 0f 88
[60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
[60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
RCX: 0000000000000000
[60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
RDI: 0000000000000039
[60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
R09: 0000000000000000
[60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
R12: 0000000000000001
[60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
R15: ffff93bc0c92b000
[60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
[60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
[60015.902476] kernel:  evict+0xd1/0x180
[60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
[60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
[60015.902487] kernel:  ? iput+0x1a0/0x1a0
[60015.902489] kernel:  ? iput+0x1a0/0x1a0
[60015.902491] kernel:  list_lru_walk_one+0x47/0x70
[60015.902495] kernel:  prune_icache_sb+0x39/0x50
[60015.902497] kernel:  super_cache_scan+0x161/0x1f0
[60015.902501] kernel:  do_shrink_slab+0x142/0x240
[60015.902505] kernel:  shrink_slab+0x164/0x280
[60015.902509] kernel:  shrink_node+0x2c8/0x6e0
[60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
[60015.902514] kernel:  try_to_free_pages+0xda/0x190
[60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
[60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
[60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
[60015.902528] kernel:  pipe_write+0x30b/0x5c0
[60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
[60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
[60015.902538] kernel:  __kernel_write+0x13a/0x2b0
[60015.902541] kernel:  kernel_write+0x73/0x150
[60015.902543] kernel:  send_cmd+0x7b/0xd0
[60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
[60015.902549] kernel:  process_extent+0x19b/0xed0
[60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
[60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
[60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
[60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
[60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
[60015.902564] kernel:  ? psi_task_change+0x84/0xc0
[60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
[60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
[60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
[60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
[60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
[60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
[60015.902588] kernel:  do_syscall_64+0x33/0x40
[60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
[60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
[60015.902599] kernel: Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff
5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10
00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 bd 0c 00 f7 d8
64 89 01 48
[60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
[60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00
RCX: 00007f158e38f0ab
[60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426
RDI: 0000000000000004
[60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640
R09: 00007f158e297640
[60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246
R12: 0000000000000000
[60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0
R15: 0000558c1a83e2a0
[60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---


The full dmesg is attached to this downstream bug as "
journal-nobees.log (4.12 MB, text/plain)" which has all the beesd
entries removed (over 300M).
https://bugzilla.redhat.com/show_bug.cgi?id=1971327

The time between this oops and systemd-oomd starting to kill things
due to both swap and memory pressure is 81 minutes. I'm not sure the
two events are related. Or what the above stack trace is saying. Or
the consequences of it. But the system got wedged in totally
unresponsive and had to be forced off with the power button.

Thanks,


-- 
Chris Murphy


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
  2021-06-14 17:02 ` Chris Murphy
@ 2021-06-14 17:50   ` Filipe Manana
  -1 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2021-06-14 17:50 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

[-- Attachment #1: Type: text/plain, Size: 9379 bytes --]

On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> 5.12.9-300.fc34.x86_64
>
> File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> workload is:
> * bees doing dedup
> * btrfs send
> * digikam scanning files and writing image metadata to a sqlite database
> * chrome doing some webby things
>
> And then kaboom.
>
> Call trace (easy to read, expires in 7 days):
> https://pastebin.com/x5KRU23V
>
> Same call trace (MUA mangled, for the archive):
> [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> k10temp rfkill i2c_piix4 fam15h_power
> [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> 5.12.9-300.fc34.x86_64 #1
> [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> db 75 36 48
> [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> RCX: ffff9fb583453224
> [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> RDI: 0000000000000000
> [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> R09: 0000000000000000
> [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000002
> [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> R15: 0000000000000001
> [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> GS:ffff93bd37580000(0000) knlGS:0000000000000000
> [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> CR4: 00000000000506e0
> [60015.902414] kernel: Call Trace:
> [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> [60015.902423] kernel:  page_fault_oops+0x89/0x270
> [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> [60015.902440] kernel:  exc_page_fault+0x67/0x150
> [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> 09 c2 0f 88
> [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> RCX: 0000000000000000
> [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> RDI: 0000000000000039
> [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> R09: 0000000000000000
> [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> R12: 0000000000000001
> [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> R15: ffff93bc0c92b000
> [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> [60015.902476] kernel:  evict+0xd1/0x180
> [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> [60015.902505] kernel:  shrink_slab+0x164/0x280
> [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> [60015.902541] kernel:  kernel_write+0x73/0x150
> [60015.902543] kernel:  send_cmd+0x7b/0xd0
> [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> [60015.902549] kernel:  process_extent+0x19b/0xed0
> [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> [60015.902588] kernel:  do_syscall_64+0x33/0x40
> [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab

The problem is we end up trying to start a transaction while in a send
context, which is unexpected. But it does happen if memory allocation
triggers inode eviction due to memory pressure.

You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?
That would trigger an assertion right away:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584

With assertions disabled, we just cast current->journal_info into a
transaction handle and dereference it, which is obviously wrong since
->jornal_info has a value that is not a pointer to a transaction
handle, resulting in the crash.

How easy is it to reproduce for you?

Can you try the following patch and see if everything works as expected?

https://pastebin.com/raw/F4giWLVt
(also attached)

Thanks.

> [60015.902599] kernel: Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff
> 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10
> 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 bd 0c 00 f7 d8
> 64 89 01 48
> [60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246
> ORIG_RAX: 0000000000000010
> [60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00
> RCX: 00007f158e38f0ab
> [60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426
> RDI: 0000000000000004
> [60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640
> R09: 00007f158e297640
> [60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246
> R12: 0000000000000000
> [60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0
> R15: 0000558c1a83e2a0
> [60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---
>
>
> The full dmesg is attached to this downstream bug as "
> journal-nobees.log (4.12 MB, text/plain)" which has all the beesd
> entries removed (over 300M).
> https://bugzilla.redhat.com/show_bug.cgi?id=1971327
>
> The time between this oops and systemd-oomd starting to kill things
> due to both swap and memory pressure is 81 minutes. I'm not sure the
> two events are related. Or what the above stack trace is saying. Or
> the consequences of it. But the system got wedged in totally
> unresponsive and had to be forced off with the power button.
>
> Thanks,
>
>
> --
> Chris Murphy



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

[-- Attachment #2: send_evict_fix.patch --]
[-- Type: text/x-patch, Size: 1026 bytes --]

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3cd5286572d4..02bf2fbe3e84 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5427,18 +5427,20 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
 
 void btrfs_evict_inode(struct inode *inode)
 {
+	const bool in_send = (current->journal_info == BTRFS_SEND_TRANS_STUB);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *rsv;
 	int ret;
 
+	if (in_send)
+		current->journal_info = NULL;
+
 	trace_btrfs_inode_evict(inode);
 
-	if (!root) {
-		clear_inode(inode);
-		return;
-	}
+	if (!root)
+		goto out;
 
 	evict_inode_truncate_pages(inode);
 
@@ -5518,7 +5520,11 @@ void btrfs_evict_inode(struct inode *inode)
 	 * to retry these periodically in the future.
 	 */
 	btrfs_remove_delayed_node(BTRFS_I(inode));
+out:
 	clear_inode(inode);
+	if (in_send)
+		current->journal_info = BTRFS_SEND_TRANS_STUB;
+
 }
 
 /*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 17:50   ` Filipe Manana
  0 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2021-06-14 17:50 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

[-- Attachment #1: Type: text/plain, Size: 9379 bytes --]

On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> 5.12.9-300.fc34.x86_64
>
> File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> workload is:
> * bees doing dedup
> * btrfs send
> * digikam scanning files and writing image metadata to a sqlite database
> * chrome doing some webby things
>
> And then kaboom.
>
> Call trace (easy to read, expires in 7 days):
> https://pastebin.com/x5KRU23V
>
> Same call trace (MUA mangled, for the archive):
> [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> k10temp rfkill i2c_piix4 fam15h_power
> [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> 5.12.9-300.fc34.x86_64 #1
> [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> db 75 36 48
> [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> RCX: ffff9fb583453224
> [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> RDI: 0000000000000000
> [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> R09: 0000000000000000
> [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000002
> [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> R15: 0000000000000001
> [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> GS:ffff93bd37580000(0000) knlGS:0000000000000000
> [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> CR4: 00000000000506e0
> [60015.902414] kernel: Call Trace:
> [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> [60015.902423] kernel:  page_fault_oops+0x89/0x270
> [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> [60015.902440] kernel:  exc_page_fault+0x67/0x150
> [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> 09 c2 0f 88
> [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> RCX: 0000000000000000
> [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> RDI: 0000000000000039
> [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> R09: 0000000000000000
> [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> R12: 0000000000000001
> [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> R15: ffff93bc0c92b000
> [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> [60015.902476] kernel:  evict+0xd1/0x180
> [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> [60015.902505] kernel:  shrink_slab+0x164/0x280
> [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> [60015.902541] kernel:  kernel_write+0x73/0x150
> [60015.902543] kernel:  send_cmd+0x7b/0xd0
> [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> [60015.902549] kernel:  process_extent+0x19b/0xed0
> [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> [60015.902588] kernel:  do_syscall_64+0x33/0x40
> [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab

The problem is we end up trying to start a transaction while in a send
context, which is unexpected. But it does happen if memory allocation
triggers inode eviction due to memory pressure.

You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?
That would trigger an assertion right away:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584

With assertions disabled, we just cast current->journal_info into a
transaction handle and dereference it, which is obviously wrong since
->jornal_info has a value that is not a pointer to a transaction
handle, resulting in the crash.

How easy is it to reproduce for you?

Can you try the following patch and see if everything works as expected?

https://pastebin.com/raw/F4giWLVt
(also attached)

Thanks.

> [60015.902599] kernel: Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff
> 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10
> 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 95 bd 0c 00 f7 d8
> 64 89 01 48
> [60015.902602] kernel: RSP: 002b:00007ffcb2519bf8 EFLAGS: 00000246
> ORIG_RAX: 0000000000000010
> [60015.902605] kernel: RAX: ffffffffffffffda RBX: 00007ffcb251ae00
> RCX: 00007f158e38f0ab
> [60015.902607] kernel: RDX: 00007ffcb2519cf0 RSI: 0000000040489426
> RDI: 0000000000000004
> [60015.902608] kernel: RBP: 0000000000000004 R08: 00007f158e297640
> R09: 00007f158e297640
> [60015.902610] kernel: R10: 0000000000000008 R11: 0000000000000246
> R12: 0000000000000000
> [60015.902612] kernel: R13: 0000000000000002 R14: 00007ffcb251aee0
> R15: 0000558c1a83e2a0
> [60015.902615] kernel: ---[ end trace 7bbc33e23bb887ae ]---
>
>
> The full dmesg is attached to this downstream bug as "
> journal-nobees.log (4.12 MB, text/plain)" which has all the beesd
> entries removed (over 300M).
> https://bugzilla.redhat.com/show_bug.cgi?id=1971327
>
> The time between this oops and systemd-oomd starting to kill things
> due to both swap and memory pressure is 81 minutes. I'm not sure the
> two events are related. Or what the above stack trace is saying. Or
> the consequences of it. But the system got wedged in totally
> unresponsive and had to be forced off with the power button.
>
> Thanks,
>
>
> --
> Chris Murphy



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

[-- Attachment #2: send_evict_fix.patch --]
[-- Type: text/x-patch, Size: 1026 bytes --]

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3cd5286572d4..02bf2fbe3e84 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5427,18 +5427,20 @@ static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
 
 void btrfs_evict_inode(struct inode *inode)
 {
+	const bool in_send = (current->journal_info == BTRFS_SEND_TRANS_STUB);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *rsv;
 	int ret;
 
+	if (in_send)
+		current->journal_info = NULL;
+
 	trace_btrfs_inode_evict(inode);
 
-	if (!root) {
-		clear_inode(inode);
-		return;
-	}
+	if (!root)
+		goto out;
 
 	evict_inode_truncate_pages(inode);
 
@@ -5518,7 +5520,11 @@ void btrfs_evict_inode(struct inode *inode)
 	 * to retry these periodically in the future.
 	 */
 	btrfs_remove_delayed_node(BTRFS_I(inode));
+out:
 	clear_inode(inode);
+	if (in_send)
+		current->journal_info = BTRFS_SEND_TRANS_STUB;
+
 }
 
 /*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
  2021-06-14 17:50   ` Filipe Manana
@ 2021-06-14 20:07     ` Chris Murphy
  -1 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 20:07 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Chris Murphy, Linux FS Devel, Btrfs BTRFS,
	Linux Memory Management List, Josef Bacik

On Mon, Jun 14, 2021 at 11:50 AM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > 5.12.9-300.fc34.x86_64
> >
> > File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> > workload is:
> > * bees doing dedup
> > * btrfs send
> > * digikam scanning files and writing image metadata to a sqlite database
> > * chrome doing some webby things
> >
> > And then kaboom.
> >
> > Call trace (easy to read, expires in 7 days):
> > https://pastebin.com/x5KRU23V
> >
> > Same call trace (MUA mangled, for the archive):
> > [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> > arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> > [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> > snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> > nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> > iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> > edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> > snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> > snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> > snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> > snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> > k10temp rfkill i2c_piix4 fam15h_power
> > [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> > acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> > crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> > i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> > [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> > 5.12.9-300.fc34.x86_64 #1
> > [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> > be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> > [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> > [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> > 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> > 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> > db 75 36 48
> > [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> > [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> > RCX: ffff9fb583453224
> > [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> > RDI: 0000000000000000
> > [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> > R09: 0000000000000000
> > [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> > R12: 0000000000000002
> > [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> > R15: 0000000000000001
> > [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> > GS:ffff93bd37580000(0000) knlGS:0000000000000000
> > [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> > CR4: 00000000000506e0
> > [60015.902414] kernel: Call Trace:
> > [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> > [60015.902423] kernel:  page_fault_oops+0x89/0x270
> > [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> > [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> > [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> > [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> > [60015.902440] kernel:  exc_page_fault+0x67/0x150
> > [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> > [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> > [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> > 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> > 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> > 09 c2 0f 88
> > [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> > [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> > RCX: 0000000000000000
> > [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> > RDI: 0000000000000039
> > [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> > R09: 0000000000000000
> > [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> > R12: 0000000000000001
> > [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> > R15: ffff93bc0c92b000
> > [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> > [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> > [60015.902476] kernel:  evict+0xd1/0x180
> > [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> > [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> > [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> > [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> > [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> > [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> > [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> > [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> > [60015.902505] kernel:  shrink_slab+0x164/0x280
> > [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> > [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> > [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> > [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> > [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> > [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> > [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> > [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> > [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> > [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> > [60015.902541] kernel:  kernel_write+0x73/0x150
> > [60015.902543] kernel:  send_cmd+0x7b/0xd0
> > [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> > [60015.902549] kernel:  process_extent+0x19b/0xed0
> > [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> > [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> > [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> > [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> > [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> > [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> > [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> > [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> > [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> > [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> > [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> > [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> > [60015.902588] kernel:  do_syscall_64+0x33/0x40
> > [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
>
> The problem is we end up trying to start a transaction while in a send
> context, which is unexpected. But it does happen if memory allocation
> triggers inode eviction due to memory pressure.
>
> You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?

No, but Fedora debug kernels do have it enabled so I can ask the
reporter to use 5.12.10-debug or 5.13-rc6. Any preference?


> That would trigger an assertion right away:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
>
> With assertions disabled, we just cast current->journal_info into a
> transaction handle and dereference it, which is obviously wrong since
> ->jornal_info has a value that is not a pointer to a transaction
> handle, resulting in the crash.
>
> How easy is it to reproduce for you?

I'm not sure.

Do you think following this splat we could have somehow been IO
limited or stalled without any other kernel messages appearing? That
would then cause reclaim activity, and high memory and swap usage? I'm
wondering if it's likely or unlikely, because at the moment I think
it's unlikely since /var/log/journal was successfully recording
systemd-journald journals for the 81 minutes between this call trace
and when systemd-oomd started killing things. I'm trying to separate
out what's causing what.

> Can you try the following patch and see if everything works as expected?

I'll ask the reporter. I don't have a setup even remotely approximate
to this one for testing.

Thanks!

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 20:07     ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 20:07 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Chris Murphy, Linux FS Devel, Btrfs BTRFS,
	Linux Memory Management List, Josef Bacik

On Mon, Jun 14, 2021 at 11:50 AM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
> >
> > 5.12.9-300.fc34.x86_64
> >
> > File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> > workload is:
> > * bees doing dedup
> > * btrfs send
> > * digikam scanning files and writing image metadata to a sqlite database
> > * chrome doing some webby things
> >
> > And then kaboom.
> >
> > Call trace (easy to read, expires in 7 days):
> > https://pastebin.com/x5KRU23V
> >
> > Same call trace (MUA mangled, for the archive):
> > [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> > arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> > [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> > snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> > nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> > iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> > edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> > snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> > snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> > snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> > snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> > k10temp rfkill i2c_piix4 fam15h_power
> > [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> > acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> > crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> > i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> > [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> > 5.12.9-300.fc34.x86_64 #1
> > [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> > be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> > [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> > [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> > 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> > 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> > db 75 36 48
> > [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> > [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> > RCX: ffff9fb583453224
> > [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> > RDI: 0000000000000000
> > [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> > R09: 0000000000000000
> > [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> > R12: 0000000000000002
> > [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> > R15: 0000000000000001
> > [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> > GS:ffff93bd37580000(0000) knlGS:0000000000000000
> > [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> > CR4: 00000000000506e0
> > [60015.902414] kernel: Call Trace:
> > [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> > [60015.902423] kernel:  page_fault_oops+0x89/0x270
> > [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> > [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> > [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> > [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> > [60015.902440] kernel:  exc_page_fault+0x67/0x150
> > [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> > [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> > [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> > 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> > 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> > 09 c2 0f 88
> > [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> > [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> > RCX: 0000000000000000
> > [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> > RDI: 0000000000000039
> > [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> > R09: 0000000000000000
> > [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> > R12: 0000000000000001
> > [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> > R15: ffff93bc0c92b000
> > [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> > [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> > [60015.902476] kernel:  evict+0xd1/0x180
> > [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> > [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> > [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> > [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> > [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> > [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> > [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> > [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> > [60015.902505] kernel:  shrink_slab+0x164/0x280
> > [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> > [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> > [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> > [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> > [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> > [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> > [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> > [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> > [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> > [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> > [60015.902541] kernel:  kernel_write+0x73/0x150
> > [60015.902543] kernel:  send_cmd+0x7b/0xd0
> > [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> > [60015.902549] kernel:  process_extent+0x19b/0xed0
> > [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> > [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> > [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> > [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> > [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> > [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> > [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> > [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> > [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> > [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> > [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> > [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> > [60015.902588] kernel:  do_syscall_64+0x33/0x40
> > [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
>
> The problem is we end up trying to start a transaction while in a send
> context, which is unexpected. But it does happen if memory allocation
> triggers inode eviction due to memory pressure.
>
> You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?

No, but Fedora debug kernels do have it enabled so I can ask the
reporter to use 5.12.10-debug or 5.13-rc6. Any preference?


> That would trigger an assertion right away:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
>
> With assertions disabled, we just cast current->journal_info into a
> transaction handle and dereference it, which is obviously wrong since
> ->jornal_info has a value that is not a pointer to a transaction
> handle, resulting in the crash.
>
> How easy is it to reproduce for you?

I'm not sure.

Do you think following this splat we could have somehow been IO
limited or stalled without any other kernel messages appearing? That
would then cause reclaim activity, and high memory and swap usage? I'm
wondering if it's likely or unlikely, because at the moment I think
it's unlikely since /var/log/journal was successfully recording
systemd-journald journals for the 81 minutes between this call trace
and when systemd-oomd started killing things. I'm trying to separate
out what's causing what.

> Can you try the following patch and see if everything works as expected?

I'll ask the reporter. I don't have a setup even remotely approximate
to this one for testing.

Thanks!

-- 
Chris Murphy


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
  2021-06-14 20:07     ` Chris Murphy
@ 2021-06-14 20:42       ` Filipe Manana
  -1 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2021-06-14 20:42 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

[-- Attachment #1: Type: text/plain, Size: 10353 bytes --]

On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Mon, Jun 14, 2021 at 11:50 AM Filipe Manana <fdmanana@gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
> > >
> > > 5.12.9-300.fc34.x86_64
> > >
> > > File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> > > workload is:
> > > * bees doing dedup
> > > * btrfs send
> > > * digikam scanning files and writing image metadata to a sqlite database
> > > * chrome doing some webby things
> > >
> > > And then kaboom.
> > >
> > > Call trace (easy to read, expires in 7 days):
> > > https://pastebin.com/x5KRU23V
> > >
> > > Same call trace (MUA mangled, for the archive):
> > > [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> > > arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> > > [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> > > snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> > > nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > > nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> > > iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> > > edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> > > snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> > > snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> > > snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> > > snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> > > k10temp rfkill i2c_piix4 fam15h_power
> > > [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> > > acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> > > crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> > > i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> > > [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> > > 5.12.9-300.fc34.x86_64 #1
> > > [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> > > be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> > > [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> > > [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> > > 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> > > 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> > > db 75 36 48
> > > [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> > > [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> > > RCX: ffff9fb583453224
> > > [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> > > RDI: 0000000000000000
> > > [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> > > R09: 0000000000000000
> > > [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> > > R12: 0000000000000002
> > > [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> > > R15: 0000000000000001
> > > [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> > > GS:ffff93bd37580000(0000) knlGS:0000000000000000
> > > [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> > > CR4: 00000000000506e0
> > > [60015.902414] kernel: Call Trace:
> > > [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> > > [60015.902423] kernel:  page_fault_oops+0x89/0x270
> > > [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> > > [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> > > [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> > > [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> > > [60015.902440] kernel:  exc_page_fault+0x67/0x150
> > > [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> > > [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> > > [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> > > 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> > > 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> > > 09 c2 0f 88
> > > [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> > > [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> > > RCX: 0000000000000000
> > > [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> > > RDI: 0000000000000039
> > > [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> > > R09: 0000000000000000
> > > [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> > > R12: 0000000000000001
> > > [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> > > R15: ffff93bc0c92b000
> > > [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> > > [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> > > [60015.902476] kernel:  evict+0xd1/0x180
> > > [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> > > [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> > > [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> > > [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> > > [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> > > [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> > > [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> > > [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> > > [60015.902505] kernel:  shrink_slab+0x164/0x280
> > > [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> > > [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> > > [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> > > [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> > > [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> > > [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> > > [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> > > [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> > > [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> > > [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> > > [60015.902541] kernel:  kernel_write+0x73/0x150
> > > [60015.902543] kernel:  send_cmd+0x7b/0xd0
> > > [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> > > [60015.902549] kernel:  process_extent+0x19b/0xed0
> > > [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> > > [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> > > [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> > > [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> > > [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> > > [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> > > [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> > > [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> > > [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> > > [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> > > [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> > > [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> > > [60015.902588] kernel:  do_syscall_64+0x33/0x40
> > > [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
> >
> > The problem is we end up trying to start a transaction while in a send
> > context, which is unexpected. But it does happen if memory allocation
> > triggers inode eviction due to memory pressure.
> >
> > You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?
>
> No, but Fedora debug kernels do have it enabled so I can ask the
> reporter to use 5.12.10-debug or 5.13-rc6. Any preference?

No preference.
It's a problem that has been around for several years now, and not
something recent.

Ok, so it was reported before. Anywhere public, or was it something private?

>
>
> > That would trigger an assertion right away:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> >
> > With assertions disabled, we just cast current->journal_info into a
> > transaction handle and dereference it, which is obviously wrong since
> > ->jornal_info has a value that is not a pointer to a transaction
> > handle, resulting in the crash.
> >
> > How easy is it to reproduce for you?
>
> I'm not sure.
>
> Do you think following this splat we could have somehow been IO
> limited or stalled without any other kernel messages appearing? That
> would then cause reclaim activity, and high memory and swap usage? I'm
> wondering if it's likely or unlikely, because at the moment I think
> it's unlikely since /var/log/journal was successfully recording
> systemd-journald journals for the 81 minutes between this call trace
> and when systemd-oomd started killing things. I'm trying to separate
> out what's causing what.

Don't know about that. And honestly I don't think that it matters.
The send task writes to a pipe and writing to a pipe allocates pages
without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
meaning we can end up evicting an inode, which in turn needs to start
a transaction.

That patch would be my preferred way to fix it.
If it doesn't work for any reason, then we can simply set up a NOFS
context surrounding the calls in send to write to the pipe, like this:

https://pastebin.com/raw/tTrfAw0m
(also attached)

I would have to test the first patch in a scenario under heavy memory
pressure to see if there isn't anything I might have missed, but I
think it should work just fine.
I'll test it tomorrow too.

Thanks.

>
> > Can you try the following patch and see if everything works as expected?
>
> I'll ask the reporter. I don't have a setup even remotely approximate
> to this one for testing.
>
> Thanks!
>
> --
> Chris Murphy



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

[-- Attachment #2: send_nofs_pipe_write.patch --]
[-- Type: text/x-patch, Size: 1053 bytes --]

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index a2b3c594379d..d8667c07c4eb 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -548,24 +548,32 @@ static struct btrfs_path *alloc_path_for_send(void)
 
 static int write_buf(struct file *filp, const void *buf, u32 len, loff_t *off)
 {
-	int ret;
+	int ret = 0;
 	u32 pos = 0;
+	unsigned nofs_flag;
 
+	/*
+	 * Pipe writes can allocate pages which can trigger reclaim and cause
+	 * a crash if reclaim triggers inode eviction and that needs to start a
+	 * transaction. So use a NOFS context.
+	 */
+	nofs_flag = memalloc_nofs_save();
 	while (pos < len) {
 		ret = kernel_write(filp, buf + pos, len - pos, off);
 		/* TODO handle that correctly */
 		/*if (ret == -ERESTARTSYS) {
 			continue;
 		}*/
+		if (ret == 0)
+			ret = -EIO;
 		if (ret < 0)
-			return ret;
-		if (ret == 0) {
-			return -EIO;
-		}
+			break;
 		pos += ret;
+		ret = 0;
 	}
+	memalloc_nofs_restore(nofs_flag);
 
-	return 0;
+	return ret;
 }
 
 static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 20:42       ` Filipe Manana
  0 siblings, 0 replies; 10+ messages in thread
From: Filipe Manana @ 2021-06-14 20:42 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Linux FS Devel, Btrfs BTRFS, Linux Memory Management List, Josef Bacik

[-- Attachment #1: Type: text/plain, Size: 10353 bytes --]

On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Mon, Jun 14, 2021 at 11:50 AM Filipe Manana <fdmanana@gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 6:05 PM Chris Murphy <lists@colorremedies.com> wrote:
> > >
> > > 5.12.9-300.fc34.x86_64
> > >
> > > File system is btrfs, 16G RAM, 8G /dev/zram0 is used for swap, and the
> > > workload is:
> > > * bees doing dedup
> > > * btrfs send
> > > * digikam scanning files and writing image metadata to a sqlite database
> > > * chrome doing some webby things
> > >
> > > And then kaboom.
> > >
> > > Call trace (easy to read, expires in 7 days):
> > > https://pastebin.com/x5KRU23V
> > >
> > > Same call trace (MUA mangled, for the archive):
> > > [60015.902283] kernel: WARNING: CPU: 3 PID: 58159 at
> > > arch/x86/include/asm/kfence.h:44 kfence_protect_page+0x21/0x80
> > > [60015.902292] kernel: Modules linked in: uinput rfcomm snd_seq_dummy
> > > snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
> > > nf_nat_tftp nf_conntrack_tftp nft_objref nf_conntrack_netbios_ns
> > > nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> > > 8021q garp mrp nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> > > nft_reject bridge stp llc nft_ct nft_chain_nat ip6table_nat
> > > ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
> > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
> > > iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables
> > > iptable_filter cmac bnep sunrpc vfat fat iwldvm mac80211 btusb btrtl
> > > edac_mce_amd libarc4 btbcm snd_hda_codec_realtek iwlwifi
> > > snd_hda_codec_generic ledtrig_audio btintel snd_hda_codec_hdmi
> > > snd_hda_intel bluetooth kvm_amd snd_intel_dspcfg cfg80211
> > > snd_intel_sdw_acpi ccp snd_hda_codec snd_hda_core kvm snd_hwdep
> > > snd_seq joydev irqbypass snd_seq_device snd_pcm ecdh_generic ecc
> > > k10temp rfkill i2c_piix4 fam15h_power
> > > [60015.902361] kernel:  pcspkr snd_timer snd r8169 soundcore
> > > acpi_cpufreq zram ip_tables amdgpu iommu_v2 gpu_sched crct10dif_pclmul
> > > crc32_pclmul crc32c_intel radeon ghash_clmulni_intel sp5100_tco
> > > i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec drm video fuse
> > > [60015.902384] kernel: CPU: 3 PID: 58159 Comm: btrfs Not tainted
> > > 5.12.9-300.fc34.x86_64 #1
> > > [60015.902387] kernel: Hardware name: Gigabyte Technology Co., Ltd. To
> > > be filled by O.E.M./F2A88XN-WIFI, BIOS F6 12/24/2015
> > > [60015.902389] kernel: RIP: 0010:kfence_protect_page+0x21/0x80
> > > [60015.902393] kernel: Code: ff 0f 1f 84 00 00 00 00 00 55 48 89 fd 53
> > > 89 f3 48 83 ec 08 48 8d 74 24 04 e8 cb e0 d7 ff 48 85 c0 74 07 83 7c
> > > 24 04 01 74 0b <0f> 0b 48 83 c4 08 31 c0 5b 5d c3 48 8b 38 48 89 c2 84
> > > db 75 36 48
> > > [60015.902396] kernel: RSP: 0018:ffff9fb583453220 EFLAGS: 00010246
> > > [60015.902399] kernel: RAX: 0000000000000000 RBX: 0000000000000000
> > > RCX: ffff9fb583453224
> > > [60015.902401] kernel: RDX: ffff9fb583453224 RSI: 0000000000000000
> > > RDI: 0000000000000000
> > > [60015.902402] kernel: RBP: 0000000000000000 R08: 0000000000000000
> > > R09: 0000000000000000
> > > [60015.902404] kernel: R10: 0000000000000000 R11: 0000000000000000
> > > R12: 0000000000000002
> > > [60015.902406] kernel: R13: ffff9fb583453348 R14: 0000000000000000
> > > R15: 0000000000000001
> > > [60015.902408] kernel: FS:  00007f158e62d8c0(0000)
> > > GS:ffff93bd37580000(0000) knlGS:0000000000000000
> > > [60015.902410] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [60015.902412] kernel: CR2: 0000000000000039 CR3: 00000001256d2000
> > > CR4: 00000000000506e0
> > > [60015.902414] kernel: Call Trace:
> > > [60015.902419] kernel:  kfence_unprotect+0x13/0x30
> > > [60015.902423] kernel:  page_fault_oops+0x89/0x270
> > > [60015.902427] kernel:  ? search_module_extables+0xf/0x40
> > > [60015.902431] kernel:  ? search_bpf_extables+0x57/0x70
> > > [60015.902435] kernel:  kernelmode_fixup_or_oops+0xd6/0xf0
> > > [60015.902437] kernel:  __bad_area_nosemaphore+0x142/0x180
> > > [60015.902440] kernel:  exc_page_fault+0x67/0x150
> > > [60015.902445] kernel:  asm_exc_page_fault+0x1e/0x30
> > > [60015.902450] kernel: RIP: 0010:start_transaction+0x71/0x580
> > > [60015.902454] kernel: Code: d3 0f 84 92 00 00 00 80 e7 06 0f 85 63 04
> > > 00 00 65 48 8b 04 25 c0 7b 01 00 4c 8b a0 70 0c 00 00 b8 01 00 00 00
> > > 49 8d 7c 24 38 <f0> 41 0f c1 44 24 38 85 c0 0f 84 41 04 00 00 8d 50 01
> > > 09 c2 0f 88
> > > [60015.902456] kernel: RSP: 0018:ffff9fb5834533f8 EFLAGS: 00010246
> > > [60015.902458] kernel: RAX: 0000000000000001 RBX: 0000000000000001
> > > RCX: 0000000000000000
> > > [60015.902460] kernel: RDX: 0000000000000801 RSI: 0000000000000000
> > > RDI: 0000000000000039
> > > [60015.902462] kernel: RBP: ffff93bc0a7eb800 R08: 0000000000000001
> > > R09: 0000000000000000
> > > [60015.902463] kernel: R10: 0000000000098a00 R11: 0000000000000001
> > > R12: 0000000000000001
> > > [60015.902464] kernel: R13: 0000000000000000 R14: ffff93bc0c92b000
> > > R15: ffff93bc0c92b000
> > > [60015.902468] kernel:  btrfs_commit_inode_delayed_inode+0x5d/0x120
> > > [60015.902473] kernel:  btrfs_evict_inode+0x2c5/0x3f0
> > > [60015.902476] kernel:  evict+0xd1/0x180
> > > [60015.902480] kernel:  inode_lru_isolate+0xe7/0x180
> > > [60015.902483] kernel:  __list_lru_walk_one+0x77/0x150
> > > [60015.902487] kernel:  ? iput+0x1a0/0x1a0
> > > [60015.902489] kernel:  ? iput+0x1a0/0x1a0
> > > [60015.902491] kernel:  list_lru_walk_one+0x47/0x70
> > > [60015.902495] kernel:  prune_icache_sb+0x39/0x50
> > > [60015.902497] kernel:  super_cache_scan+0x161/0x1f0
> > > [60015.902501] kernel:  do_shrink_slab+0x142/0x240
> > > [60015.902505] kernel:  shrink_slab+0x164/0x280
> > > [60015.902509] kernel:  shrink_node+0x2c8/0x6e0
> > > [60015.902512] kernel:  do_try_to_free_pages+0xcb/0x4b0
> > > [60015.902514] kernel:  try_to_free_pages+0xda/0x190
> > > [60015.902516] kernel:  __alloc_pages_slowpath.constprop.0+0x373/0xcc0
> > > [60015.902521] kernel:  ? __memcg_kmem_charge_page+0xc2/0x1e0
> > > [60015.902525] kernel:  __alloc_pages_nodemask+0x30a/0x340
> > > [60015.902528] kernel:  pipe_write+0x30b/0x5c0
> > > [60015.902531] kernel:  ? set_next_entity+0xad/0x1e0
> > > [60015.902534] kernel:  ? switch_mm_irqs_off+0x58/0x440
> > > [60015.902538] kernel:  __kernel_write+0x13a/0x2b0
> > > [60015.902541] kernel:  kernel_write+0x73/0x150
> > > [60015.902543] kernel:  send_cmd+0x7b/0xd0
> > > [60015.902545] kernel:  send_extent_data+0x5a3/0x6b0
> > > [60015.902549] kernel:  process_extent+0x19b/0xed0
> > > [60015.902551] kernel:  btrfs_ioctl_send+0x1434/0x17e0
> > > [60015.902554] kernel:  ? _btrfs_ioctl_send+0xe1/0x100
> > > [60015.902557] kernel:  _btrfs_ioctl_send+0xbf/0x100
> > > [60015.902559] kernel:  ? enqueue_entity+0x18c/0x7b0
> > > [60015.902562] kernel:  btrfs_ioctl+0x185f/0x2f80
> > > [60015.902564] kernel:  ? psi_task_change+0x84/0xc0
> > > [60015.902569] kernel:  ? _flat_send_IPI_mask+0x21/0x40
> > > [60015.902572] kernel:  ? check_preempt_curr+0x2f/0x70
> > > [60015.902576] kernel:  ? selinux_file_ioctl+0x137/0x1e0
> > > [60015.902579] kernel:  ? expand_files+0x1cb/0x1d0
> > > [60015.902582] kernel:  ? __x64_sys_ioctl+0x82/0xb0
> > > [60015.902585] kernel:  __x64_sys_ioctl+0x82/0xb0
> > > [60015.902588] kernel:  do_syscall_64+0x33/0x40
> > > [60015.902591] kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [60015.902595] kernel: RIP: 0033:0x7f158e38f0ab
> >
> > The problem is we end up trying to start a transaction while in a send
> > context, which is unexpected. But it does happen if memory allocation
> > triggers inode eviction due to memory pressure.
> >
> > You don't have CONFIG_BTRFS_ASSERT=y in your kernel config right?
>
> No, but Fedora debug kernels do have it enabled so I can ask the
> reporter to use 5.12.10-debug or 5.13-rc6. Any preference?

No preference.
It's a problem that has been around for several years now, and not
something recent.

Ok, so it was reported before. Anywhere public, or was it something private?

>
>
> > That would trigger an assertion right away:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> >
> > With assertions disabled, we just cast current->journal_info into a
> > transaction handle and dereference it, which is obviously wrong since
> > ->jornal_info has a value that is not a pointer to a transaction
> > handle, resulting in the crash.
> >
> > How easy is it to reproduce for you?
>
> I'm not sure.
>
> Do you think following this splat we could have somehow been IO
> limited or stalled without any other kernel messages appearing? That
> would then cause reclaim activity, and high memory and swap usage? I'm
> wondering if it's likely or unlikely, because at the moment I think
> it's unlikely since /var/log/journal was successfully recording
> systemd-journald journals for the 81 minutes between this call trace
> and when systemd-oomd started killing things. I'm trying to separate
> out what's causing what.

Don't know about that. And honestly I don't think that it matters.
The send task writes to a pipe and writing to a pipe allocates pages
without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
meaning we can end up evicting an inode, which in turn needs to start
a transaction.

That patch would be my preferred way to fix it.
If it doesn't work for any reason, then we can simply set up a NOFS
context surrounding the calls in send to write to the pipe, like this:

https://pastebin.com/raw/tTrfAw0m
(also attached)

I would have to test the first patch in a scenario under heavy memory
pressure to see if there isn't anything I might have missed, but I
think it should work just fine.
I'll test it tomorrow too.

Thanks.

>
> > Can you try the following patch and see if everything works as expected?
>
> I'll ask the reporter. I don't have a setup even remotely approximate
> to this one for testing.
>
> Thanks!
>
> --
> Chris Murphy



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

[-- Attachment #2: send_nofs_pipe_write.patch --]
[-- Type: text/x-patch, Size: 1053 bytes --]

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index a2b3c594379d..d8667c07c4eb 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -548,24 +548,32 @@ static struct btrfs_path *alloc_path_for_send(void)
 
 static int write_buf(struct file *filp, const void *buf, u32 len, loff_t *off)
 {
-	int ret;
+	int ret = 0;
 	u32 pos = 0;
+	unsigned nofs_flag;
 
+	/*
+	 * Pipe writes can allocate pages which can trigger reclaim and cause
+	 * a crash if reclaim triggers inode eviction and that needs to start a
+	 * transaction. So use a NOFS context.
+	 */
+	nofs_flag = memalloc_nofs_save();
 	while (pos < len) {
 		ret = kernel_write(filp, buf + pos, len - pos, off);
 		/* TODO handle that correctly */
 		/*if (ret == -ERESTARTSYS) {
 			continue;
 		}*/
+		if (ret == 0)
+			ret = -EIO;
 		if (ret < 0)
-			return ret;
-		if (ret == 0) {
-			return -EIO;
-		}
+			break;
 		pos += ret;
+		ret = 0;
 	}
+	memalloc_nofs_restore(nofs_flag);
 
-	return 0;
+	return ret;
 }
 
 static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
  2021-06-14 20:42       ` Filipe Manana
@ 2021-06-14 21:29         ` Chris Murphy
  -1 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 21:29 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Chris Murphy, Linux FS Devel, Btrfs BTRFS,
	Linux Memory Management List, Josef Bacik

On Mon, Jun 14, 2021 at 2:43 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@colorremedies.com> wrote:
> > No, but Fedora debug kernels do have it enabled so I can ask the
> > reporter to use 5.12.10-debug or 5.13-rc6. Any preference?
>
> No preference.
> It's a problem that has been around for several years now, and not
> something recent.
>
> Ok, so it was reported before. Anywhere public, or was it something private?

https://bugzilla.redhat.com/show_bug.cgi?id=1971327



>
> >
> >
> > > That would trigger an assertion right away:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> > >
> > > With assertions disabled, we just cast current->journal_info into a
> > > transaction handle and dereference it, which is obviously wrong since
> > > ->jornal_info has a value that is not a pointer to a transaction
> > > handle, resulting in the crash.
> > >
> > > How easy is it to reproduce for you?
> >
> > I'm not sure.
> >
> > Do you think following this splat we could have somehow been IO
> > limited or stalled without any other kernel messages appearing? That
> > would then cause reclaim activity, and high memory and swap usage? I'm
> > wondering if it's likely or unlikely, because at the moment I think
> > it's unlikely since /var/log/journal was successfully recording
> > systemd-journald journals for the 81 minutes between this call trace
> > and when systemd-oomd started killing things. I'm trying to separate
> > out what's causing what.
>
> Don't know about that. And honestly I don't think that it matters.
> The send task writes to a pipe and writing to a pipe allocates pages
> without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
> meaning we can end up evicting an inode, which in turn needs to start
> a transaction.
>
> That patch would be my preferred way to fix it.
> If it doesn't work for any reason, then we can simply set up a NOFS
> context surrounding the calls in send to write to the pipe, like this:
>
> https://pastebin.com/raw/tTrfAw0m
> (also attached)
>
> I would have to test the first patch in a scenario under heavy memory
> pressure to see if there isn't anything I might have missed, but I
> think it should work just fine.
> I'll test it tomorrow too.
>

The system was under heavy swap and memory pressure, but the logs
don't tell us the contributors or their relative amounts. But logs
show that Bees continued to make progress for almost an hour and a
half after the splat, up until the wayland+shell service was killed by
oomd. I'd expect any blocked tasks to result in warnings about it
every 2 minutes in the journal.

Swap being on zram during the memory pressure period might be a
complicating factor, combining IO, CPU and memory pressure.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: WARNING at asm/kfence.h:44 kfence_protect_page
@ 2021-06-14 21:29         ` Chris Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2021-06-14 21:29 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Chris Murphy, Linux FS Devel, Btrfs BTRFS,
	Linux Memory Management List, Josef Bacik

On Mon, Jun 14, 2021 at 2:43 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 9:07 PM Chris Murphy <lists@colorremedies.com> wrote:
> > No, but Fedora debug kernels do have it enabled so I can ask the
> > reporter to use 5.12.10-debug or 5.13-rc6. Any preference?
>
> No preference.
> It's a problem that has been around for several years now, and not
> something recent.
>
> Ok, so it was reported before. Anywhere public, or was it something private?

https://bugzilla.redhat.com/show_bug.cgi?id=1971327



>
> >
> >
> > > That would trigger an assertion right away:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/btrfs/transaction.c?h=v5.12.10#n584
> > >
> > > With assertions disabled, we just cast current->journal_info into a
> > > transaction handle and dereference it, which is obviously wrong since
> > > ->jornal_info has a value that is not a pointer to a transaction
> > > handle, resulting in the crash.
> > >
> > > How easy is it to reproduce for you?
> >
> > I'm not sure.
> >
> > Do you think following this splat we could have somehow been IO
> > limited or stalled without any other kernel messages appearing? That
> > would then cause reclaim activity, and high memory and swap usage? I'm
> > wondering if it's likely or unlikely, because at the moment I think
> > it's unlikely since /var/log/journal was successfully recording
> > systemd-journald journals for the 81 minutes between this call trace
> > and when systemd-oomd started killing things. I'm trying to separate
> > out what's causing what.
>
> Don't know about that. And honestly I don't think that it matters.
> The send task writes to a pipe and writing to a pipe allocates pages
> without GFP_NOFS (with GFP_HIGHUSER | __GFP_ACCOUNT to be precise),
> meaning we can end up evicting an inode, which in turn needs to start
> a transaction.
>
> That patch would be my preferred way to fix it.
> If it doesn't work for any reason, then we can simply set up a NOFS
> context surrounding the calls in send to write to the pipe, like this:
>
> https://pastebin.com/raw/tTrfAw0m
> (also attached)
>
> I would have to test the first patch in a scenario under heavy memory
> pressure to see if there isn't anything I might have missed, but I
> think it should work just fine.
> I'll test it tomorrow too.
>

The system was under heavy swap and memory pressure, but the logs
don't tell us the contributors or their relative amounts. But logs
show that Bees continued to make progress for almost an hour and a
half after the splat, up until the wayland+shell service was killed by
oomd. I'd expect any blocked tasks to result in warnings about it
every 2 minutes in the journal.

Swap being on zram during the memory pressure period might be a
complicating factor, combining IO, CPU and memory pressure.


-- 
Chris Murphy


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-06-14 21:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14 17:02 WARNING at asm/kfence.h:44 kfence_protect_page Chris Murphy
2021-06-14 17:02 ` Chris Murphy
2021-06-14 17:50 ` Filipe Manana
2021-06-14 17:50   ` Filipe Manana
2021-06-14 20:07   ` Chris Murphy
2021-06-14 20:07     ` Chris Murphy
2021-06-14 20:42     ` Filipe Manana
2021-06-14 20:42       ` Filipe Manana
2021-06-14 21:29       ` Chris Murphy
2021-06-14 21:29         ` Chris Murphy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.