linux-mm.kvack.org archive mirror
* cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
@ 2020-10-14 15:22 David Hildenbrand
  2020-10-14 16:15 ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-14 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Privoznik, Michael S. Tsirkin, Michal Hocko, Muchun Song,
	Aneesh Kumar K.V, Tejun Heo, Mina Almasry

Hi everybody,

Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and reported that this results in [1]

1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)


QEMU with free page reporting uses fallocate(FALLOC_FL_PUNCH_HOLE)
to discard pages that are reported as free by a VM. Reporting is
done at pageblock granularity, so when the guest reports a free 2M
chunk, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
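
For reference, per reported 2M chunk this boils down to roughly the
following (a minimal sketch, error handling omitted; the hugetlbfs mount
point and file name are just examples, not what QEMU actually uses):

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        const size_t huge = 2UL * 1024 * 1024;  /* one 2 MiB huge page */
        /* assumption: hugetlbfs is mounted at /dev/hugepages */
        int fd = open("/dev/hugepages/guest-ram", O_CREAT | O_RDWR, 0600);
        char *p;

        ftruncate(fd, huge);
        p = mmap(NULL, huge, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        memset(p, 0, huge);     /* populate the huge page, as the guest would */

        /* the guest reported the 2M chunk as free -> discard it again */
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, huge);
        return 0;
}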

I was also able to reproduce (also with virtio-mem, which similarly
uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
(and on v5.7.X from F32).

Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
is broken with cgroups. I did *not* try without cgroups yet.

Any ideas?


Here is report #1:

[  315.251417] ------------[ cut here ]------------
[  315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[  315.251425] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec edac_mce_amd snd_hda_core btusb btrtl btbcm snd_hwdep snd_seq btintel kvm_amd snd_seq_device bluetooth kvm snd_pcm ecdh_generic sp5100_tco irqbypass rfkill snd_timer rapl ecc pcspkr wmi_bmof joydev i2c_piix4 k10temp snd
[  315.251454]  soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi drm ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[  315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[  315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[  315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
[  315.251471] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[  315.251472] RSP: 0018:ffffb60f01ed3b20 EFLAGS: 00010286
[  315.251473] RAX: fffffffffffd0600 RBX: fffffffffffd0600 RCX: ffff8de8272e3200
[  315.251473] RDX: 000000000000028e RSI: fffffffffffd0600 RDI: ffff8de838452e40
[  315.251474] RBP: ffff8de838452e40 R08: ffff8de838452e40 R09: ffff8de837f86c80
[  315.251475] R10: ffffb60f01ed3b58 R11: 0000000000000001 R12: 0000000000051c00
[  315.251475] R13: fffffffffffae400 R14: ffff8de8272e3240 R15: 0000000000000571
[  315.251476] FS:  00007f9c2edfd700(0000) GS:ffff8de83ebc0000(0000) knlGS:0000000000000000
[  315.251477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  315.251478] CR2: 00007f2a76787e78 CR3: 0000000fcbb1c000 CR4: 0000000000350ee0
[  315.251479] Call Trace:
[  315.251485]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[  315.251487]  region_del+0x1d3/0x300
[  315.251489]  hugetlb_unreserve_pages+0x39/0xb0
[  315.251492]  remove_inode_hugepages+0x1a8/0x3d0
[  315.251495]  ? tlb_finish_mmu+0x7a/0x1d0
[  315.251497]  hugetlbfs_fallocate+0x3c4/0x5c0
[  315.251519]  ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
[  315.251522]  ? file_has_perm+0xa2/0xb0
[  315.251524]  ? inode_security+0xc/0x60
[  315.251525]  ? selinux_file_permission+0x4e/0x120
[  315.251527]  vfs_fallocate+0x146/0x290
[  315.251529]  __x64_sys_fallocate+0x3e/0x70
[  315.251531]  do_syscall_64+0x33/0x40
[  315.251533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  315.251535] RIP: 0033:0x7f9d3fb5641f
[  315.251536] Code: 89 7c 24 08 48 89 4c 24 18 e8 5d fc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 7c 24 08 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 08 e8 8d fc f8 ff 8b 44
[  315.251537] RSP: 002b:00007f9c2edfc470 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[  315.251538] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f9d3fb5641f
[  315.251539] RDX: 00000000ae200000 RSI: 0000000000000003 RDI: 000000000000000c
[  315.251539] RBP: 0000557389d6736c R08: 0000000000000000 R09: 000000000000000c
[  315.251540] R10: 0000000000200000 R11: 0000000000000293 R12: 0000000000200000
[  315.251540] R13: 00000000ffffffff R14: 00000000ae200000 R15: 00007f9cde000000
[  315.251542] ---[ end trace 4c88c62ccb1349c9 ]---



Here is report #2:

[  400.920702] ------------[ cut here ]------------
[  400.920711] WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[  400.920712] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop btusb btrtl btbcm btintel edac_mce_amd bluetooth snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec kvm_amd snd_hda_core snd_hwdep kvm snd_seq ecdh_generic snd_seq_device rfkill irqbypass snd_pcm ecc joydev sp5100_tco rapl pcspkr
[  400.920743]  wmi_bmof i2c_piix4 k10temp snd_timer snd soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm mxm_wmi ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[  400.920759] CPU: 13 PID: 2438 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[  400.920760] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[  400.920763] RIP: 0010:page_counter_uncharge+0x4b/0x50
[  400.920765] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[  400.920766] RSP: 0018:ffffb89e01f5fa20 EFLAGS: 00010286
[  400.920767] RAX: fffffffffff01200 RBX: fffffffffff01200 RCX: 0000000080400000
[  400.920768] RDX: 0000000000000800 RSI: fffffffffff01200 RDI: ffff910b78452e40
[  400.920769] RBP: ffff910b78452e40 R08: ffff910b78452e40 R09: ffff910b70b2a700
[  400.920769] R10: 0000000000000001 R11: ffff910b5e079300 R12: 0000000000100000
[  400.920770] R13: fffffffffff00000 R14: ffff910b76185908 R15: 0000000000000000
[  400.920771] FS:  0000000000000000(0000) GS:ffff910b7ed40000(0000) knlGS:0000000000000000
[  400.920772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  400.920773] CR2: 00007f90b2d898bc CR3: 000000056ca0e000 CR4: 0000000000350ee0
[  400.920774] Call Trace:
[  400.920780]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[  400.920783]  region_del+0x11b/0x300
[  400.920786]  hugetlb_unreserve_pages+0x39/0xb0
[  400.920788]  remove_inode_hugepages+0x3c2/0x3d0
[  400.920792]  hugetlbfs_evict_inode+0x1a/0x40
[  400.920795]  evict+0xd1/0x1a0
[  400.920797]  __dentry_kill+0xe4/0x180
[  400.920799]  __fput+0xec/0x240
[  400.920802]  task_work_run+0x65/0xa0
[  400.920804]  do_exit+0x34c/0xad0
[  400.920806]  do_group_exit+0x33/0xa0
[  400.920808]  get_signal+0x179/0x8d0
[  400.920811]  arch_do_signal+0x30/0x700
[  400.920832]  ? kvm_vcpu_ioctl+0x29f/0x590 [kvm]
[  400.920835]  exit_to_user_mode_prepare+0xf7/0x160
[  400.920838]  syscall_exit_to_user_mode+0x31/0x1b0
[  400.920841]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  400.920843] RIP: 0033:0x7f90b3008e92
[  400.920843] Code: Bad RIP value.
[  400.920844] RSP: 002b:00007f8d45ffa770 EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
[  400.920845] RAX: fffffffffffffe00 RBX: 0000000000000014 RCX: 00007f90b3008e92
[  400.920846] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000055efd2951db8
[  400.920846] RBP: 000055efd2951d90 R08: 0000000000000000 R09: 000055efd18f29a0
[  400.920847] R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000000
[  400.920848] R13: 000055efd190ff60 R14: 000055efd2951db8 R15: 00007f8d45ffa7a0
[  400.920850] ---[ end trace bd4d1b0930afe999 ]---



[1] https://www.redhat.com/archives/libvir-list/2020-October/msg00872.html

-- 
Thanks,

David / dhildenb


* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 15:22 cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5 David Hildenbrand
@ 2020-10-14 16:15 ` David Hildenbrand
  2020-10-14 17:56   ` Mina Almasry
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-14 16:15 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Michal Privoznik, Michael S. Tsirkin, Michal Hocko, Muchun Song,
	Aneesh Kumar K.V, Tejun Heo, Mina Almasry

On 14.10.20 17:22, David Hildenbrand wrote:
> Hi everybody,
> 
> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
> with hugetlbfs and reported that this results in [1]
> 
> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
> 
> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
> 
> 
> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
> to discard pages that are reported as free by a VM. The reporting
> granularity is in pageblock granularity. So when the guest reports
> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
> 
> I was also able to reproduce (also with virtio-mem, which similarly
> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
> (and on v5.7.X from F32).
> 
> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
> is broken with cgroups. I did *not* try without cgroups yet.
> 
> Any ideas?

Just tried without the hugetlb controller, seems to work just fine.

I'd like to note that
- The controller was not activated
- I had to compile the hugetlb controller out to make it work.

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 16:15 ` David Hildenbrand
@ 2020-10-14 17:56   ` Mina Almasry
  2020-10-14 18:18     ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Mina Almasry @ 2020-10-14 17:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 14.10.20 17:22, David Hildenbrand wrote:
> > Hi everybody,
> >
> > Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
> > with hugetlbfs and reported that this results in [1]
> >
> > 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
> >
> > 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
> >
> >
> > QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
> > to discard pages that are reported as free by a VM. The reporting
> > granularity is in pageblock granularity. So when the guest reports
> > 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
> >
> > I was also able to reproduce (also with virtio-mem, which similarly
> > uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
> > (and on v5.7.X from F32).
> >
> > Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
> > is broken with cgroups. I did *not* try without cgroups yet.
> >
> > Any ideas?

Hi David,

I may be able to dig in and take a look. How do I reproduce this
though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
hugetlb region?

>
> Just tried without the hugetlb controller, seems to work just fine.
>
> I'd like to note that
> - The controller was not activated
> - I had to compile the hugetlb controller out to make it work.
>
> --
> Thanks,
>
> David / dhildenb
>



* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 17:56   ` Mina Almasry
@ 2020-10-14 18:18     ` David Hildenbrand
  2020-10-14 18:31       ` Mike Kravetz
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-14 18:18 UTC (permalink / raw)
  To: Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 14.10.20 19:56, Mina Almasry wrote:
> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 14.10.20 17:22, David Hildenbrand wrote:
>>> Hi everybody,
>>>
>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>> with hugetlbfs and reported that this results in [1]
>>>
>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>
>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>
>>>
>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>> to discard pages that are reported as free by a VM. The reporting
>>> granularity is in pageblock granularity. So when the guest reports
>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>
>>> I was also able to reproduce (also with virtio-mem, which similarly
>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>> (and on v5.7.X from F32).
>>>
>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>
>>> Any ideas?
> 
> Hi David,
> 
> I may be able to dig in and take a look. How do I reproduce this
> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
> hugetlb region?
> 

Hi Mina,

thanks for having a look. I started poking around myself but,
being new to cgroup code, I even failed to understand why that code gets
triggered though the hugetlb controller isn't even enabled.

I assume you at least have to make sure that there is
a page populated (MAP_POPULATE, or read/write it). But I am not
sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
sufficient, or if it will require a sequence of
populate+discard(punch) (or multi-threading).
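
Something like the following is roughly what I have in mind (an untested
sketch, not a known reproducer; it assumes at least 512 free 2MB huge
pages, the hugetlb controller compiled in, and building with -pthread):

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/memfd.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_SZ (2UL * 1024 * 1024)
#define NR_HUGE 512                     /* 1 GiB of 2 MiB huge pages */

static int fd;
static char *mem;

static void *faulter(void *unused)      /* populate random huge pages */
{
        for (int i = 0; i < 100000; i++)
                mem[(rand() % NR_HUGE) * HUGE_SZ] = 1;
        return NULL;
}

static void *puncher(void *unused)      /* discard random huge pages again */
{
        for (int i = 0; i < 100000; i++)
                fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          (rand() % NR_HUGE) * HUGE_SZ, HUGE_SZ);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        fd = memfd_create("guest-ram", MFD_HUGETLB | MFD_HUGE_2MB);
        ftruncate(fd, NR_HUGE * HUGE_SZ);
        mem = mmap(NULL, NR_HUGE * HUGE_SZ, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);

        pthread_create(&t1, NULL, faulter, NULL);
        pthread_create(&t2, NULL, puncher, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}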

What definitely makes it trigger is via QEMU

qemu-system-x86_64 \
-machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu host,migratable=on \
-m 4096 \
-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,size=4294967296 \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-nodefaults \
-nographic \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev '{"driver":"file","filename":"../Fedora-Cloud-Base-32-1.6.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
-chardev stdio,nosignal,id=serial \
-device isa-serial,chardev=serial \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on


However, you need a recent QEMU (>= v5.1 IIRC) and a recent kernel
(>= v5.7) inside your guest image.

Fedora rawhide qcow2 should do: https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20201004.n.1.x86_64.qcow2


-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 18:18     ` David Hildenbrand
@ 2020-10-14 18:31       ` Mike Kravetz
  2020-10-15  7:56         ` David Hildenbrand
  2020-10-15 23:14         ` Mike Kravetz
  0 siblings, 2 replies; 18+ messages in thread
From: Mike Kravetz @ 2020-10-14 18:31 UTC (permalink / raw)
  To: David Hildenbrand, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 10/14/20 11:18 AM, David Hildenbrand wrote:
> On 14.10.20 19:56, Mina Almasry wrote:
>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>> Hi everybody,
>>>>
>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>> with hugetlbfs and reported that this results in [1]
>>>>
>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>
>>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>>
>>>>
>>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>> to discard pages that are reported as free by a VM. The reporting
>>>> granularity is in pageblock granularity. So when the guest reports
>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>
>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>> (and on v5.7.X from F32).
>>>>
>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>
>>>> Any ideas?
>>
>> Hi David,
>>
>> I may be able to dig in and take a look. How do I reproduce this
>> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>> hugetlb region?
>>
> 
> Hi Mina,
> 
> thanks for having a look. I started poking around myself but,
> being new to cgroup code, I even failed to understand why that code gets
> triggered though the hugetlb controller isn't even enabled.
> 
> I assume you at least have to make sure that there is
> a page populated (MMAP_POPULATE, or read/write it). But I am not
> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
> sufficient, or if it will require a sequence of
> populate+discard(punch) (or multi-threading).

FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
with (and without) hugetlb controller enabled and did not see this issue.

May need to reproduce via QEMU as below.
-- 
Mike Kravetz

> What definitely makes it trigger is via QEMU
> 
> qemu-system-x86_64 \
> -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
> -cpu host,migratable=on \
> -m 4096 \
> -object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,size=4294967296 \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -nodefaults \
> -nographic \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev '{"driver":"file","filename":"../Fedora-Cloud-Base-32-1.6.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
> -blockdev '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
> -chardev stdio,nosignal,id=serial \
> -device isa-serial,chardev=serial \
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on
> 
> 
> However, you need a recent QEMU (>= v5.1 IIRC) and a recent kernel
> (>= v5.7) inside your guest image.
> 
> Fedora rawhide qcow2 should do: https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20201004.n.1.x86_64.qcow2
> 



* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 18:31       ` Mike Kravetz
@ 2020-10-15  7:56         ` David Hildenbrand
  2020-10-15  8:57           ` David Hildenbrand
  2020-10-15 23:14         ` Mike Kravetz
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-15  7:56 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 14.10.20 20:31, Mike Kravetz wrote:
> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>> On 14.10.20 19:56, Mina Almasry wrote:
>>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>>> Hi everybody,
>>>>>
>>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>>> with hugetlbfs and reported that this results in [1]
>>>>>
>>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>>
>>>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>>>
>>>>>
>>>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>>> to discard pages that are reported as free by a VM. The reporting
>>>>> granularity is in pageblock granularity. So when the guest reports
>>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>>
>>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>>> (and on v5.7.X from F32).
>>>>>
>>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>>
>>>>> Any ideas?
>>>
>>> Hi David,
>>>
>>> I may be able to dig in and take a look. How do I reproduce this
>>> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>>> hugetlb region?
>>>
>>
>> Hi Mina,
>>
>> thanks for having a look. I started poking around myself but,
>> being new to cgroup code, I even failed to understand why that code gets
>> triggered though the hugetlb controller isn't even enabled.
>>
>> I assume you at least have to make sure that there is
>> a page populated (MMAP_POPULATE, or read/write it). But I am not
>> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
>> sufficient, or if it will require a sequence of
>> populate+discard(punch) (or multi-threading).
> 
> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
> with (and without) hugetlb controller enabled and did not see this issue.
> 
> May need to reproduce via QEMU as below.

Not sure if relevant, but QEMU should be using
memfd_create(MFD_HUGETLB|MFD_HUGE_2MB) to obtain a hugetlbfs file.

Also, QEMU fallocate(FALLOC_FL_PUNCH_HOLE)'s a significant amount of the
memfd's memory (e.g., > 90%).

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-15  7:56         ` David Hildenbrand
@ 2020-10-15  8:57           ` David Hildenbrand
  2020-10-15  9:01             ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-15  8:57 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo, KVM

On 15.10.20 09:56, David Hildenbrand wrote:
> On 14.10.20 20:31, Mike Kravetz wrote:
>> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>>> On 14.10.20 19:56, Mina Almasry wrote:
>>>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>>>> Hi everybody,
>>>>>>
>>>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>>>> with hugetlbfs and reported that this results in [1]
>>>>>>
>>>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>>>
>>>>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>>>>
>>>>>>
>>>>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>>>> to discard pages that are reported as free by a VM. The reporting
>>>>>> granularity is in pageblock granularity. So when the guest reports
>>>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>>>
>>>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>>>> (and on v5.7.X from F32).
>>>>>>
>>>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>>>
>>>>>> Any ideas?
>>>>
>>>> Hi David,
>>>>
>>>> I may be able to dig in and take a look. How do I reproduce this
>>>> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>>>> hugetlb region?
>>>>
>>>
>>> Hi Mina,
>>>
>>> thanks for having a look. I started poking around myself but,
>>> being new to cgroup code, I even failed to understand why that code gets
>>> triggered though the hugetlb controller isn't even enabled.
>>>
>>> I assume you at least have to make sure that there is
>>> a page populated (MMAP_POPULATE, or read/write it). But I am not
>>> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
>>> sufficient, or if it will require a sequence of
>>> populate+discard(punch) (or multi-threading).
>>
>> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
>> with (and without) hugetlb controller enabled and did not see this issue.
>>
>> May need to reproduce via QEMU as below.
> 
> Not sure if relevant, but QEMU should be using
> memfd_create(MFD_HUGETLB|MFD_HUGE_2MB) to obtain a hugetlbfs file.
> 
> Also, QEMU fallocate(FALLOC_FL_PUNCH_HOLE)'s a significant of memory of
> the md (e.g., > 90%).
> 

I just tried to reproduce by doing random accesses + random fallocate(FALLOC_FL_PUNCH_HOLE) within a file - without success.

So could be
1. KVM is involved messing this up
2. Multi-threading is involved

However, I am also able to reproduce with only a single VCPU (there is still the QEMU main thread, but it limits the chance for races).

Even KVM spits fire after a while, which could be a side effect of allocations failing:

error: kvm run failed Bad address
RAX=0000000000000000 RBX=ffff8c12c9c217c0 RCX=ffff8c12fb1b8fc0 RDX=0000000000000007
RSI=ffff8c12c9c217c0 RDI=ffff8c12c9c217c8 RBP=000000000000000d RSP=ffffb3964040fa68
R8 =0000000000000008 R9 =ffff8c12c9c20000 R10=ffff8c12fffd5000 R11=00000000000303c0
R12=ffff8c12c9c217c0 R13=0000000000000008 R14=0000000000000001 R15=fffff31d44270800
RIP=ffffffffaf33ba0f RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 00007f8fabc87040 00000000 00000000
GS =0000 ffff8c12fbc00000 00000000 00000000
LDT=0000 fffffe0000000000 00000000 00000000
TR =0040 fffffe0000003000 00004087 00008b00 DPL=0 TSS64-busy
GDT=     fffffe0000001000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=0000560e10895398 CR3=00000001073b2000 CR4=00350ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=0f 0b eb e2 90 0f 1f 44 00 00 53 48 89 fb 31 c0 48 8d 7f 08 <48> c7 47 f8 00 00 00 00 48 89 d9 48 c7 c2 44 d3 52

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-15  8:57           ` David Hildenbrand
@ 2020-10-15  9:01             ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2020-10-15  9:01 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo, KVM

On 15.10.20 10:57, David Hildenbrand wrote:
> On 15.10.20 09:56, David Hildenbrand wrote:
>> On 14.10.20 20:31, Mike Kravetz wrote:
>>> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>>>> On 14.10.20 19:56, Mina Almasry wrote:
>>>>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>>>>> Hi everybody,
>>>>>>>
>>>>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>>>>> with hugetlbfs and reported that this results in [1]
>>>>>>>
>>>>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>>>>
>>>>>>> 2. Any hugetlbfs allocations failing. (I assume because some accounting is wrong)
>>>>>>>
>>>>>>>
>>>>>>> QEMU with free page hinting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>>>>> to discard pages that are reported as free by a VM. The reporting
>>>>>>> granularity is in pageblock granularity. So when the guest reports
>>>>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>>>>
>>>>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>>>>> (and on v5.7.X from F32).
>>>>>>>
>>>>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>>>>
>>>>>>> Any ideas?
>>>>>
>>>>> Hi David,
>>>>>
>>>>> I may be able to dig in and take a look. How do I reproduce this
>>>>> though? I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>>>>> hugetlb region?
>>>>>
>>>>
>>>> Hi Mina,
>>>>
>>>> thanks for having a look. I started poking around myself but,
>>>> being new to cgroup code, I even failed to understand why that code gets
>>>> triggered though the hugetlb controller isn't even enabled.
>>>>
>>>> I assume you at least have to make sure that there is
>>>> a page populated (MMAP_POPULATE, or read/write it). But I am not
>>>> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
>>>> sufficient, or if it will require a sequence of
>>>> populate+discard(punch) (or multi-threading).
>>>
>>> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
>>> with (and without) hugetlb controller enabled and did not see this issue.
>>>
>>> May need to reproduce via QEMU as below.
>>
>> Not sure if relevant, but QEMU should be using
>> memfd_create(MFD_HUGETLB|MFD_HUGE_2MB) to obtain a hugetlbfs file.
>>
>> Also, QEMU fallocate(FALLOC_FL_PUNCH_HOLE)'s a significant of memory of
>> the md (e.g., > 90%).
>>
> 
> I just tried to reproduce by doing random accesses + random fallocate(FALLOC_FL_PUNCH_HOLE) within a file - without success.
> 
> So could be
> 1. KVM is involved messing this up
> 2. Multi-threading is involved
> 

Able to reproduce with TCG under QEMU, so not a KVM issue.

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-14 18:31       ` Mike Kravetz
  2020-10-15  7:56         ` David Hildenbrand
@ 2020-10-15 23:14         ` Mike Kravetz
  2020-10-20 13:38           ` David Hildenbrand
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Kravetz @ 2020-10-15 23:14 UTC (permalink / raw)
  To: David Hildenbrand, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 10/14/20 11:31 AM, Mike Kravetz wrote:
> On 10/14/20 11:18 AM, David Hildenbrand wrote:
> 
> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
> with (and without) hugetlb controller enabled and did not see this issue.
> 

I took a closer look after running just the fallocate_stress test
in libhugetlbfs.  Here are the cgroup counter values:

hugetlb.2MB.failcnt 0
hugetlb.2MB.limit_in_bytes 9223372036854771712
hugetlb.2MB.max_usage_in_bytes 209715200
hugetlb.2MB.rsvd.failcnt 0
hugetlb.2MB.rsvd.limit_in_bytes 9223372036854771712
hugetlb.2MB.rsvd.max_usage_in_bytes 601882624
hugetlb.2MB.rsvd.usage_in_bytes 392167424
hugetlb.2MB.usage_in_bytes 0

We did not hit the WARN_ON_ONCE(), but the 'rsvd.usage_in_bytes' value
is not correct in that it should be zero.   No huge page reservations
remain after the test.
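
(For reference, 392167424 bytes is exactly 187 2MB reservations' worth of
charge left behind on the rsvd counter.)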

HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB

To try and better understand the reservation cgroup controller, I added
a few printks to the code.  While running fallocate_stress with the
printks, I can consistently hit the WARN_ON_ONCE() due to the counter
going negative.  Here are the cgroup counter values after such a run:

hugetlb.2MB.failcnt 0
hugetlb.2MB.limit_in_bytes 9223372036854771712
hugetlb.2MB.max_usage_in_bytes 209715200
hugetlb.2MB.rsvd.failcnt 3
hugetlb.2MB.rsvd.limit_in_bytes 9223372036854771712
hugetlb.2MB.rsvd.max_usage_in_bytes 251658240
hugetlb.2MB.rsvd.usage_in_bytes 18446744073487253504
hugetlb.2MB.usage_in_bytes 0
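
(18446744073487253504 is the unsigned representation of -222298112, i.e.
the rsvd counter has underflowed by exactly 106 2MB reservations.)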

Again, no reserved pages after the test.

HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB

I have some basic hugetlb hole punch functionality tests.  Running
these on the kernel with added printk's does not cause any issues.
In order to reproduce, I need to run fallocate_stress test which
will cause hole punch to race with page fault.  Best guess at this
time is that some of the error/race detection reservation back out
code is not properly dealing with cgroup accounting.

I'll take a look at this as well.
-- 
Mike Kravetz



* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-15 23:14         ` Mike Kravetz
@ 2020-10-20 13:38           ` David Hildenbrand
  2020-10-21  3:35             ` Mike Kravetz
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-20 13:38 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 16.10.20 01:14, Mike Kravetz wrote:
> On 10/14/20 11:31 AM, Mike Kravetz wrote:
>> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>>
>> FWIW - I ran libhugetlbfs tests which do a bunch of hole punching
>> with (and without) hugetlb controller enabled and did not see this issue.
>>
> 
> I took a closer look after running just the fallocate_stress test
> in libhugetlbfs.  Here are the cgroup counter values:
> 
> hugetlb.2MB.failcnt 0
> hugetlb.2MB.limit_in_bytes 9223372036854771712
> hugetlb.2MB.max_usage_in_bytes 209715200
> hugetlb.2MB.rsvd.failcnt 0
> hugetlb.2MB.rsvd.limit_in_bytes 9223372036854771712
> hugetlb.2MB.rsvd.max_usage_in_bytes 601882624
> hugetlb.2MB.rsvd.usage_in_bytes 392167424
> hugetlb.2MB.usage_in_bytes 0
> 
> We did not hit the WARN_ON_ONCE(), but the 'rsvd.usage_in_bytes' value
> is not correct in that it should be zero.   No huge page reservations
> remain after the test.
> 
> HugePages_Total:    1024
> HugePages_Free:     1024
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:         2097152 kB
> 
> To try and better understand the reservation cgroup controller, I addded
> a few printks to the code.  While running fallocate_stress with the
> printks, I can consistently hit the WARN_ON_ONCE() due to the counter
> going negative.  Here are the cgroup counter values after such a run:
> 
> hugetlb.2MB.failcnt 0
> hugetlb.2MB.limit_in_bytes 9223372036854771712
> hugetlb.2MB.max_usage_in_bytes 209715200
> hugetlb.2MB.rsvd.failcnt 3
> hugetlb.2MB.rsvd.limit_in_bytes 9223372036854771712
> hugetlb.2MB.rsvd.max_usage_in_bytes 251658240
> hugetlb.2MB.rsvd.usage_in_bytes 18446744073487253504
> hugetlb.2MB.usage_in_bytes 0
> 
> Again, no reserved pages after the test.
> 
> HugePages_Total:    1024
> HugePages_Free:     1024
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:         2097152 kB
> 
> I have some basic hugetlb hole punch functionality tests.  Running
> these on the kernel with added printk's does not cause any issues.
> In order to reproduce, I need to run fallocate_stress test which
> will cause hole punch to race with page fault.  Best guess at this
> time is that some of the error/race detection reservation back out
> code is not properly dealing with cgroup accounting.
> 
> I'll take a look at this as well.
> 

I'm bisecting the warning right now. Looks like it was introduced in v5.7.

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-20 13:38           ` David Hildenbrand
@ 2020-10-21  3:35             ` Mike Kravetz
  2020-10-21 12:42               ` David Hildenbrand
  2020-10-21 12:57               ` Michal Privoznik
  0 siblings, 2 replies; 18+ messages in thread
From: Mike Kravetz @ 2020-10-21  3:35 UTC (permalink / raw)
  To: David Hildenbrand, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 10/20/20 6:38 AM, David Hildenbrand wrote:
> 
> I'm bisecting the warning right now. Looks like it was introduced in v5.7.

I found the following bugs in the cgroup reservation accounting.  The ones
in region_del are pretty obvious as the number of pages to uncharge would
always be zero.  The one on alloc_huge_page needs racing code to expose.

With these fixes, my testing is showing consistent/correct results for
hugetlb reservation cgroup accounting.

It would be good if Mina (at least) would look these over.  Would also
be interesting to know if these fixes address the bug seen with the qemu
use case.

I'm still doing more testing and code inspection to look for other issues.

From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue, 20 Oct 2020 20:21:42 -0700
Subject: [PATCH] hugetlb_cgroup: fix reservation accounting

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67fc6383995b..c92366313780 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
 		}
 
 		if (f <= rg->from) {	/* Trim beginning of region */
-			del += t - rg->from;
-			rg->from = t;
-
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
 							    t - rg->from);
-		} else {		/* Trim end of region */
-			del += rg->to - f;
-			rg->to = f;
 
+			del += t - rg->from;
+			rg->from = t;
+		} else {		/* Trim end of region */
 			hugetlb_cgroup_uncharge_file_region(resv, rg,
 							    rg->to - f);
+
+			del += rg->to - f;
+			rg->to = f;
 		}
 	}
 
@@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
 		hugetlb_acct_memory(h, -rsv_adjust);
+		if (deferred_reserve)
+			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
+					pages_per_huge_page(h), page);
 	}
 	return page;
 
-- 
2.25.4




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21  3:35             ` Mike Kravetz
@ 2020-10-21 12:42               ` David Hildenbrand
  2020-10-21 12:57               ` Michal Privoznik
  1 sibling, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2020-10-21 12:42 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 21.10.20 05:35, Mike Kravetz wrote:
> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>
>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
> 

So bisecting nailed it down to one of

353b2de42e84 mm/hugetlb.c: clean code by removing unnecessary initialization
a9b3f867404b hugetlb: support file_region coalescing again
08cf9faf7558 hugetlb_cgroup: support noreserve mappings
075a61d07a8e hugetlb_cgroup: add accounting for shared mappings
0db9d74ed884 hugetlb: disable region_add file_region coalescing
e9fe92ae0cd2 hugetlb_cgroup: add reservation accounting for private mappings
9808895e1a44 mm/hugetlb_cgroup: fix hugetlb_cgroup migration
1adc4d419aa2 hugetlb_cgroup: add interface for charge/uncharge hugetlb
reservations
cdc2fcfea79b hugetlb_cgroup: add hugetlb_cgroup reservation counter

So it seems to be broken right from the beginning of
charge/uncharge/reservations. Not a surprise when looking at your fixes :)


> I found the following bugs in the cgroup reservation accounting.  The ones
> in region_del are pretty obvious as the number of pages to uncharge would
> always be zero.  The one on alloc_huge_page needs racing code to expose.
> 
> With these fixes, my testing is showing consistent/correct results for
> hugetlb reservation cgroup accounting.
> 
> It would be good if Mina (at least) would look these over.  Would also
> be interesting to know if these fixes address the bug seen with the qemu
> use case.

I strongly suspect it will. Compiling now, will reply in half an hour or
so with the result.

> 
> I'm still doing more testing and code inspection to look for other issues.

When sending, can you make sure to credit Michal P.? Thanks!

Reported-by: Michal Privoznik <mprivozn@redhat.com>

> 
> From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Tue, 20 Oct 2020 20:21:42 -0700
> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67fc6383995b..c92366313780 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
>  		}
>  
>  		if (f <= rg->from) {	/* Trim beginning of region */
> -			del += t - rg->from;
> -			rg->from = t;
> -
>  			hugetlb_cgroup_uncharge_file_region(resv, rg,
>  							    t - rg->from);
> -		} else {		/* Trim end of region */
> -			del += rg->to - f;
> -			rg->to = f;
>  
> +			del += t - rg->from;
> +			rg->from = t;
> +		} else {		/* Trim end of region */
>  			hugetlb_cgroup_uncharge_file_region(resv, rg,
>  							    rg->to - f);
> +
> +			del += rg->to - f;
> +			rg->to = f;

Those two look very correct to me.

You could keep computing "del" before the uncharge, similar to the /*
Remove entire region */ case.
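
E.g. something like this (untested, just to illustrate the idea):

		if (f <= rg->from) {	/* Trim beginning of region */
			del += t - rg->from;
			hugetlb_cgroup_uncharge_file_region(resv, rg,
							    t - rg->from);
			rg->from = t;
		} else {		/* Trim end of region */
			del += rg->to - f;
			hugetlb_cgroup_uncharge_file_region(resv, rg,
							    rg->to - f);
			rg->to = f;
		}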

>  		}
>  	}
>  
> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  
>  		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
>  		hugetlb_acct_memory(h, -rsv_adjust);
> +		if (deferred_reserve)
> +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> +					pages_per_huge_page(h), page);

That looks correct to me as well.

>  	}
>  	return page;
>  
> 

Thanks for debugging!

-- 
Thanks,

David / dhildenb




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21  3:35             ` Mike Kravetz
  2020-10-21 12:42               ` David Hildenbrand
@ 2020-10-21 12:57               ` Michal Privoznik
  2020-10-21 13:11                 ` David Hildenbrand
  1 sibling, 1 reply; 18+ messages in thread
From: Michal Privoznik @ 2020-10-21 12:57 UTC (permalink / raw)
  To: Mike Kravetz, David Hildenbrand, Mina Almasry
  Cc: linux-kernel, linux-mm, Michael S. Tsirkin, Michal Hocko,
	Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 10/21/20 5:35 AM, Mike Kravetz wrote:
> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>
>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
> 
> I found the following bugs in the cgroup reservation accounting.  The ones
> in region_del are pretty obvious as the number of pages to uncharge would
> always be zero.  The one on alloc_huge_page needs racing code to expose.
> 
> With these fixes, my testing is showing consistent/correct results for
> hugetlb reservation cgroup accounting.
> 
> It would be good if Mina (at least) would look these over.  Would also
> be interesting to know if these fixes address the bug seen with the qemu
> use case.
> 
> I'm still doing more testing and code inspection to look for other issues.
> 
>  From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Tue, 20 Oct 2020 20:21:42 -0700
> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   mm/hugetlb.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67fc6383995b..c92366313780 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
>   		}
>   
>   		if (f <= rg->from) {	/* Trim beginning of region */
> -			del += t - rg->from;
> -			rg->from = t;
> -
>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>   							    t - rg->from);
> -		} else {		/* Trim end of region */
> -			del += rg->to - f;
> -			rg->to = f;
>   
> +			del += t - rg->from;
> +			rg->from = t;
> +		} else {		/* Trim end of region */
>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>   							    rg->to - f);
> +
> +			del += rg->to - f;
> +			rg->to = f;
>   		}
>   	}
>   
> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>   
>   		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
>   		hugetlb_acct_memory(h, -rsv_adjust);
> +		if (deferred_reserve)
> +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> +					pages_per_huge_page(h), page);
>   	}
>   	return page;
>   
> 

I've applied, rebuilt and tested, but unfortunately I still hit the problem:
[ 6472.719047] ------------[ cut here ]------------
[ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57 page_counter_uncharge+0x33/0x40
[ 6472.719052] Modules linked in: kvm_amd amdgpu kvm btusb sp5100_tco btrtl watchdog k10temp btbcm btintel mfd_core gpu_sched ttm
[ 6472.719057] CPU: 6 PID: 11773 Comm: CPU 3/KVM Not tainted 5.9.1-gentoo-x86_64 #1
[ 6472.719057] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1005 08/01/2019
[ 6472.719059] RIP: 0010:page_counter_uncharge+0x33/0x40
[ 6472.719060] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
[ 6472.719061] RSP: 0018:ffffc90000b77b40 EFLAGS: 00010286
[ 6472.719061] RAX: fffffffffffe9200 RBX: ffff888fb3b97b40 RCX: fffffffffffe9200
[ 6472.719062] RDX: 0000000000000221 RSI: fffffffffffe9200 RDI: ffff888fd8451dd0
[ 6472.719062] RBP: ffff888fb6990420 R08: 0000000000044200 R09: fffffffffffbbe00
[ 6472.719062] R10: ffff888fb3b97b40 R11: 000000000000000a R12: 0000000000000001
[ 6472.719063] R13: 00000000000005df R14: 00000000000005de R15: ffff888fb3b97b40
[ 6472.719063] FS:  00007fbd175fe700(0000) GS:ffff888fde980000(0000) knlGS:0000000000000000
[ 6472.719064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6472.719064] CR2: 00007fbd825101f0 CR3: 0000000fb5e41000 CR4: 0000000000350ee0
[ 6472.719065] Call Trace:
[ 6472.719067]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
[ 6472.719069]  region_del+0x1ae/0x270
[ 6472.719070]  hugetlb_unreserve_pages+0x32/0xa0
[ 6472.719072]  remove_inode_hugepages+0x19d/0x3a0
[ 6472.719079]  ? writeback_registers+0x45/0x60 [kvm]
[ 6472.719080]  hugetlbfs_fallocate+0x3f2/0x4a0
[ 6472.719081]  ? __mod_lruvec_state+0x1d/0x40
[ 6472.719081]  ? __mod_memcg_lruvec_state+0x1b/0xe0
[ 6472.719083]  ? __seccomp_filter+0x75/0x6a0
[ 6472.719084]  vfs_fallocate+0x122/0x260
[ 6472.719085]  __x64_sys_fallocate+0x39/0x60
[ 6472.719086]  do_syscall_64+0x2d/0x40
[ 6472.719088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 6472.719089] RIP: 0033:0x7fbe3cefcde7
[ 6472.719089] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
[ 6472.719090] RSP: 002b:00007fbd175fc7a0 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 6472.719090] RAX: ffffffffffffffda RBX: 00000000bbe00000 RCX: 00007fbe3cefcde7
[ 6472.719091] RDX: 00000000bbc00000 RSI: 0000000000000003 RDI: 000000000000001d
[ 6472.719091] RBP: 00007fbd175fc800 R08: 0000000000000000 R09: 0000000000000000
[ 6472.719091] R10: 0000000000200000 R11: 0000000000000293 R12: 00007ffeea066d2e
[ 6472.719092] R13: 00007ffeea066d2f R14: 00007fbd175fe700 R15: 00007fbd175fcdc0
[ 6472.719092] ---[ end trace c97dc6281a861980 ]---


Sorry,
Michal




* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21 12:57               ` Michal Privoznik
@ 2020-10-21 13:11                 ` David Hildenbrand
  2020-10-21 13:34                   ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2020-10-21 13:11 UTC (permalink / raw)
  To: Michal Privoznik, Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michael S. Tsirkin, Michal Hocko,
	Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 21.10.20 14:57, Michal Privoznik wrote:
> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>
>>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
>>
>> I found the following bugs in the cgroup reservation accounting.  The ones
>> in region_del are pretty obvious as the number of pages to uncharge would
>> always be zero.  The one on alloc_huge_page needs racing code to expose.
>>
>> With these fixes, my testing is showing consistent/correct results for
>> hugetlb reservation cgroup accounting.
>>
>> It would be good if Mina (at least) would look these over.  Would also
>> be interesting to know if these fixes address the bug seen with the qemu
>> use case.
>>
>> I'm still doing more testing and code inspection to look for other issues.
>>
>>  From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
>> From: Mike Kravetz <mike.kravetz@oracle.com>
>> Date: Tue, 20 Oct 2020 20:21:42 -0700
>> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>   mm/hugetlb.c | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 67fc6383995b..c92366313780 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
>>   		}
>>   
>>   		if (f <= rg->from) {	/* Trim beginning of region */
>> -			del += t - rg->from;
>> -			rg->from = t;
>> -
>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>   							    t - rg->from);
>> -		} else {		/* Trim end of region */
>> -			del += rg->to - f;
>> -			rg->to = f;
>>   
>> +			del += t - rg->from;
>> +			rg->from = t;
>> +		} else {		/* Trim end of region */
>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>   							    rg->to - f);
>> +
>> +			del += rg->to - f;
>> +			rg->to = f;
>>   		}
>>   	}
>>   
>> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>   
>>   		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
>>   		hugetlb_acct_memory(h, -rsv_adjust);
>> +		if (deferred_reserve)
>> +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
>> +					pages_per_huge_page(h), page);
>>   	}
>>   	return page;
>>   
>>
> 
> I've applied, rebuilt and tested, but unfortunately I still hit the problem:
> [ 6472.719047] ------------[ cut here ]------------
> [ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57 
> page_counter_uncharge+0x33/0x40
> [ 6472.719052] Modules linked in: kvm_amd amdgpu kvm btusb sp5100_tco 
> btrtl watchdog k10temp btbcm btintel mfd_core gpu_sched ttm
> [ 6472.719057] CPU: 6 PID: 11773 Comm: CPU 3/KVM Not tainted 
> 5.9.1-gentoo-x86_64 #1
> [ 6472.719057] Hardware name: System manufacturer System Product 
> Name/PRIME X570-PRO, BIOS 1005 08/01/2019
> [ 6472.719059] RIP: 0010:page_counter_uncharge+0x33/0x40
> [ 6472.719060] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
> 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
> c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
> [ 6472.719061] RSP: 0018:ffffc90000b77b40 EFLAGS: 00010286
> [ 6472.719061] RAX: fffffffffffe9200 RBX: ffff888fb3b97b40 RCX: 
> fffffffffffe9200
> [ 6472.719062] RDX: 0000000000000221 RSI: fffffffffffe9200 RDI: 
> ffff888fd8451dd0
> [ 6472.719062] RBP: ffff888fb6990420 R08: 0000000000044200 R09: 
> fffffffffffbbe00
> [ 6472.719062] R10: ffff888fb3b97b40 R11: 000000000000000a R12: 
> 0000000000000001
> [ 6472.719063] R13: 00000000000005df R14: 00000000000005de R15: 
> ffff888fb3b97b40
> [ 6472.719063] FS:  00007fbd175fe700(0000) GS:ffff888fde980000(0000) 
> knlGS:0000000000000000
> [ 6472.719064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6472.719064] CR2: 00007fbd825101f0 CR3: 0000000fb5e41000 CR4: 
> 0000000000350ee0
> [ 6472.719065] Call Trace:
> [ 6472.719067]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
> [ 6472.719069]  region_del+0x1ae/0x270
> [ 6472.719070]  hugetlb_unreserve_pages+0x32/0xa0
> [ 6472.719072]  remove_inode_hugepages+0x19d/0x3a0
> [ 6472.719079]  ? writeback_registers+0x45/0x60 [kvm]
> [ 6472.719080]  hugetlbfs_fallocate+0x3f2/0x4a0
> [ 6472.719081]  ? __mod_lruvec_state+0x1d/0x40
> [ 6472.719081]  ? __mod_memcg_lruvec_state+0x1b/0xe0
> [ 6472.719083]  ? __seccomp_filter+0x75/0x6a0
> [ 6472.719084]  vfs_fallocate+0x122/0x260
> [ 6472.719085]  __x64_sys_fallocate+0x39/0x60
> [ 6472.719086]  do_syscall_64+0x2d/0x40
> [ 6472.719088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 6472.719089] RIP: 0033:0x7fbe3cefcde7
> [ 6472.719089] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
> 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
> 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
> [ 6472.719090] RSP: 002b:00007fbd175fc7a0 EFLAGS: 00000293 ORIG_RAX: 
> 000000000000011d
> [ 6472.719090] RAX: ffffffffffffffda RBX: 00000000bbe00000 RCX: 
> 00007fbe3cefcde7
> [ 6472.719091] RDX: 00000000bbc00000 RSI: 0000000000000003 RDI: 
> 000000000000001d
> [ 6472.719091] RBP: 00007fbd175fc800 R08: 0000000000000000 R09: 
> 0000000000000000
> [ 6472.719091] R10: 0000000000200000 R11: 0000000000000293 R12: 
> 00007ffeea066d2e
> [ 6472.719092] R13: 00007ffeea066d2f R14: 00007fbd175fe700 R15: 
> 00007fbd175fcdc0
> [ 6472.719092] ---[ end trace c97dc6281a861980 ]---

Agreed, same over here. :(

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21 13:11                 ` David Hildenbrand
@ 2020-10-21 13:34                   ` David Hildenbrand
  2020-10-21 13:38                     ` David Hildenbrand
  2020-10-21 16:58                     ` Mike Kravetz
  0 siblings, 2 replies; 18+ messages in thread
From: David Hildenbrand @ 2020-10-21 13:34 UTC (permalink / raw)
  To: Michal Privoznik, Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michael S. Tsirkin, Michal Hocko,
	Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 21.10.20 15:11, David Hildenbrand wrote:
> On 21.10.20 14:57, Michal Privoznik wrote:
>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>
>>>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
>>>
>>> I found the following bugs in the cgroup reservation accounting.  The ones
>>> in region_del are pretty obvious as the number of pages to uncharge would
>>> always be zero.  The one on alloc_huge_page needs racing code to expose.
>>>
>>> With these fixes, my testing is showing consistent/correct results for
>>> hugetlb reservation cgroup accounting.
>>>
>>> It would be good if Mina (at least) would look these over.  Would also
>>> be interesting to know if these fixes address the bug seen with the qemu
>>> use case.
>>>
>>> I'm still doing more testing and code inspection to look for other issues.
>>>
>>>  From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
>>> From: Mike Kravetz <mike.kravetz@oracle.com>
>>> Date: Tue, 20 Oct 2020 20:21:42 -0700
>>> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
>>>
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>   mm/hugetlb.c | 15 +++++++++------
>>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 67fc6383995b..c92366313780 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
>>>   		}
>>>   
>>>   		if (f <= rg->from) {	/* Trim beginning of region */
>>> -			del += t - rg->from;
>>> -			rg->from = t;
>>> -
>>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>>   							    t - rg->from);
>>> -		} else {		/* Trim end of region */
>>> -			del += rg->to - f;
>>> -			rg->to = f;
>>>   
>>> +			del += t - rg->from;
>>> +			rg->from = t;
>>> +		} else {		/* Trim end of region */
>>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>>   							    rg->to - f);
>>> +
>>> +			del += rg->to - f;
>>> +			rg->to = f;
>>>   		}
>>>   	}
>>>   
>>> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>>   
>>>   		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
>>>   		hugetlb_acct_memory(h, -rsv_adjust);
>>> +		if (deferred_reserve)
>>> +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
>>> +					pages_per_huge_page(h), page);
>>>   	}
>>>   	return page;
>>>   
>>>
>>
>> I've applied, rebuilt and tested, but unfortunately I still hit the problem:
>> [ 6472.719047] ------------[ cut here ]------------
>> [ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57 
>> page_counter_uncharge+0x33/0x40
>> [ 6472.719052] Modules linked in: kvm_amd amdgpu kvm btusb sp5100_tco 
>> btrtl watchdog k10temp btbcm btintel mfd_core gpu_sched ttm
>> [ 6472.719057] CPU: 6 PID: 11773 Comm: CPU 3/KVM Not tainted 
>> 5.9.1-gentoo-x86_64 #1
>> [ 6472.719057] Hardware name: System manufacturer System Product 
>> Name/PRIME X570-PRO, BIOS 1005 08/01/2019
>> [ 6472.719059] RIP: 0010:page_counter_uncharge+0x33/0x40
>> [ 6472.719060] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
>> 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
>> c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
>> [ 6472.719061] RSP: 0018:ffffc90000b77b40 EFLAGS: 00010286
>> [ 6472.719061] RAX: fffffffffffe9200 RBX: ffff888fb3b97b40 RCX: 
>> fffffffffffe9200
>> [ 6472.719062] RDX: 0000000000000221 RSI: fffffffffffe9200 RDI: 
>> ffff888fd8451dd0
>> [ 6472.719062] RBP: ffff888fb6990420 R08: 0000000000044200 R09: 
>> fffffffffffbbe00
>> [ 6472.719062] R10: ffff888fb3b97b40 R11: 000000000000000a R12: 
>> 0000000000000001
>> [ 6472.719063] R13: 00000000000005df R14: 00000000000005de R15: 
>> ffff888fb3b97b40
>> [ 6472.719063] FS:  00007fbd175fe700(0000) GS:ffff888fde980000(0000) 
>> knlGS:0000000000000000
>> [ 6472.719064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 6472.719064] CR2: 00007fbd825101f0 CR3: 0000000fb5e41000 CR4: 
>> 0000000000350ee0
>> [ 6472.719065] Call Trace:
>> [ 6472.719067]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
>> [ 6472.719069]  region_del+0x1ae/0x270
>> [ 6472.719070]  hugetlb_unreserve_pages+0x32/0xa0
>> [ 6472.719072]  remove_inode_hugepages+0x19d/0x3a0
>> [ 6472.719079]  ? writeback_registers+0x45/0x60 [kvm]
>> [ 6472.719080]  hugetlbfs_fallocate+0x3f2/0x4a0
>> [ 6472.719081]  ? __mod_lruvec_state+0x1d/0x40
>> [ 6472.719081]  ? __mod_memcg_lruvec_state+0x1b/0xe0
>> [ 6472.719083]  ? __seccomp_filter+0x75/0x6a0
>> [ 6472.719084]  vfs_fallocate+0x122/0x260
>> [ 6472.719085]  __x64_sys_fallocate+0x39/0x60
>> [ 6472.719086]  do_syscall_64+0x2d/0x40
>> [ 6472.719088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 6472.719089] RIP: 0033:0x7fbe3cefcde7
>> [ 6472.719089] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
>> 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
>> 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
>> [ 6472.719090] RSP: 002b:00007fbd175fc7a0 EFLAGS: 00000293 ORIG_RAX: 
>> 000000000000011d
>> [ 6472.719090] RAX: ffffffffffffffda RBX: 00000000bbe00000 RCX: 
>> 00007fbe3cefcde7
>> [ 6472.719091] RDX: 00000000bbc00000 RSI: 0000000000000003 RDI: 
>> 000000000000001d
>> [ 6472.719091] RBP: 00007fbd175fc800 R08: 0000000000000000 R09: 
>> 0000000000000000
>> [ 6472.719091] R10: 0000000000200000 R11: 0000000000000293 R12: 
>> 00007ffeea066d2e
>> [ 6472.719092] R13: 00007ffeea066d2f R14: 00007fbd175fe700 R15: 
>> 00007fbd175fcdc0
>> [ 6472.719092] ---[ end trace c97dc6281a861980 ]---
> 
> Agreed, same over here. :(
> 

I *think* the following on top makes it fly

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67fc6383995b..5cf7f6a6c1a6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -656,6 +656,9 @@ static long region_del(struct resv_map *resv, long f, long t)

                        del += t - f;

+                       hugetlb_cgroup_uncharge_file_region(
+                               resv, rg, t - f);
+
                        /* New entry for end of split region */
                        nrg->from = t;
                        nrg->to = rg->to;
@@ -667,9 +670,6 @@ static long region_del(struct resv_map *resv, long f, long t)
                        /* Original entry is trimmed */
                        rg->to = f;

-                       hugetlb_cgroup_uncharge_file_region(
-                               resv, rg, nrg->to - nrg->from);
-
                        list_add(&nrg->link, &rg->link);
                        nrg = NULL;
                        break;
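
To spell out the bug the change above addresses: in the "must split region"
case, the unpatched code uncharged nrg->to - nrg->from - the size of the tail
region being kept - instead of t - f, the range actually punched out. With
QEMU punching single huge pages out of a large file, that uncharges far too
much, drives the cgroup reservation counter negative, and trips the
page_counter warning. Since the wrapped diff is a bit hard to read, the split
case with the change applied would look roughly like this (simplified sketch
for illustration, not the literal upstream code):

	if (f > rg->from && t < rg->to) {	/* Must split region */
		/* (nrg was pre-allocated earlier in the function) */
		del += t - f;

		/* Uncharge exactly the punched range [f, t). */
		hugetlb_cgroup_uncharge_file_region(resv, rg, t - f);

		/* New entry for end of split region */
		nrg->from = t;
		nrg->to = rg->to;
		copy_hugetlb_cgroup_uncharge_info(nrg, rg);

		/* Original entry is trimmed */
		rg->to = f;

		list_add(&nrg->link, &rg->link);
		nrg = NULL;
		break;
	}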


-- 
Thanks,

David / dhildenb



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21 13:34                   ` David Hildenbrand
@ 2020-10-21 13:38                     ` David Hildenbrand
  2020-10-21 16:58                     ` Mike Kravetz
  1 sibling, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2020-10-21 13:38 UTC (permalink / raw)
  To: Michal Privoznik, Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michael S. Tsirkin, Michal Hocko,
	Muchun Song, Aneesh Kumar K.V, Tejun Heo

On 21.10.20 15:34, David Hildenbrand wrote:
> On 21.10.20 15:11, David Hildenbrand wrote:
>> On 21.10.20 14:57, Michal Privoznik wrote:
>>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>>
>>>>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
>>>>
>>>> I found the following bugs in the cgroup reservation accounting.  The ones
>>>> in region_del are pretty obvious as the number of pages to uncharge would
>>>> always be zero.  The one on alloc_huge_page needs racing code to expose.
>>>>
>>>> With these fixes, my testing is showing consistent/correct results for
>>>> hugetlb reservation cgroup accounting.
>>>>
>>>> It would be good if Mina (at least) would look these over.  Would also
>>>> be interesting to know if these fixes address the bug seen with the qemu
>>>> use case.
>>>>
>>>> I'm still doing more testing and code inspection to look for other issues.
>>>>
>>>>  From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
>>>> From: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Date: Tue, 20 Oct 2020 20:21:42 -0700
>>>> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
>>>>
>>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>> ---
>>>>   mm/hugetlb.c | 15 +++++++++------
>>>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 67fc6383995b..c92366313780 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
>>>>   		}
>>>>   
>>>>   		if (f <= rg->from) {	/* Trim beginning of region */
>>>> -			del += t - rg->from;
>>>> -			rg->from = t;
>>>> -
>>>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>>>   							    t - rg->from);
>>>> -		} else {		/* Trim end of region */
>>>> -			del += rg->to - f;
>>>> -			rg->to = f;
>>>>   
>>>> +			del += t - rg->from;
>>>> +			rg->from = t;
>>>> +		} else {		/* Trim end of region */
>>>>   			hugetlb_cgroup_uncharge_file_region(resv, rg,
>>>>   							    rg->to - f);
>>>> +
>>>> +			del += rg->to - f;
>>>> +			rg->to = f;
>>>>   		}
>>>>   	}
>>>>   
>>>> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>>>>   
>>>>   		rsv_adjust = hugepage_subpool_put_pages(spool, 1);
>>>>   		hugetlb_acct_memory(h, -rsv_adjust);
>>>> +		if (deferred_reserve)
>>>> +			hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
>>>> +					pages_per_huge_page(h), page);
>>>>   	}
>>>>   	return page;
>>>>   
>>>>
>>>
>>> I've applied, rebuilt and tested, but unfortunately I still hit the problem:
>>> [ 6472.719047] ------------[ cut here ]------------
>>> [ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57 
>>> page_counter_uncharge+0x33/0x40
>>> [ 6472.719052] Modules linked in: kvm_amd amdgpu kvm btusb sp5100_tco 
>>> btrtl watchdog k10temp btbcm btintel mfd_core gpu_sched ttm
>>> [ 6472.719057] CPU: 6 PID: 11773 Comm: CPU 3/KVM Not tainted 
>>> 5.9.1-gentoo-x86_64 #1
>>> [ 6472.719057] Hardware name: System manufacturer System Product 
>>> Name/PRIME X570-PRO, BIOS 1005 08/01/2019
>>> [ 6472.719059] RIP: 0010:page_counter_uncharge+0x33/0x40
>>> [ 6472.719060] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
>>> 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
>>> c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
>>> [ 6472.719061] RSP: 0018:ffffc90000b77b40 EFLAGS: 00010286
>>> [ 6472.719061] RAX: fffffffffffe9200 RBX: ffff888fb3b97b40 RCX: 
>>> fffffffffffe9200
>>> [ 6472.719062] RDX: 0000000000000221 RSI: fffffffffffe9200 RDI: 
>>> ffff888fd8451dd0
>>> [ 6472.719062] RBP: ffff888fb6990420 R08: 0000000000044200 R09: 
>>> fffffffffffbbe00
>>> [ 6472.719062] R10: ffff888fb3b97b40 R11: 000000000000000a R12: 
>>> 0000000000000001
>>> [ 6472.719063] R13: 00000000000005df R14: 00000000000005de R15: 
>>> ffff888fb3b97b40
>>> [ 6472.719063] FS:  00007fbd175fe700(0000) GS:ffff888fde980000(0000) 
>>> knlGS:0000000000000000
>>> [ 6472.719064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 6472.719064] CR2: 00007fbd825101f0 CR3: 0000000fb5e41000 CR4: 
>>> 0000000000350ee0
>>> [ 6472.719065] Call Trace:
>>> [ 6472.719067]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
>>> [ 6472.719069]  region_del+0x1ae/0x270
>>> [ 6472.719070]  hugetlb_unreserve_pages+0x32/0xa0
>>> [ 6472.719072]  remove_inode_hugepages+0x19d/0x3a0
>>> [ 6472.719079]  ? writeback_registers+0x45/0x60 [kvm]
>>> [ 6472.719080]  hugetlbfs_fallocate+0x3f2/0x4a0
>>> [ 6472.719081]  ? __mod_lruvec_state+0x1d/0x40
>>> [ 6472.719081]  ? __mod_memcg_lruvec_state+0x1b/0xe0
>>> [ 6472.719083]  ? __seccomp_filter+0x75/0x6a0
>>> [ 6472.719084]  vfs_fallocate+0x122/0x260
>>> [ 6472.719085]  __x64_sys_fallocate+0x39/0x60
>>> [ 6472.719086]  do_syscall_64+0x2d/0x40
>>> [ 6472.719088]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [ 6472.719089] RIP: 0033:0x7fbe3cefcde7
>>> [ 6472.719089] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
>>> 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
>>> 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
>>> [ 6472.719090] RSP: 002b:00007fbd175fc7a0 EFLAGS: 00000293 ORIG_RAX: 
>>> 000000000000011d
>>> [ 6472.719090] RAX: ffffffffffffffda RBX: 00000000bbe00000 RCX: 
>>> 00007fbe3cefcde7
>>> [ 6472.719091] RDX: 00000000bbc00000 RSI: 0000000000000003 RDI: 
>>> 000000000000001d
>>> [ 6472.719091] RBP: 00007fbd175fc800 R08: 0000000000000000 R09: 
>>> 0000000000000000
>>> [ 6472.719091] R10: 0000000000200000 R11: 0000000000000293 R12: 
>>> 00007ffeea066d2e
>>> [ 6472.719092] R13: 00007ffeea066d2f R14: 00007fbd175fe700 R15: 
>>> 00007fbd175fcdc0
>>> [ 6472.719092] ---[ end trace c97dc6281a861980 ]---
>>
>> Agreed, same over here. :(
>>
> 
> I *think* the following on top makes it fly
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67fc6383995b..5cf7f6a6c1a6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -656,6 +656,9 @@ static long region_del(struct resv_map *resv, long f, long t)
> 
>                         del += t - f;
> 
> +                       hugetlb_cgroup_uncharge_file_region(
> +                               resv, rg, t - f);
> +
>                         /* New entry for end of split region */
>                         nrg->from = t;
>                         nrg->to = rg->to;
> @@ -667,9 +670,6 @@ static long region_del(struct resv_map *resv, long f, long t)
>                         /* Original entry is trimmed */
>                         rg->to = f;
> 
> -                       hugetlb_cgroup_uncharge_file_region(
> -                               resv, rg, nrg->to - nrg->from);
> -
>                         list_add(&nrg->link, &rg->link);
>                         nrg = NULL;
>                         break;
> 
> 

(sorry for the nasty line wrapping - Thunderbird update keeps breaking
helpful plugins - like Toggle Word Wrap in this case ... argh)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21 13:34                   ` David Hildenbrand
  2020-10-21 13:38                     ` David Hildenbrand
@ 2020-10-21 16:58                     ` Mike Kravetz
  2020-10-21 17:30                       ` David Hildenbrand
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Kravetz @ 2020-10-21 16:58 UTC (permalink / raw)
  To: David Hildenbrand, Michal Privoznik, Mina Almasry
  Cc: linux-kernel, linux-mm, Michael S. Tsirkin, Michal Hocko,
	Muchun Song, Aneesh Kumar K.V, Tejun Heo

> On 21.10.20 15:11, David Hildenbrand wrote:
>> On 21.10.20 14:57, Michal Privoznik wrote:
>>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>
>>>> It would be good if Mina (at least) would look these over.  Would also
>>>> be interesting to know if these fixes address the bug seen with the qemu
>>>> use case.
>>>>
>>>> I'm still doing more testing and code inspection to look for other issues.
>>>>
...
...
>>>
>>> I've applied, rebuilt and tested, but unfortunately I still hit the problem:
>>> [ 6472.719047] ------------[ cut here ]------------
>>> [ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57 
...
...
>>
>> Agreed, same over here. :(
>>
> 
> I *think* the following on top makes it fly
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67fc6383995b..5cf7f6a6c1a6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -656,6 +656,9 @@ static long region_del(struct resv_map *resv, long f, long t)
> 
>                         del += t - f;
> 
> +                       hugetlb_cgroup_uncharge_file_region(
> +                               resv, rg, t - f);
> +
>                         /* New entry for end of split region */
>                         nrg->from = t;
>                         nrg->to = rg->to;
> @@ -667,9 +670,6 @@ static long region_del(struct resv_map *resv, long f, long t)
>                         /* Original entry is trimmed */
>                         rg->to = f;
> 
> -                       hugetlb_cgroup_uncharge_file_region(
> -                               resv, rg, nrg->to - nrg->from);
> -
>                         list_add(&nrg->link, &rg->link);
>                         nrg = NULL;
>                         break;
> 
> 

Thanks, yes that certainly does look like a bug in that code.

Does that resolve the issue with qemu?

I want to do a little more testing/research before sending a patch later
today.
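
For reference, a minimal userspace sketch along the lines below should
exercise that split path (only a sketch: it assumes a 2 MB hugetlbfs mount at
/dev/hugepages, free huge pages, and the task running in a non-root cgroup
with the hugetlb controller enabled; adjust paths and sizes as needed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE (2UL * 1024 * 1024)

int main(void)
{
	int fd = open("/dev/hugepages/repro", O_CREAT | O_RDWR, 0600);
	if (fd < 0) { perror("open"); return 1; }

	/* Size the file and reserve 8 huge pages via a shared mapping. */
	if (ftruncate(fd, 8 * HPAGE)) { perror("ftruncate"); return 1; }
	char *p = mmap(NULL, 8 * HPAGE, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }

	/* Fault the pages in, then drop the mapping; the reservations
	 * stay attached to the inode. */
	for (unsigned long i = 0; i < 8 * HPAGE; i += HPAGE)
		p[i] = 1;
	munmap(p, 8 * HPAGE);

	/* Punch out pages 2..5 - strictly inside the reserved range -
	 * which forces the split in region_del(). */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      2 * HPAGE, 4 * HPAGE))
		perror("fallocate");

	close(fd);
	unlink("/dev/hugepages/repro");
	return 0;
}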
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-21 16:58                     ` Mike Kravetz
@ 2020-10-21 17:30                       ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2020-10-21 17:30 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: David Hildenbrand, Michal Privoznik, Mina Almasry, linux-kernel,
	linux-mm, Michael S. Tsirkin, Michal Hocko, Muchun Song,
	Aneesh Kumar K.V, Tejun Heo


> On 21.10.2020 at 18:58, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> 
>>> On 21.10.20 15:11, David Hildenbrand wrote:
>>> On 21.10.20 14:57, Michal Privoznik wrote:
>>>> On 10/21/20 5:35 AM, Mike Kravetz wrote:
>>>>> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>>>> It would be good if Mina (at least) would look these over.  Would also
>>>>> be interesting to know if these fixes address the bug seen with the qemu
>>>>> use case.
>>>>> I'm still doing more testing and code inspection to look for other issues.
> ...
> ...
>>>> I've applied, rebuilt and tested, but unfortunately I still hit the problem:
>>>> [ 6472.719047] ------------[ cut here ]------------
>>>> [ 6472.719052] WARNING: CPU: 6 PID: 11773 at mm/page_counter.c:57
> ...
> ...
>>> Agreed, same over here. :(
>> 
>> I *think* the following on top makes it fly
>> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 67fc6383995b..5cf7f6a6c1a6 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -656,6 +656,9 @@ static long region_del(struct resv_map *resv, long f, long t)
>> 
>>                        del += t - f;
>> 
>> +                       hugetlb_cgroup_uncharge_file_region(
>> +                               resv, rg, t - f);
>> +
>>                        /* New entry for end of split region */
>>                        nrg->from = t;
>>                        nrg->to = rg->to;
>> @@ -667,9 +670,6 @@ static long region_del(struct resv_map *resv, long f, long t)
>>                        /* Original entry is trimmed */
>>                        rg->to = f;
>> 
>> -                       hugetlb_cgroup_uncharge_file_region(
>> -                               resv, rg, nrg->to - nrg->from);
>> -
>>                        list_add(&nrg->link, &rg->link);
>>                        nrg = NULL;
>>                        break;
> 
> Thanks, yes that certainly does look like a bug in that code.
> 
> Does that resolve the issue with qemu?

I was no longer able to reproduce the issue, so I guess we found all of them!

Thanks!

> 
> I want to do a little more testing/research before sending a patch later
> today.
> -- 
> Mike Kravetz



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-10-21 17:30 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 15:22 cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5 David Hildenbrand
2020-10-14 16:15 ` David Hildenbrand
2020-10-14 17:56   ` Mina Almasry
2020-10-14 18:18     ` David Hildenbrand
2020-10-14 18:31       ` Mike Kravetz
2020-10-15  7:56         ` David Hildenbrand
2020-10-15  8:57           ` David Hildenbrand
2020-10-15  9:01             ` David Hildenbrand
2020-10-15 23:14         ` Mike Kravetz
2020-10-20 13:38           ` David Hildenbrand
2020-10-21  3:35             ` Mike Kravetz
2020-10-21 12:42               ` David Hildenbrand
2020-10-21 12:57               ` Michal Privoznik
2020-10-21 13:11                 ` David Hildenbrand
2020-10-21 13:34                   ` David Hildenbrand
2020-10-21 13:38                     ` David Hildenbrand
2020-10-21 16:58                     ` Mike Kravetz
2020-10-21 17:30                       ` David Hildenbrand
