From: David Hildenbrand <david@redhat.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Michal Privoznik <mprivozn@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Muchun Song <songmuchun@bytedance.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Tejun Heo <tj@kernel.org>, Mina Almasry <almasrymina@google.com>
Subject: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
Date: Wed, 14 Oct 2020 17:22:57 +0200
Message-ID: <c1ea7548-622c-eda7-66f4-e4ae5b6ee8fc@redhat.com>

Hi everybody,

Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and reported [1] that this results in:

1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2. Any hugetlbfs allocations failing (I assume because some accounting is wrong).


QEMU with free page reporting uses fallocate(FALLOC_FL_PUNCH_HOLE)
to discard pages that a VM reports as free. Reporting is done at
pageblock granularity, so when the guest reports 2M chunks, we
fallocate(FALLOC_FL_PUNCH_HOLE) one huge page at a time in QEMU.
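
For readers who want to poke at this outside of QEMU, the call described
above can be sketched roughly as follows. This is only a minimal
illustration, not QEMU's actual code: the helper name punch_huge_page()
and the HUGE_PAGE_SIZE / PUNCH_MODE macros are made up here, and a 2 MiB
huge page size is assumed. The syscall arguments in report #1 further
down (RSI = 0x3, R10 = 0x200000) appear consistent with this shape.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Assumption: 2 MiB huge pages, matching the 2M chunks mentioned above. */
#define HUGE_PAGE_SIZE ((off_t)2 * 1024 * 1024)

/*
 * FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE;
 * together they evaluate to 0x3.
 */
#define PUNCH_MODE (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)

/*
 * Hypothetical helper: discard the single huge page backing
 * [offset, offset + HUGE_PAGE_SIZE) of the file referred to by fd.
 * The caller is expected to pass a huge-page-aligned offset.
 * Returns 0 on success, -1 with errno set on failure.
 */
int punch_huge_page(int fd, off_t offset)
{
	return fallocate(fd, PUNCH_MODE, offset, HUGE_PAGE_SIZE);
}
```

Calling punch_huge_page() in a loop on a hugetlbfs-backed file inside a
cgroup might be enough for a standalone reproducer, but that has not been
verified here.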

I was able to reproduce this on latest v5.9 (and on v5.7.X from F32),
also with virtio-mem, which similarly uses
fallocate(FALLOC_FL_PUNCH_HOLE).

Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
is broken with cgroups. I did *not* try without cgroups yet.

Any ideas?


Here is report #1:

[  315.251417] ------------[ cut here ]------------
[  315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[  315.251425] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec edac_mce_amd snd_hda_core btusb btrtl btbcm snd_hwdep snd_seq btintel kvm_amd snd_seq_device bluetooth kvm snd_pcm ecdh_generic sp5100_tco irqbypass rfkill snd_timer rapl ecc pcspkr wmi_bmof joydev i2c_piix4 k10temp snd
[  315.251454]  soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi drm ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[  315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[  315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[  315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
[  315.251471] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[  315.251472] RSP: 0018:ffffb60f01ed3b20 EFLAGS: 00010286
[  315.251473] RAX: fffffffffffd0600 RBX: fffffffffffd0600 RCX: ffff8de8272e3200
[  315.251473] RDX: 000000000000028e RSI: fffffffffffd0600 RDI: ffff8de838452e40
[  315.251474] RBP: ffff8de838452e40 R08: ffff8de838452e40 R09: ffff8de837f86c80
[  315.251475] R10: ffffb60f01ed3b58 R11: 0000000000000001 R12: 0000000000051c00
[  315.251475] R13: fffffffffffae400 R14: ffff8de8272e3240 R15: 0000000000000571
[  315.251476] FS:  00007f9c2edfd700(0000) GS:ffff8de83ebc0000(0000) knlGS:0000000000000000
[  315.251477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  315.251478] CR2: 00007f2a76787e78 CR3: 0000000fcbb1c000 CR4: 0000000000350ee0
[  315.251479] Call Trace:
[  315.251485]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[  315.251487]  region_del+0x1d3/0x300
[  315.251489]  hugetlb_unreserve_pages+0x39/0xb0
[  315.251492]  remove_inode_hugepages+0x1a8/0x3d0
[  315.251495]  ? tlb_finish_mmu+0x7a/0x1d0
[  315.251497]  hugetlbfs_fallocate+0x3c4/0x5c0
[  315.251519]  ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
[  315.251522]  ? file_has_perm+0xa2/0xb0
[  315.251524]  ? inode_security+0xc/0x60
[  315.251525]  ? selinux_file_permission+0x4e/0x120
[  315.251527]  vfs_fallocate+0x146/0x290
[  315.251529]  __x64_sys_fallocate+0x3e/0x70
[  315.251531]  do_syscall_64+0x33/0x40
[  315.251533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  315.251535] RIP: 0033:0x7f9d3fb5641f
[  315.251536] Code: 89 7c 24 08 48 89 4c 24 18 e8 5d fc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 7c 24 08 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 08 e8 8d fc f8 ff 8b 44
[  315.251537] RSP: 002b:00007f9c2edfc470 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[  315.251538] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f9d3fb5641f
[  315.251539] RDX: 00000000ae200000 RSI: 0000000000000003 RDI: 000000000000000c
[  315.251539] RBP: 0000557389d6736c R08: 0000000000000000 R09: 000000000000000c
[  315.251540] R10: 0000000000200000 R11: 0000000000000293 R12: 0000000000200000
[  315.251540] R13: 00000000ffffffff R14: 00000000ae200000 R15: 00007f9cde000000
[  315.251542] ---[ end trace 4c88c62ccb1349c9 ]---



Here is report #2:

[  400.920702] ------------[ cut here ]------------
[  400.920711] WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[  400.920712] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop btusb btrtl btbcm btintel edac_mce_amd bluetooth snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec kvm_amd snd_hda_core snd_hwdep kvm snd_seq ecdh_generic snd_seq_device rfkill irqbypass snd_pcm ecc joydev sp5100_tco rapl pcspkr
[  400.920743]  wmi_bmof i2c_piix4 k10temp snd_timer snd soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm mxm_wmi ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[  400.920759] CPU: 13 PID: 2438 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[  400.920760] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[  400.920763] RIP: 0010:page_counter_uncharge+0x4b/0x50
[  400.920765] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[  400.920766] RSP: 0018:ffffb89e01f5fa20 EFLAGS: 00010286
[  400.920767] RAX: fffffffffff01200 RBX: fffffffffff01200 RCX: 0000000080400000
[  400.920768] RDX: 0000000000000800 RSI: fffffffffff01200 RDI: ffff910b78452e40
[  400.920769] RBP: ffff910b78452e40 R08: ffff910b78452e40 R09: ffff910b70b2a700
[  400.920769] R10: 0000000000000001 R11: ffff910b5e079300 R12: 0000000000100000
[  400.920770] R13: fffffffffff00000 R14: ffff910b76185908 R15: 0000000000000000
[  400.920771] FS:  0000000000000000(0000) GS:ffff910b7ed40000(0000) knlGS:0000000000000000
[  400.920772] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  400.920773] CR2: 00007f90b2d898bc CR3: 000000056ca0e000 CR4: 0000000000350ee0
[  400.920774] Call Trace:
[  400.920780]  hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[  400.920783]  region_del+0x11b/0x300
[  400.920786]  hugetlb_unreserve_pages+0x39/0xb0
[  400.920788]  remove_inode_hugepages+0x3c2/0x3d0
[  400.920792]  hugetlbfs_evict_inode+0x1a/0x40
[  400.920795]  evict+0xd1/0x1a0
[  400.920797]  __dentry_kill+0xe4/0x180
[  400.920799]  __fput+0xec/0x240
[  400.920802]  task_work_run+0x65/0xa0
[  400.920804]  do_exit+0x34c/0xad0
[  400.920806]  do_group_exit+0x33/0xa0
[  400.920808]  get_signal+0x179/0x8d0
[  400.920811]  arch_do_signal+0x30/0x700
[  400.920832]  ? kvm_vcpu_ioctl+0x29f/0x590 [kvm]
[  400.920835]  exit_to_user_mode_prepare+0xf7/0x160
[  400.920838]  syscall_exit_to_user_mode+0x31/0x1b0
[  400.920841]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  400.920843] RIP: 0033:0x7f90b3008e92
[  400.920843] Code: Bad RIP value.
[  400.920844] RSP: 002b:00007f8d45ffa770 EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
[  400.920845] RAX: fffffffffffffe00 RBX: 0000000000000014 RCX: 00007f90b3008e92
[  400.920846] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000055efd2951db8
[  400.920846] RBP: 000055efd2951d90 R08: 0000000000000000 R09: 000055efd18f29a0
[  400.920847] R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000000
[  400.920848] R13: 000055efd190ff60 R14: 000055efd2951db8 R15: 00007f8d45ffa7a0
[  400.920850] ---[ end trace bd4d1b0930afe999 ]---
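
The register dumps in both reports can be decoded with a bit of
arithmetic. The sketch below rests on assumptions that the reports
themselves do not state: that RBX holds the counter value after the
uncharge, that R12 holds the number of base pages being uncharged (a
guess from the "Code:" bytes), and that one 2 MiB huge page equals 512
base pages of 4 KiB. All function names are made up for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 2 MiB huge pages made of 4 KiB base pages. */
#define BASE_PAGES_PER_HUGE_PAGE 512

/* Reinterpret a raw 64-bit register value as a signed number. */
int64_t as_signed(uint64_t reg)
{
	return (int64_t)reg;
}

/* Counter value before the uncharge: value afterwards plus the amount. */
int64_t usage_before(uint64_t new_raw, uint64_t nr_pages)
{
	return as_signed(new_raw) + (int64_t)nr_pages;
}

/* Convert a base page count into a huge page count. */
uint64_t to_huge_pages(uint64_t nr_pages)
{
	return nr_pages / BASE_PAGES_PER_HUGE_PAGE;
}

/* Print a one-line summary for one report (rbx/r12 as assumed above). */
void print_report(const char *label, uint64_t rbx, uint64_t r12)
{
	printf("%s: uncharge of %" PRIu64 " pages (%" PRIu64 " huge pages), "
	       "usage before %" PRId64 ", after %" PRId64 "\n",
	       label, r12, to_huge_pages(r12),
	       usage_before(rbx, r12), as_signed(rbx));
}
```

Under these assumptions, print_report("report #1", 0xfffffffffffd0600,
0x51c00) describes an uncharge of 334848 pages (654 huge pages) against a
usage of 139776 pages (273 huge pages), and print_report("report #2",
0xfffffffffff01200, 0x100000) an uncharge of 1048576 pages (2048 huge
pages) against a usage of 4608 pages (9 huge pages). The huge page counts
happen to match RDX in the respective dumps (0x28e and 0x800). If the
interpretation holds, far more is being uncharged than was ever charged,
which would fit the "accounting is wrong" suspicion.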



[1] https://www.redhat.com/archives/libvir-list/2020-October/msg00872.html

-- 
Thanks,

David / dhildenb
