KVM Archive on lore.kernel.org
* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
       [not found]         ` <32ea3107-b1bc-f39e-3cf8-f6ef427235ef@redhat.com>
@ 2020-10-15  8:57           ` David Hildenbrand
  2020-10-15  9:01             ` David Hildenbrand
  0 siblings, 1 reply; 2+ messages in thread
From: David Hildenbrand @ 2020-10-15  8:57 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo, KVM

On 15.10.20 09:56, David Hildenbrand wrote:
> On 14.10.20 20:31, Mike Kravetz wrote:
>> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>>> On 14.10.20 19:56, Mina Almasry wrote:
>>>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>>>> Hi everybody,
>>>>>>
>>>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>>>> with hugetlbfs and reported that this results in [1]
>>>>>>
>>>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>>>
>>>>>> 2. Any hugetlbfs allocation failing. (I assume this is because some accounting is wrong.)
>>>>>>
>>>>>>
>>>>>> QEMU with free page reporting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>>>> to discard pages that are reported as free by a VM. The reporting
>>>>>> granularity is the pageblock size. So when the guest reports
>>>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>>>
>>>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>>>> (and on v5.7.X from F32).
>>>>>>
>>>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>>>
>>>>>> Any ideas?
>>>>
>>>> Hi David,
>>>>
>>>> I may be able to dig in and take a look. How do I reproduce this,
>>>> though? Do I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>>>> hugetlb region?
>>>>
>>>
>>> Hi Mina,
>>>
>>> thanks for having a look. I started poking around myself but,
>>> being new to the cgroup code, I have so far failed to understand why that
>>> code gets triggered even though the hugetlb controller isn't enabled.
>>>
>>> I assume you at least have to make sure that there is
>>> a page populated (MAP_POPULATE, or read/write it). But I am not
>>> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
>>> sufficient, or if it will require a sequence of
>>> populate+discard(punch) (or multi-threading).
>>
>> FWIW - I ran the libhugetlbfs tests, which do a bunch of hole punching,
>> with (and without) the hugetlb controller enabled and did not see this issue.
>>
>> May need to reproduce via QEMU as below.
> 
> Not sure if relevant, but QEMU should be using
> memfd_create(MFD_HUGETLB|MFD_HUGE_2MB) to obtain a hugetlbfs file.
> 
> Also, QEMU fallocate(FALLOC_FL_PUNCH_HOLE)'s a significant amount of the
> memfd's memory (e.g., > 90%).
> 

I just tried to reproduce by doing random accesses + random fallocate(FALLOC_FL_PUNCH_HOLE) within a file - without success.
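
Roughly, a sketch of that kind of test (simplified and hypothetical - the file name, sizes and iteration counts are made up, it is not the QEMU code path, and it assumes enough free 2MB huge pages, e.g. "echo 128 > /proc/sys/vm/nr_hugepages"; run it inside the cgroup of interest):

/*
 * Back a mapping with a 2MB hugetlb memfd, populate all huge pages,
 * then punch out most of them, and repeat.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB  0x0004U
#endif
#ifndef MFD_HUGE_2MB
#define MFD_HUGE_2MB (21U << 26)	/* HUGETLB_FLAG_ENCODE_2MB */
#endif

#define HPAGE_SIZE   (2UL * 1024 * 1024)
#define NR_HPAGES    64UL

int main(void)
{
	size_t size = NR_HPAGES * HPAGE_SIZE;
	int fd = memfd_create("hugetlb-punch", MFD_HUGETLB | MFD_HUGE_2MB);
	char *p;

	if (fd < 0 || ftruncate(fd, size)) {
		perror("memfd_create/ftruncate");
		return 1;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	for (int round = 0; round < 16; round++) {
		/* Populate every huge page by writing to it ... */
		for (unsigned long i = 0; i < NR_HPAGES; i++)
			memset(p + i * HPAGE_SIZE, 0xaa, HPAGE_SIZE);
		/* ... then punch out ~90% of them, like QEMU would. */
		for (unsigned long i = 0; i < NR_HPAGES; i++) {
			if (rand() % 100 >= 90)
				continue;
			if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
				      i * HPAGE_SIZE, HPAGE_SIZE))
				perror("fallocate(FALLOC_FL_PUNCH_HOLE)");
		}
	}
	return 0;
}
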

So it could be that:
1. KVM is involved in messing this up
2. Multi-threading is involved

However, I am also able to reproduce with only a single vCPU (there is still the QEMU main thread, but a single vCPU limits the chance for races).

Even KVM spits fire after a while, which could be a side effect of allocations failing:

error: kvm run failed Bad address
RAX=0000000000000000 RBX=ffff8c12c9c217c0 RCX=ffff8c12fb1b8fc0 RDX=0000000000000007
RSI=ffff8c12c9c217c0 RDI=ffff8c12c9c217c8 RBP=000000000000000d RSP=ffffb3964040fa68
R8 =0000000000000008 R9 =ffff8c12c9c20000 R10=ffff8c12fffd5000 R11=00000000000303c0
R12=ffff8c12c9c217c0 R13=0000000000000008 R14=0000000000000001 R15=fffff31d44270800
RIP=ffffffffaf33ba0f RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 00007f8fabc87040 00000000 00000000
GS =0000 ffff8c12fbc00000 00000000 00000000
LDT=0000 fffffe0000000000 00000000 00000000
TR =0040 fffffe0000003000 00004087 00008b00 DPL=0 TSS64-busy
GDT=     fffffe0000001000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=0000560e10895398 CR3=00000001073b2000 CR4=00350ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=0f 0b eb e2 90 0f 1f 44 00 00 53 48 89 fb 31 c0 48 8d 7f 08 <48> c7 47 f8 00 00 00 00 48 89 d9 48 c7 c2 44 d3 52

-- 
Thanks,

David / dhildenb



* Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
  2020-10-15  8:57           ` cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5 David Hildenbrand
@ 2020-10-15  9:01             ` David Hildenbrand
  0 siblings, 0 replies; 2+ messages in thread
From: David Hildenbrand @ 2020-10-15  9:01 UTC (permalink / raw)
  To: Mike Kravetz, Mina Almasry
  Cc: linux-kernel, linux-mm, Michal Privoznik, Michael S. Tsirkin,
	Michal Hocko, Muchun Song, Aneesh Kumar K.V, Tejun Heo, KVM

On 15.10.20 10:57, David Hildenbrand wrote:
> On 15.10.20 09:56, David Hildenbrand wrote:
>> On 14.10.20 20:31, Mike Kravetz wrote:
>>> On 10/14/20 11:18 AM, David Hildenbrand wrote:
>>>> On 14.10.20 19:56, Mina Almasry wrote:
>>>>> On Wed, Oct 14, 2020 at 9:15 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>> On 14.10.20 17:22, David Hildenbrand wrote:
>>>>>>> Hi everybody,
>>>>>>>
>>>>>>> Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
>>>>>>> with hugetlbfs and reported that this results in [1]
>>>>>>>
>>>>>>> 1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5
>>>>>>>
>>>>>>> 2. Any hugetlbfs allocation failing. (I assume this is because some accounting is wrong.)
>>>>>>>
>>>>>>>
>>>>>>> QEMU with free page reporting uses fallocate(FALLOC_FL_PUNCH_HOLE)
>>>>>>> to discard pages that are reported as free by a VM. The reporting
>>>>>>> granularity is the pageblock size. So when the guest reports
>>>>>>> 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE) one huge page in QEMU.
>>>>>>>
>>>>>>> I was also able to reproduce (also with virtio-mem, which similarly
>>>>>>> uses fallocate(FALLOC_FL_PUNCH_HOLE)) on latest v5.9
>>>>>>> (and on v5.7.X from F32).
>>>>>>>
>>>>>>> Looks like something with fallocate(FALLOC_FL_PUNCH_HOLE) accounting
>>>>>>> is broken with cgroups. I did *not* try without cgroups yet.
>>>>>>>
>>>>>>> Any ideas?
>>>>>
>>>>> Hi David,
>>>>>
>>>>> I may be able to dig in and take a look. How do I reproduce this,
>>>>> though? Do I just fallocate(FALLOC_FL_PUNCH_HOLE) one 2MB page in a
>>>>> hugetlb region?
>>>>>
>>>>
>>>> Hi Mina,
>>>>
>>>> thanks for having a look. I started poking around myself but,
>>>> being new to the cgroup code, I have so far failed to understand why that
>>>> code gets triggered even though the hugetlb controller isn't enabled.
>>>>
>>>> I assume you at least have to make sure that there is
>>>> a page populated (MAP_POPULATE, or read/write it). But I am not
>>>> sure yet if a single fallocate(FALLOC_FL_PUNCH_HOLE) is
>>>> sufficient, or if it will require a sequence of
>>>> populate+discard(punch) (or multi-threading).
>>>
>>> FWIW - I ran the libhugetlbfs tests, which do a bunch of hole punching,
>>> with (and without) the hugetlb controller enabled and did not see this issue.
>>>
>>> May need to reproduce via QEMU as below.
>>
>> Not sure if relevant, but QEMU should be using
>> memfd_create(MFD_HUGETLB|MFD_HUGE_2MB) to obtain a hugetlbfs file.
>>
>> Also, QEMU fallocate(FALLOC_FL_PUNCH_HOLE)'s a significant amount of the
>> memfd's memory (e.g., > 90%).
>>
> 
> I just tried to reproduce by doing random accesses + random fallocate(FALLOC_FL_PUNCH_HOLE) within a file - without success.
> 
> So it could be that:
> 1. KVM is involved in messing this up
> 2. Multi-threading is involved
> 

Able to reproduce with TCG under QEMU, so not a KVM issue.
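
One way to watch the accounting go wrong is to poll the hugetlb cgroup counters of QEMU's cgroup while the guest reports free pages. A rough sketch (assuming a cgroup v1 hugetlb controller mounted at /sys/fs/cgroup/hugetlb, 2MB huge pages, and a made-up "qemu" group; cgroup v2 uses different file names, e.g. hugetlb.2MB.current):

#include <stdio.h>
#include <unistd.h>

/* Read a single integer counter from a cgroup file, -1 on error. */
static long read_counter(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	const char *base = "/sys/fs/cgroup/hugetlb/qemu";	/* made-up group */
	char usage[256], rsvd[256];

	snprintf(usage, sizeof(usage), "%s/hugetlb.2MB.usage_in_bytes", base);
	snprintf(rsvd, sizeof(rsvd), "%s/hugetlb.2MB.rsvd.usage_in_bytes", base);

	/* Print the fault and reservation usage once per second. */
	for (;;) {
		printf("usage=%ld rsvd=%ld\n",
		       read_counter(usage), read_counter(rsvd));
		sleep(1);
	}
}
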

-- 
Thanks,

David / dhildenb

