LKML Archive on lore.kernel.org
From: Zenghui Yu <yuzenghui@huawei.com>
To: Suzuki K Poulose <suzuki.poulose@arm.com>, <zhengxiang9@huawei.com>
Cc: <marc.zyngier@arm.com>, <christoffer.dall@arm.com>,
	<catalin.marinas@arm.com>, <will.deacon@arm.com>,
	<james.morse@arm.com>, <linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <linux-kernel@vger.kernel.org>,
	<wanghaibin.wang@huawei.com>, <lious.lilei@hisilicon.com>,
	<lishuo1@hisilicon.com>
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Date: Sun, 17 Mar 2019 21:55:12 +0800
Message-ID: <0fce42bd-1295-45a4-46b6-81b128ee343e@huawei.com>
In-Reply-To: <6aea4049-7860-7144-a7be-14f856cdc789@arm.com>

Hi Suzuki,

On 2019/3/15 22:56, Suzuki K Poulose wrote:
> Hi Zhengui,
> 
> On 15/03/2019 08:21, Zheng Xiang wrote:
>> Hi Suzuki,
>>
>> I have tested this patch: the VM doesn't hang and we get the expected
>> WARNING log:
> 
> Thanks for the quick testing !
> 
>> However, we also get the following unexpected log:
>>
>> [  908.329900] BUG: Bad page state in process qemu-kvm  pfn:a2fb41cf
>> [  908.339415] page:ffff7e28bed073c0 count:-4 mapcount:0 mapping:0000000000000000 index:0x0
>> [  908.339416] flags: 0x4ffffe0000000000()
>> [  908.339418] raw: 4ffffe0000000000 dead000000000100 dead000000000200 0000000000000000
>> [  908.339419] raw: 0000000000000000 0000000000000000 fffffffcffffffff 0000000000000000
>> [  908.339420] page dumped because: nonzero _refcount
>> [  908.339437] CPU: 32 PID: 72599 Comm: qemu-kvm Kdump: loaded Tainted: G    B  W        5.0.0+ #1
>> [  908.339438] Call trace:
>> [  908.339439]  dump_backtrace+0x0/0x188
>> [  908.339441]  show_stack+0x24/0x30
>> [  908.339442]  dump_stack+0xa8/0xcc
>> [  908.339443]  bad_page+0xf0/0x150
>> [  908.339445]  free_pages_check_bad+0x84/0xa0
>> [  908.339446]  free_pcppages_bulk+0x4b8/0x750
>> [  908.339448]  free_unref_page_commit+0x13c/0x198
>> [  908.339449]  free_unref_page+0x84/0xa0
>> [  908.339451]  __free_pages+0x58/0x68
>> [  908.339452]  zap_huge_pmd+0x290/0x2d8
>> [  908.339454]  unmap_page_range+0x2b4/0x470
>> [  908.339455]  unmap_single_vma+0x94/0xe8
>> [  908.339457]  unmap_vmas+0x8c/0x108
>> [  908.339458]  exit_mmap+0xd4/0x178
>> [  908.339459]  mmput+0x74/0x180
>> [  908.339460]  do_exit+0x2b4/0x5b0
>> [  908.339462]  do_group_exit+0x3c/0xe0
>> [  908.339463]  __arm64_sys_exit_group+0x24/0x28
>> [  908.339465]  el0_svc_common+0xa0/0x180
>> [  908.339466]  el0_svc_handler+0x38/0x78
>> [  908.339467]  el0_svc+0x8/0xc
> 
> That's bad, we seem to be making up to 4 unbalanced put_page() calls.
> 
>>>> ---
>>>>    virt/kvm/arm/mmu.c | 51 +++++++++++++++++++++++++++++++++++----------------
>>>>    1 file changed, 35 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 66e0fbb5..04b0f9b 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1076,24 +1076,38 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>>>             * Skip updating the page table if the entry is
>>>>             * unchanged.
>>>>             */
>>>> -        if (pmd_val(old_pmd) == pmd_val(*new_pmd))
>>>> +        if (pmd_val(old_pmd) == pmd_val(*new_pmd)) {
>>>>                return 0;
>>>> -
>>>> +        } else if (WARN_ON_ONCE(!pmd_thp_or_huge(old_pmd))) {
>>>>            /*
>>>> -         * Mapping in huge pages should only happen through a
>>>> -         * fault.  If a page is merged into a transparent huge
>>>> -         * page, the individual subpages of that huge page
>>>> -         * should be unmapped through MMU notifiers before we
>>>> -         * get here.
>>>> -         *
>>>> -         * Merging of CompoundPages is not supported; they
>>>> -         * should become splitting first, unmapped, merged,
>>>> -         * and mapped back in on-demand.
>>>> +         * If we have PTE level mapping for this block,
>>>> +         * we must unmap it to avoid inconsistent TLB
>>>> +         * state. We could end up in this situation if
>>>> +         * the memory slot was marked for dirty logging
>>>> +         * and was reverted, leaving PTE level mappings
>>>> +         * for the pages accessed during the period.
>>>> +         * Normal THP split/merge follows mmu_notifier
>>>> +         * callbacks and do get handled accordingly.
>>>>             */
>>>> -        VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>>>> +            unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
>>
>> It seems that kvm decreases the _refcount of the page twice, in
>> transparent_hugepage_adjust() and unmap_stage2_range().
> 
> But I thought we should be doing that on the head_page already, as this 
> is THP.
> I will take a look and get back to you on this. Btw, is it possible for you
> to turn on CONFIG_DEBUG_VM and re-run with the above patch ?
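
To make the arithmetic behind the "count:-4" report concrete, here is a
toy userspace model of a page refcount. It is a sketch only: struct
toy_page, struct toy_result and toy_get_put() are made-up names, not
kernel code, and it does not claim to identify which path drops the
extra references. It shows how puts that outnumber gets drive the count
negative when nothing checks it, and how a debug-style check on a zero
count would catch each extra put instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

/* Outcome of one get/put sequence. */
struct toy_result {
	int final_count;   /* refcount after all accepted operations */
	int rejected_puts; /* puts refused by the debug-style check */
};

/*
 * Take `gets` references, then drop `puts` references.
 * With `debug` set, a put on a zero refcount is refused and counted
 * (loosely modelled on a VM_BUG_ON-style check, as an assumption).
 * Without it, the count simply goes negative.
 */
struct toy_result toy_get_put(int gets, int puts, bool debug)
{
	struct toy_page page = { .refcount = 0 };
	struct toy_result res = { 0, 0 };
	int i;

	assert(gets >= 0 && puts >= 0);

	for (i = 0; i < gets; i++)
		page.refcount++;

	for (i = 0; i < puts; i++) {
		if (debug && page.refcount == 0) {
			res.rejected_puts++;
			continue;
		}
		page.refcount--;
	}

	res.final_count = page.refcount;
	return res;
}
```

With one get and five puts, the unchecked run ends at -4, i.e. four
unbalanced puts, which is the same shape as the count in the quoted
report. The checked run stays at 0 and flags four puts.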

And for the detailed debugging info:

I've turned on CONFIG_DEBUG_VM and re-run with your patch. The steps
were: run a guest backed by stage2 PUD hugepages, enable and then
disable dirty logging, and then shut the guest down. The host hits a
kernel BUG while the guest is shutting down, with the log below:


[  486.997640] kernel BUG at ./include/linux/mm.h:547!
[  487.005524] Internal error: Oops - BUG: 0 [#1] SMP
[  487.013455] Modules linked in: ...
[  487.104072] CPU: 14 PID: 60747 Comm: qemu-kvm Kdump: loaded Tainted: G        W         5.0.0+ #2
[  487.117150] ...
[  487.135433] pstate: 40400009 (nZcv daif +PAN -UAO)
[  487.144849] pc : unmap_stage2_puds+0x480/0x6e0
[  487.153756] lr : unmap_stage2_puds+0x480/0x6e0
[  487.162507] sp : ffff00002c72bb10
[  487.179630] x27: 0000000041a00000 x26: ffff8027bbb56060
[  487.183465] openvswitch: netlink: Tunnel attr 5 has unexpected len 1 expected 0
[  487.189184] x25: ffff802769cbe008 x24: ffff7e0000000000
[  487.189185] x23: ffff802769cbe008 x22: ffff00004b0af000
[  487.189186] x21: ffff80279da06060 x20: 00400027332007fd
[  487.189188] x19: 0000000080000000 x18: 0000000000000010
[  487.189189] x17: 0000000000000000 x16: 0000000000000000
[  487.189190] x15: ffff00001182d708 x14: 3030303030303030
[  487.189191] x13: 3030303030302066 x12: ffff000011857000
[  487.189192] x11: 0000000000000000 x10: ffff000011a48000
[  487.189193] x9 : 0000000000000000 x8 : 0000000000000003
[  487.189194] x7 : 000000000000095b x6 : 0000000212557560
[  487.189196] x3 : ffff802fc0b08260 x2 : b2513adc3568f800
[  487.189197] x1 : 0000000000000000 x0 : 000000000000003e
[  487.189200] Process qemu-kvm (pid: 60747, stack limit = 0x000000004342b298)
[  487.189201] Call trace:
[  487.189203]  unmap_stage2_puds+0x480/0x6e0
[  487.189205]  unmap_stage2_range+0xa4/0x190
[  487.189208]  kvm_free_stage2_pgd+0x64/0x100
[  487.363897]  kvm_arch_flush_shadow_all+0x20/0x30
[  487.372095]  kvm_mmu_notifier_release+0x3c/0x80
[  487.380092]  __mmu_notifier_release+0x50/0x100
[  487.387914]  exit_mmap+0x170/0x178
[  487.394567]  mmput+0x70/0x180
[  487.400653]  do_exit+0x2b4/0x5c8
[  487.406849]  do_group_exit+0x3c/0xe0
---[ end trace 55c414a329c80b63 ]---
[  487.454174] Kernel panic - not syncing: Fatal exception
[  487.461967] SMP: stopping secondary CPUs
[  487.468473] Kernel Offset: disabled
[  487.474520] CPU features: 0x002,22208a38
[  487.481008] Memory Limit: none
[  487.489095] Starting crashdump kernel...
[  487.495457] Bye!
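
For reference, the "enable and then disable dirty logging" step can be
sketched against the KVM userspace API as below. This is a minimal
sketch under assumptions: with_dirty_logging() and set_dirty_logging()
are hypothetical helper names, vm_fd is assumed to be a VM file
descriptor obtained from KVM_CREATE_VM, and the region is assumed to
describe an already registered memslot. It does not show how the
toggling was actually driven in the test above.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Return a copy of `region` with dirty logging switched on or off.
 * Only the KVM_MEM_LOG_DIRTY_PAGES flag is touched; other flags and
 * the slot geometry are left as they were.
 */
struct kvm_userspace_memory_region
with_dirty_logging(struct kvm_userspace_memory_region region, bool enable)
{
	if (enable)
		region.flags |= KVM_MEM_LOG_DIRTY_PAGES;
	else
		region.flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
	return region;
}

/*
 * Re-register an existing memslot with the updated flag.
 * Returns the ioctl() result: 0 on success, -1 with errno set on error.
 */
int set_dirty_logging(int vm_fd,
		      const struct kvm_userspace_memory_region *region,
		      bool enable)
{
	struct kvm_userspace_memory_region updated =
		with_dirty_logging(*region, enable);

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &updated);
}
```

Calling set_dirty_logging(vm_fd, &region, true) followed by
set_dirty_logging(vm_fd, &region, false) on the same slot corresponds to
the enable/disable sequence described above.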


> 
> Kind regards
> Suzuki


