On 10/10/19 18:14, Paolo Bonzini wrote:
> On 10/10/19 07:53, speck for Pawan Gupta wrote:
>> We can help debug the crash. Can you please share the series,
>> reproduction steps and the crash signature?

The bug is a race condition between kvm_mmu_zap_all and, well,
everything else.  It is triggered when nx_huge_pages is cleared or set while
the recovery thread runs.

Paolo

> The reproduction steps for v5 are as follows:
> 
> - grab the next branch of kvm-unit-tests.git[1] and build it
> 
> - create a lot of hugepages; on my machine I use 40 GiB worth of them:
> 
>   echo 20480 > /proc/sys/vm/nr_hugepages
> 
> - load KVM with kvm.nx_huge_pages_recovery_period_secs=3
> 
> - run the following script
> 
>   while true; do
>     echo N > /sys/module/kvm/parameters/nx_huge_pages; sleep 1
>     echo Y > /sys/module/kvm/parameters/nx_huge_pages; sleep 5
>   done
> 
> - run the testcase with
> 
>   MEM=40960  # in megabytes
>   qemu-kvm -nodefaults -vnc none -serial stdio -kernel x86/hugetext.flat \
>     -m $MEM -mem-path /dev/hugepages
> 
> You can also add a WARN_ON_ONCE(!sp->lpage_disallowed) to
> kvm_recover_nx_lpages before the call to kvm_mmu_prepare_zap_page.  As
> soon as it triggers, of course everything will go downhill.
> 
> Paolo
> 
> [1] git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
>
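
For anyone adapting the steps above to a machine with less memory, the
nr_hugepages value and MEM have to agree.  The following is only a sketch
of that arithmetic, not part of the original reproducer: hugepages_for_mb
is a hypothetical helper, and it assumes the default 2 MiB hugepage size
(check the Hugepagesize line in /proc/meminfo).

```shell
#!/bin/sh
# Hypothetical helper (not from the original steps): number of hugepages
# needed to back a guest of the given size in megabytes.
# Assumption: hugepages are 2 MiB unless a second argument says otherwise.
hugepages_for_mb() {
    mem_mb=$1
    hugepage_mb=${2:-2}
    # Round up so the pool is never smaller than the guest.
    echo $(( (mem_mb + hugepage_mb - 1) / hugepage_mb ))
}

MEM=40960  # in megabytes, as in the qemu-kvm invocation
echo "nr_hugepages for MEM=$MEM: $(hugepages_for_mb "$MEM")"
```

With MEM=40960 this gives 20480, the value written to
/proc/sys/vm/nr_hugepages above.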