From: Friedrich Weber <f.weber@proxmox.com>
To: kvm@vger.kernel.org
Cc: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
Date: Thu, 11 Jan 2024 13:43:15 +0100
Message-ID: <2c73db3f-2465-49e1-9d57-ef8d978849b6@proxmox.com>
In-Reply-To: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>

Hi,

On 04/01/2024 14:42, Friedrich Weber wrote:
> f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and
> mmu_notifier unmap")

As this commit mentions tdp_mmu_zap_leafs, I ran the reproducer again on
610a9b8f (6.7-rc8) while using a bpftrace script [1] to trace calls to
that function (printing a kernel stack trace) and to task_numa_work that
take longer than 500ms.

Again, there were several invocations of task_numa_work that took >500ms
(the VM appeared to be frozen during that time). For instance, here is
one invocation that took ~20 seconds:

[1704971602] task_numa_work (tid=52035) took 19995 ms

For this particular thread, in the 20 seconds before that, there were
8 invocations of tdp_mmu_zap_leafs that each took >500ms:

[1704971584] tdp_mmu_zap_leafs (tid=52035) took 2291 ms
[1704971586] tdp_mmu_zap_leafs (tid=52035) took 2343 ms
[1704971589] tdp_mmu_zap_leafs (tid=52035) took 2316 ms
[1704971590] tdp_mmu_zap_leafs (tid=52035) took 1663 ms
[1704971591] tdp_mmu_zap_leafs (tid=52035) took 682 ms
[1704971594] tdp_mmu_zap_leafs (tid=52035) took 2706 ms
[1704971597] tdp_mmu_zap_leafs (tid=52035) took 3132 ms
[1704971602] tdp_mmu_zap_leafs (tid=52035) took 4846 ms

These durations sum to 19979 ms, which roughly matches the ~20s spent
in task_numa_work. The stack trace is the same for all of them:

bpf_prog_5ca52691cb9e9fbd_tdp_mmu_zap_lea+345
bpf_prog_5ca52691cb9e9fbd_tdp_mmu_zap_lea+345
bpf_trampoline_380104735946+208
tdp_mmu_zap_leafs+5
kvm_unmap_gfn_range+347
kvm_mmu_notifier_invalidate_range_start+394
__mmu_notifier_invalidate_range_start+156
change_protection+3908
change_prot_numa+105
task_numa_work+1029
bpf_trampoline_6442457341+117
task_numa_work+9
xfer_to_guest_mode_handle_work+261
kvm_arch_vcpu_ioctl_run+1553
kvm_vcpu_ioctl+667
__x64_sys_ioctl+164
do_syscall_64+96
entry_SYSCALL_64_after_hwframe+110

AFAICT this pattern repeats several times. I uploaded the last 150 KB
of the bpftrace output to [2]. I can provide the full output if needed.

To me, not knowing much about the KVM/KSM/NUMA balancer interplay, this
looks like task_numa_work triggers several invocations of
tdp_mmu_zap_leafs, each of which takes an unusually long time. If anyone
has a hunch why this might happen, or an idea where to look next, it
would be much appreciated.
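
In case it helps with narrowing this down further, below is an untested
sketch of how the script from [1] could be extended to attribute the
runtime of each slow task_numa_work invocation to the tdp_mmu_zap_leafs
calls it triggers. It uses the same kfunc/kretfunc probes as [1] and
only adds per-tid accumulators:

kfunc:task_numa_work {
	@numa_start[tid] = nsecs;
	@zap_total[tid] = 0;
	@zap_count[tid] = 0;
}
kfunc:tdp_mmu_zap_leafs /@numa_start[tid]/ { @zap_start[tid] = nsecs; }
kretfunc:tdp_mmu_zap_leafs /@zap_start[tid]/ {
	// accumulate zap time and count for the enclosing task_numa_work
	@zap_total[tid] += nsecs - @zap_start[tid];
	@zap_count[tid]++;
	delete(@zap_start[tid]);
}
kretfunc:task_numa_work /@numa_start[tid]/ {
	$diff = nsecs - @numa_start[tid];
	if ($diff > 500000000) { // 500ms
		printf("task_numa_work (tid=%d) took %d ms, %d ms of that in %d calls to tdp_mmu_zap_leafs\n", tid, $diff / 1000000, @zap_total[tid] / 1000000, @zap_count[tid]);
	}
	delete(@numa_start[tid]);
	delete(@zap_total[tid]);
	delete(@zap_count[tid]);
}

If the hypothesis above is right, the zap time should account for most
of the task_numa_work time in these reports.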

Best,

Friedrich

[1]

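// Log any invocation of tdp_mmu_zap_leafs (with a kernel stack trace)
// or task_numa_work that takes longer than 500ms.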
kfunc:tdp_mmu_zap_leafs { @start_zap[tid] = nsecs; }
kretfunc:tdp_mmu_zap_leafs /@start_zap[tid]/ {
	$diff = nsecs - @start_zap[tid];
	if ($diff > 500000000) { // 500ms
		time("[%s] ");
		printf("tdp_mmu_zap_leafs (tid=%d) took %d ms\n", tid, $diff / 1000000);
		print(kstack());
	}
	delete(@start_zap[tid]);
}

kfunc:task_numa_work { @start_numa[tid] = nsecs; }
kretfunc:task_numa_work /@start_numa[tid]/ {
	$diff = nsecs - @start_numa[tid];
	if ($diff > 500000000) { // 500ms
		time("[%s] ");
		printf("task_numa_work (tid=%d) took %d ms\n", tid, $diff / 1000000);
	}
	delete(@start_numa[tid]);
}
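
The script needs to be run as root on a kernel/bpftrace combination that
supports the kfunc/kretfunc probes (BTF); assuming it is saved as e.g.
trace.bt (the file name is arbitrary):

	bpftrace trace.bt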

[2] https://paste.debian.net/1303767/

