From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Leonardo Bras <leobras.c@gmail.com>,
	kvm-ppc@vger.kernel.org
Subject: [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode
Date: Fri, 27 Aug 2021 14:07:06 +1000
Message-ID: <20210827040706.517652-1-aik@ozlabs.ru>

Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
only when needed. This allows skipping any update when clearing TCEs.
This works mostly fine as TCE updates are handled when MMU is enabled.
The realmode handlers fail with H_TOO_HARD when pages are not yet
allocated, except when clearing a TCE, in which case KVM prints a warning
but proceeds to dereference a NULL pointer, which crashes the host OS.

This has not been caught so far as the change is reasonably new and
POWER9 runs mostly radix, which does not use realmode handlers.
With hash, the default TCE table is memset() by QEMU on machine reset,
which triggers page faults, and the KVM TCE device's
kvm_spapr_tce_fault() handles those with MMU on. And the huge DMA
windows are not cleared by VMs, which instead successfully create a DMA
window big enough to map the VM memory 1:1 and then just map everything
without clearing.

This started crashing now as the upcoming sriov-under-powervm support
added a mode where a dynamic DMA window is not big enough to map the VM
memory 1:1 but is used anyway, and the VM is now the first (i.e. not
QEMU) to clear a just created table. Note that the upstream QEMU needs
to be modified for the VM to trigger the host OS crash.

This replaces WARN_ON_ONCE_RM() with a check and return.
This adds another warning if TCE is not being cleared.

Cc: Leonardo Bras <leobras.c@gmail.com>
Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand too")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

With recent changes in the printk() department, calling pr_err()
when MMU is off causes lockdep lockups, which I did not dig into any
further, so we should start getting rid of the realmode's
WARN_ON_ONCE_RM().
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 083a4e037718..e5ba96c41f3f 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt,
 	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	/*
-	 * page must not be NULL in real mode,
-	 * kvmppc_rm_ioba_validate() must have taken care of this.
+	 * kvmppc_rm_ioba_validate() allows pages not be allocated if TCE is
+	 * being cleared, otherwise it returns H_TOO_HARD and we skip this.
 	 */
-	WARN_ON_ONCE_RM(!page);
+	if (!page) {
+		WARN_ON_ONCE_RM(tce != 0);
+		return;
+	}
 	tbl = kvmppc_page_address(page);
 
 	tbl[idx % TCES_PER_PAGE] = tce;
-- 
2.30.2
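
For readers who want to see the control flow outside the kernel, here is a
minimal userspace sketch of the fixed logic. It is not the kernel code: the
names tce_table_model, model_tce_put(), model_demo() and the put_result
values are hypothetical, TCES_PER_PAGE is fixed at 512 purely as an
assumption for the model, and the warning is represented by a return value
instead of WARN_ON_ONCE_RM().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TCES_PER_PAGE 512	/* assumption for this model only */
#define NUM_PAGES 4		/* hypothetical table size, in pages */

/* Hypothetical stand-in for struct kvmppc_spapr_tce_table. */
struct tce_table_model {
	unsigned long offset;
	uint64_t *pages[NUM_PAGES];	/* NULL until a TCE is first mapped */
};

enum put_result {
	PUT_STORED,		/* page existed, TCE written */
	PUT_SKIPPED_CLEAR,	/* page missing, clearing is a no-op */
	PUT_UNEXPECTED		/* page missing and tce != 0: would warn */
};

static enum put_result model_tce_put(struct tce_table_model *stt,
				     unsigned long idx, uint64_t tce)
{
	uint64_t *tbl;

	idx -= stt->offset;
	tbl = stt->pages[idx / TCES_PER_PAGE];
	if (!tbl) {
		/*
		 * Never dereference a missing page. Clearing an entry that
		 * was never mapped changes nothing; a non-zero TCE here
		 * would mean validation let it through by mistake.
		 */
		return tce ? PUT_UNEXPECTED : PUT_SKIPPED_CLEAR;
	}

	tbl[idx % TCES_PER_PAGE] = tce;
	return PUT_STORED;
}

/* Exercises the three cases; returns 0 on success, -1 if calloc fails. */
static int model_demo(void)
{
	struct tce_table_model stt = { .offset = 0 };

	/* Only page 1 is allocated; page 0 stays NULL. */
	stt.pages[1] = calloc(TCES_PER_PAGE, sizeof(uint64_t));
	if (!stt.pages[1])
		return -1;

	assert(model_tce_put(&stt, 3, 0) == PUT_SKIPPED_CLEAR);
	assert(model_tce_put(&stt, 3, 0x1003) == PUT_UNEXPECTED);
	assert(model_tce_put(&stt, TCES_PER_PAGE + 7, 0x1003) == PUT_STORED);
	assert(stt.pages[1][7] == 0x1003);

	free(stt.pages[1]);
	return 0;
}
```

Calling model_demo() from a main() runs all three cases. Before the fix, the
equivalent of the first call would have written through the NULL page
pointer.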
Thread overview: 6+ messages
2021-08-27  4:07 Alexey Kardashevskiy [this message]
2021-08-27  4:07 ` [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode Alexey Kardashevskiy
2021-08-31 20:30 ` Leonardo Brás
2021-08-31 20:30 ` Leonardo Brás
2021-09-04  1:01 ` Michael Ellerman
2021-09-04  1:01 ` Michael Ellerman