References: <20201020061859.18385-1-kirill.shutemov@linux.intel.com> <20201026152910.happu7wic4qjxmp7@box>
In-Reply-To: <20201026152910.happu7wic4qjxmp7@box>
From: Andy Lutomirski
Date: Mon, 26 Oct 2020 16:58:16 -0700
Subject: Re: [RFCv2 00/16] KVM protected memory extension
To: "Kirill A. Shutemov"
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Paolo Bonzini,
 Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, David Rientjes, Andrea Arcangeli, Kees Cook, Will Drewry,
 "Edgecombe, Rick P", "Kleen, Andi", Liran Alon, Mike Rapoport, X86 ML,
 kvm list, Linux-MM, LKML, "Kirill A. Shutemov"

On Mon, Oct 26, 2020 at 8:29 AM Kirill A. Shutemov wrote:
>
> On Wed, Oct 21, 2020 at 11:20:56AM -0700, Andy Lutomirski wrote:
> > > On Oct 19, 2020, at 11:19 PM, Kirill A. Shutemov wrote:
> > >
> > > For removing the userspace mapping, use a trick similar to what NUMA
> > > balancing does: convert memory that belongs to KVM memory slots to
> > > PROT_NONE: all existing entries converted to PROT_NONE with mprotect() and
> > > the newly faulted in pages get PROT_NONE from the updated vm_page_prot.
> > > The new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
> > > VMA must be treated in a special way in the GUP and fault paths. The flag
> > > allows GUP to return the page even though it is mapped with PROT_NONE, but
> > > only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
> > > to the memory would result in SIGBUS. Any GUP access without FOLL_KVM
> > > would result in -EFAULT.
> >
> > I definitely like the direction this patchset is going in, and I think
> > that allowing KVM guests to have memory that is inaccessible to QEMU
> > is a great idea.
> >
> > I do wonder, though: do we really want to do this with these PROT_NONE
> > tricks, or should we actually come up with a way to have KVM guest map
> > memory that isn't mapped into QEMU's mm_struct at all? As an example
> > of the latter, I mean something a bit like this:
> >
> > https://lkml.kernel.org/r/CALCETrUSUp_7svg8EHNTk3nQ0x9sdzMCU=h8G-Sy6=SODq5GHg@mail.gmail.com
> >
> > I don't mean to say that this is a requirement of any kind of
> > protected memory like this, but I do think we should understand the
> > tradeoffs, in terms of what a full implementation looks like, the
> > effort and time frames involved, and the maintenance burden of
> > supporting whatever gets merged going forward.
>
> I considered the PROT_NONE trick neat. Complete removing of the mapping
> from QEMU would require more changes into KVM and I'm not really familiar
> with it.

I think it's neat. The big tradeoff I'm concerned about is that it will
likely become ABI once it lands. That is, if this series lands, then we
will always have to support the case in which QEMU has a special
non-present mapping that is nonetheless reflected as present in a guest.
This is a bizarre state of affairs; it may become obsolete if a better
API ever shows up, and it might end up placing constraints on the Linux
VM that we don't love going forward.
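To spell out what that ABI would be: per the cover letter, the semantics
amount to roughly the sketch below. VM_KVM_PROTECTED and FOLL_KVM are the
names used by the series; the helper itself is made up purely for
illustration and is not code from the patches.

/*
 * Illustrative sketch only (not from the series): a GUP-path check with
 * the semantics the cover letter describes.  Assumes the usual
 * definitions from <linux/mm.h> plus the series' new flags.
 */
static bool kvm_protected_gup_allowed(struct vm_area_struct *vma,
                                      unsigned int gup_flags)
{
        /* Ordinary VMAs are unaffected. */
        if (!(vma->vm_flags & VM_KVM_PROTECTED))
                return true;

        /*
         * The VMA is PROT_NONE as far as userspace is concerned, so a
         * direct load or store faults and the task gets SIGBUS.  GUP
         * callers get the page only if they pass FOLL_KVM; any other
         * GUP user gets -EFAULT, as for any inaccessible mapping.
         */
        return gup_flags & FOLL_KVM;
}

In other words, userspace sees an inaccessible mapping while the guest
sees ordinary RAM, and that asymmetry is what we would be signing up to
support indefinitely.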
I don't think my proposal in the referenced thread above is that crazy
or that difficult to implement. The basic idea is to have a way to
create an mm_struct that is not loaded in CR3 anywhere. Instead, KVM
will reference it, much as it currently references QEMU's mm_struct, to
mirror mappings into the guest. This means it would be safe to have
"protected" memory mapped into the special mm_struct because nothing
other than KVM will ever reference the PTEs. But I think that someone
who really understands the KVM memory mapping code should chime in.

> About tradeoffs: the trick interferes with AutoNUMA. I didn't put much
> thought into how we can get it work together. Need to look into it.
>
> Do you see other tradeoffs?
>
> --
> Kirill A. Shutemov