All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Sean Christopherson <seanjc@google.com>,
	Jim Mattson <jmattson@google.com>,
	David Rientjes <rientjes@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Oscar Salvador <osalvador@suse.de>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages
Date: Fri, 9 Apr 2021 15:50:42 +0200	[thread overview]
Message-ID: <5ef83789-ffa5-debd-9ea2-50d831262237@redhat.com> (raw)
In-Reply-To: <20210409133347.r2uf3u5g55pp27xn@box>

>> It looks quite hacky (well, what did I expect from an RFC :) ) you can no
>> longer distinguish actually poisoned pages from "temporarily poisoned"
>> pages. FOLL_ALLOW_POISONED sounds especially nasty and dangerous -  "I want
>> to read/write a poisoned page, trust me, I know what I am doing".
>>
>> Storing the state for each individual page initially sounded like the right
>> thing to do, but I wonder if we couldn't handle this on a per-VMA level. You
>> can just remember the handful of shared ranges internally like you do right
>> now AFAIU.
> 
> per-VMA would not fly for file-backed (e.g. tmpfs) memory. We may need to
> combine PG_hwpoison with VMA flag. Maybe per-inode tracking would also be
> required. Or per-memslot. I donno. Need more experiments.

Indeed.

> 
> Note, I use PG_hwpoison now, but if we find a show-stopper issue where we
> would see confusion with a real poison, we can switch to new flags and
> a new swap_type(). I have not seen a reason yet.

I think we'll want a dedicate mechanism to cleanly mark pages as 
"protected". Finding a page flag you can use will be the problematic 
part, but should not be impossible if we have a good reason to do so 
(even if it means making the feature mutually exclusive with other 
features).

> 
>>  From what I get, you want a way to
>>
>> 1. Unmap pages from the user space page tables.
> 
> Plain unmap would not work for some use-cases. Some CSPs want to
> preallocate memory in a specific way. It's a way to provide a fine-grained
> NUMA policy.
> 
> The existing mapping has to be converted.
> 
>> 2. Disallow re-faulting of the protected pages into the page tables. On user
>> space access, you want to deliver some signal (e.g., SIGBUS).
> 
> Note that userspace mapping is the only source of pfn's for VM's shadow
> mapping. The fault should be allow, but lead to non-present PTE that still
> encodes pfn.

Makes sense, but I guess that's the part still to be implemented (see 
next comment).

> 
>> 3. Allow selected users to still grab the pages (esp. KVM to fault them into
>> the page tables).
> 
> As long as fault leads to non-present PTEs we are fine. Usespace still may
> want to mlock() some of guest memory. There's no reason to prevent this.

I'm curious, even get_user_pages() will lead to a present PTE as is, no? 
So that will need modifications I assume. (although I think it 
fundamentally differs to the way get_user_pages() works - trigger a 
fault first, then lookup the PTE in the page tables).

>> 4. Allow access to currently shared specific pages from user space.
>>
>> Right now, you achieve
>>
>> 1. Via try_to_unmap()
>> 2. TestSetPageHWPoison
>> 3. TBD (e.g., FOLL_ALLOW_POISONED)
>> 4. ClearPageHWPoison()
>>
>>
>> If we could bounce all writes to shared pages through the kernel, things
>> could end up a little easier. Some very rough idea:
>>
>> We could let user space setup VM memory as
>> mprotect(PROT_READ) (+ PROT_KERNEL_WRITE?), and after activating protected
>> memory (I assume via a KVM ioctl), make sure the VMAs cannot be set to
>> PROT_WRITE anymore. This would already properly unmap and deliver a SIGSEGV
>> when trying to write from user space.
>>
>> You could then still access the pages, e.g., via FOLL_FORCE or a new fancy
>> flag that allows to write with VM_MAYWRITE|VM_DENYUSERWRITE. This would
>> allow an ioctl to write page content and to map the pages into NPTs.
>>
>> As an extension, we could think about (re?)mapping some shared pages
>> read|write. The question is how to synchronize with user space.
>>
>> I have no idea how expensive would be bouncing writes (and reads?) through
>> the kernel. Did you ever experiment with that/evaluate that?
> 
> It's going to be double bounce buffer: on the guest we force swiotlb to
> make it go through shared region. I don't think it's a good idea.

So if it's already slow, do we really care? ;)

> 
> There are a number of way to share a memory. It's going to be decided by
> the way we get these pages unmapped in the first place.

I agree that shared memory can be somewhat problematic and would require 
tracking it per page.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-04-09 13:50 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02 15:26 [RFCv1 0/7] TDX and guest memory unmapping Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 1/7] x86/mm: Move force_dma_unencrypted() to common code Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 2/7] x86/kvm: Introduce KVM memory protection feature Kirill A. Shutemov
2021-04-08  9:52   ` Borislav Petkov
2021-04-09 13:36     ` Kirill A. Shutemov
2021-04-09 14:37       ` Borislav Petkov
2021-04-02 15:26 ` [RFCv1 3/7] x86/kvm: Make DMA pages shared Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 4/7] x86/kvm: Use bounce buffers for KVM memory protection Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 5/7] x86/kvmclock: Share hvclock memory with the host Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 6/7] x86/realmode: Share trampoline area if KVM memory protection enabled Kirill A. Shutemov
2021-04-02 15:26 ` [RFCv1 7/7] KVM: unmap guest memory using poisoned pages Kirill A. Shutemov
2021-04-06  7:44   ` David Hildenbrand
2021-04-06 10:50     ` Kirill A. Shutemov
2021-04-06 14:33     ` Dave Hansen
2021-04-06 14:57       ` David Hildenbrand
2021-04-07 13:16         ` Kirill A. Shutemov
2021-04-07 13:31           ` Christophe de Dinechin
2021-04-07 14:09             ` Andi Kleen
2021-04-07 14:09               ` Andi Kleen
2021-04-07 14:09           ` David Hildenbrand
2021-04-07 14:36             ` Kirill A. Shutemov
2021-04-06 17:52       ` Tom Lendacky
2021-04-07 14:55   ` David Hildenbrand
2021-04-07 15:10     ` Andi Kleen
2021-04-07 15:10       ` Andi Kleen
2021-04-09 13:33     ` Kirill A. Shutemov
2021-04-09 13:50       ` David Hildenbrand [this message]
2021-04-09 14:12         ` Kirill A. Shutemov
2021-04-09 14:18           ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ef83789-ffa5-debd-9ea2-50d831262237@redhat.com \
    --to=david@redhat.com \
    --cc=andi.kleen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jmattson@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=osalvador@suse.de \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rientjes@google.com \
    --cc=seanjc@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.