linux-mm.kvack.org archive mirror
From: Janosch Frank <frankja@linux.ibm.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Dave Hansen <dave.hansen@intel.com>
Cc: linux-next@vger.kernel.org, akpm@linux-foundation.org,
	jack@suse.cz, kirill@shutemov.name, borntraeger@de.ibm.com,
	david@redhat.com, aarcange@redhat.com, linux-mm@kvack.org,
	sfr@canb.auug.org.au, jhubbard@nvidia.com,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	Will Deacon <will@kernel.org>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Wed, 15 Apr 2020 13:39:36 +0200
Message-ID: <f1594ee2-0eea-9f8a-8e5b-8efd81af8c05@linux.ibm.com>
In-Reply-To: <20200415112639.525e25bc@p-imbrenda>



On 4/15/20 11:26 AM, Claudio Imbrenda wrote:
> On Tue, 14 Apr 2020 11:50:16 -0700
> Dave Hansen <dave.hansen@intel.com> wrote:
> 
>> On 4/14/20 9:03 AM, Claudio Imbrenda wrote:
>>> On Mon, 13 Apr 2020 13:22:24 -0700
>>> Dave Hansen <dave.hansen@intel.com> wrote:
>>>   
>>>> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:  
>>>>> On s390x the function is not supposed to fail, so it is ok to use
>>>>> a WARN_ON on failure. If we ever need some more fine-grained
>>>>> handling we can tackle this when we know the details.    
>>>>
>>>> Could you explain a bit why the function can't fail?  
>>>
>>> the concept of "making accessible" is only to make sure that
>>> accessing the page will not trigger faults or I/O or DMA errors. in
>>> general it does not mean freely accessing the content of the page
>>> in cleartext. 
>>>
>>> on s390x, protected guest pages can be shared. the guest has to
>>> actively share its pages, and in that case those pages are both
>>> part of the protected VM and freely accessible by the host.  
>>
>> Oh, that's interesting.
>>
>> It sounds like there are three separate concepts:
>> 1. Protection
>> 2. Sharing
>> 3. Accessibility
>>
>> Protected pages may be shared at the request of the guest.
>> Shared pages' plaintext can be accessed by the host.  For unshared
>> pages, the host can only see ciphertext.
>>
>> I wonder if Documentation/virt/kvm/s390-pv.rst can be beefed up with
>> some of this information.  It seems a bit sparse on this topic.
> 
> that is definitely something that can be fixed.
> 
> I will improve the documentation and make sure it properly explains
> all the details of how protected VMs work on s390x.

I'd also definitely appreciate more people looking over that document
and adding things I forgot ;-)
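
To make the three concepts from the quoted text above (protection,
sharing, accessibility) a bit more concrete, here is a toy user-space
model in C. It is only a sketch of what this thread states: a shared
page is readable by the host in plaintext, and an unshared protected
page is encrypted before it is made accessible, so the host sees only
ciphertext. All names (toy_page, toy_host_view, toy_make_accessible)
are hypothetical and are not kernel or s390 interfaces; the real
work is done by the hardware, and the steps elided in the thread are
not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not kernel or s390 API. */
enum toy_view { TOY_PLAINTEXT, TOY_CIPHERTEXT };

struct toy_page {
	bool is_protected; /* page belongs to a protected guest */
	bool shared;       /* guest has actively shared it with the host */
	bool accessible;   /* host access will not fault or cause I/O errors */
	bool encrypted;    /* content was encrypted before being exposed */
};

/* What the host can see: plaintext only for shared or unprotected pages. */
static enum toy_view toy_host_view(const struct toy_page *p)
{
	assert(p);
	if (p->is_protected && !p->shared)
		return TOY_CIPHERTEXT;
	return TOY_PLAINTEXT;
}

/*
 * Model of "making accessible": an unshared protected page is encrypted
 * first and then exposed. In this model the operation never fails,
 * mirroring the "not supposed to fail" statement for s390x.
 */
static int toy_make_accessible(struct toy_page *p)
{
	assert(p);
	if (p->is_protected && !p->shared)
		p->encrypted = true;
	p->accessible = true;
	return 0;
}
```

In this model "accessible" never implies "readable in cleartext": an
unshared protected page ends up accessible and encrypted, which is
enough to write it out verbatim (for example to swap).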

> 
>> As it stands, if I were modifying generic code, I don't think I'd have
>> even a chance of getting an arch_make_page_accessible() in the right
>> spot.
>>
>>> in our case "making the page accessible" means:  
>> ...
>>>  - if the page was not shared, first encrypt it and then make it
>>>    accessible to the host (both operations performed securely and
>>>    atomically by the hardware)  
>>
>> What happens to the guest's view of the page when this happens?  Does
>> it keep seeing plaintext?
>>
>>> then the page can be swapped out, or used for direct I/O (obviously
>>> if you do I/O on a page that was not shared, you cannot expect good
>>> things to happen, since you basically corrupt the memory of the
>>> guest).  
>>
>> So why even allow access to the encrypted contents if the host can't
>> do anything useful with it?  Is there some reason for going to the
>> trouble of encrypting it and exposing it to the host?
> 
> you should not overwrite it, but you can/should write it out verbatim,
> e.g. for swap
> 
>>> on s390x performing I/O directly on protected pages results in (in
>>> practice) unrecoverable I/O errors, so we want to avoid it at all
>>> costs.  
>>
>> This is understandable, but we usually steer I/O operations in places
>> like the DMA API, not in the core VM.
>>
>> We *have* the concept of pages to which I/O can't be done.  There are
>> plenty of crippled devices where we have to bounce data into a low
>> buffer before it can go where we really want it to.  I think the AMD
>> SEV patches do this, for instance.
>>
>>> accessing protected pages from the CPU triggers an exception that
>>> can be handled (and we do handle it, in fact)
>>>
>>> now imagine a buggy or malicious qemu process crashing the whole
>>> machine just because it did I/O to/from a protected page. we
>>> clearly don't want that.  
>>
>> Is DMA disallowed to *all* protected pages?  Even pages which the
>> guest has explicitly shared with the host?
>>
>>
>>>>> @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>>>>>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>>>>>  	}
>>>>>  	unlock_page_memcg(page);
>>>>> +	access_ret = arch_make_page_accessible(page);
>>>>> +	/*
>>>>> +	 * If writeback has been triggered on a page that cannot be made
>>>>> +	 * accessible, it is too late to recover here.
>>>>> +	 */
>>>>> +	VM_BUG_ON_PAGE(access_ret != 0, page);
>>>>> +
>>>>>  	return ret;
>>>>>  
>>>>>  }    
>>>>
>>>> This seems like a really odd place to do this.  Writeback is
>>>> specific to block I/O.  I would have thought there were other
>>>> kinds of devices that matter, not just block devices.  
>>>
>>> well, yes and no. for writeback (block I/O and swap) this is the
>>> right place. at this point we know that the page is present and
>>> nobody else has started doing I/O yet, and I/O will happen
>>> soon-ish. so we make the page accessible. there is no turning back
>>> here, unlike pinning. we are not allowed to fail, we can't   
>>
>> This description sounds really incomplete to me.
>>
>> Not all swap involves device I/O.  For instance, zswap doesn't involve
>> any devices.  Would zswap need this hook?
> 
> please feel free to write to me privately if you have any further
> questions or doubts :)
> 
> 
> best regards,
> 
> Claudio Imbrenda
> 
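
The difference between the two call sites discussed in the quoted text
(pinning, which can still back out, and writeback, which cannot) can be
sketched in user space as follows. This is a simplified illustration,
not the patch itself: demo_page, demo_arch_make_page_accessible,
demo_pin_page and demo_start_writeback are hypothetical stand-ins, and
the counter replaces the VM_BUG_ON_PAGE() from the quoted hunk so that
the example can run without aborting. That the pinning path returns
the error to its caller is an assumption drawn from the "unlike
pinning" remark.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct demo_page {
	bool can_be_made_accessible; /* knob to simulate a failing hook */
	bool accessible;
	bool under_writeback;
};

/* Stand-in for the arch hook; an arch without protected pages returns 0. */
static int demo_arch_make_page_accessible(struct demo_page *page)
{
	assert(page);
	if (!page->can_be_made_accessible)
		return -EFAULT;
	page->accessible = true;
	return 0;
}

/* Pinning-like path: a failure is passed back, so the caller can back out. */
static int demo_pin_page(struct demo_page *page)
{
	return demo_arch_make_page_accessible(page);
}

/* Counts cases where the real code would have hit VM_BUG_ON_PAGE(). */
static int demo_unrecoverable;

/* Writeback-like path: the page is already committed to I/O at this point. */
static void demo_start_writeback(struct demo_page *page)
{
	assert(page);
	page->under_writeback = true;
	if (demo_arch_make_page_accessible(page) != 0)
		demo_unrecoverable++; /* too late to recover here */
}
```

The point of the sketch is only the asymmetry: one caller has a return
value to propagate, the other has nothing left to do but complain.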




Thread overview: 23+ messages
2020-03-06 13:25 [PATCH v4 0/2] add callbacks for inaccessible pages Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
2020-04-13 20:22   ` Dave Hansen
2020-04-14 16:03     ` Claudio Imbrenda
2020-04-14 18:50       ` Dave Hansen
2020-04-15  9:26         ` Claudio Imbrenda
2020-04-15 11:39           ` Janosch Frank [this message]
2020-04-15 21:52   ` Dave Hansen
2020-04-15 22:17     ` Peter Zijlstra
2020-04-15 23:34       ` Dave Hansen
2020-04-16 12:15         ` Claudio Imbrenda
2020-04-16 14:20           ` Dave Hansen
2020-04-16 14:59             ` Claudio Imbrenda
2020-04-16 15:36               ` Dave Hansen
2020-04-16 16:34                 ` Claudio Imbrenda
2020-04-16 19:02                   ` Dave Hansen
2020-04-21 21:31                     ` Dave Hansen
2020-04-28 19:43                       ` Dave Hansen
2020-04-28 20:02                         ` Christian Borntraeger
2020-04-28 23:39                         ` Claudio Imbrenda
2020-04-29  0:42                           ` Dave Hansen
2020-04-16 11:51     ` Claudio Imbrenda
