linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-next@vger.kernel.org, akpm@linux-foundation.org,
	jack@suse.cz, kirill@shutemov.name, borntraeger@de.ibm.com,
	david@redhat.com, aarcange@redhat.com, linux-mm@kvack.org,
	frankja@linux.ibm.com, sfr@canb.auug.org.au, jhubbard@nvidia.com,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	Will Deacon <will@kernel.org>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Tue, 14 Apr 2020 18:03:00 +0200	[thread overview]
Message-ID: <20200414180300.52640444@p-imbrenda> (raw)
In-Reply-To: <11dc928d-60b4-f04f-1ebf-f4cffb337a6c@intel.com>

On Mon, 13 Apr 2020 13:22:24 -0700
Dave Hansen <dave.hansen@intel.com> wrote:

> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:
> > On s390x the function is not supposed to fail, so it is ok to use a
> > WARN_ON on failure. If we ever need some more finegrained handling
> > we can tackle this when we know the details.  
> 
> Could you explain a bit why the function can't fail?

the concept of "making accessible" is only to make sure that accessing
the page will not trigger faults or I/O or DMA errors. in general it
does not mean freely accessing the content of the page in cleartext. 

on s390x, protected guest pages can be shared. the guest has to
actively share its pages, and in that case those pages are both part of
the protected VM and freely accessible by the host.

pages that are not shared cannot be accessed by the host.

in our case "making the page accessible" means:
 - if the page was shared, make sure it stays shared
 - if the page was not shared, first encrypt it and then make it
   accessible to the host (both operations performed securely and
   atomically by the hardware)

then the page can be swapped out, or used for direct I/O (obviously if
you do I/O on a page that was not shared, you cannot expect good
things to happen, since you basically corrupt the memory of the guest).

on s390x performing I/O directly on protected pages results in (in
practice) unrecoverable I/O errors, so we want to avoid it at all costs.

accessing protected pages from the CPU triggers an exception that can
be handled (and we do handle it, in fact)

now imagine a buggy or malicious qemu process crashing the whole machine
just because it did I/O to/from a protected page. we clearly don't want
that.

> If the guest has secret data in the page, then it *can* and does fail.

no, that's the whole point of this mechanism. in fact, most of the
guest pages will be "secret data", only the few pages used for guest I/O
bounce buffers will be shared with the host

> It won't fail, though, if the host and guest agree on whether the page
> is protected.
> 
> Right?
> 
> > @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page
> > *page, bool keep_write) inc_zone_page_state(page,
> > NR_ZONE_WRITE_PENDING); }
> >  	unlock_page_memcg(page);
> > +	access_ret = arch_make_page_accessible(page);
> > +	/*
> > +	 * If writeback has been triggered on a page that cannot
> > be made
> > +	 * accessible, it is too late to recover here.
> > +	 */
> > +	VM_BUG_ON_PAGE(access_ret != 0, page);
> > +
> >  	return ret;
> >  
> >  }  
> 
> This seems like a really odd place to do this.  Writeback is specific
> to block I/O.  I would have thought there were other kinds of devices
> that matter, not just block devices.

well, yes and no. for writeback (block I/O and swap) this is the right
place. at this point we know that the page is present and nobody else
has started doing I/O yet, and I/O will happen soon-ish. so we make the
page accessible. there is no turning back here, unlike pinning. we
are not allowed to fail, we can't 

regarding the other kinds of devices: yes, they will use pinning, which
is covered by the rest of the patch. the semantics of get page and pin
page (if the documentation has not changed meanwhile) is that the
traditional get_page is used for when the page is needed but not its
content, and pin_page is used when the content of the page is accessed.
since the only issue here is accessing the content of the page, we
don't need to make it accessible for get_page, but only for pin_page.

get_page and pin_page are allowed to fail, so in this case we return an
error code, so other architectures can potentially abort the pinning if
needed. on s390x we will never fail, for the same reasons written
above.

> Also, this patch seems odd that it only does the
> arch_make_page_accessible() half.  Where's the other half where the
> page is made inaccessible?

that is very arch-specific. for s390x, you can look at this patch and
the ones immediately before/after: 214d9bbcd3a67230b932f6ce

> I assume it's OK to "leak" things like this, it's just not clear to me
> _why_ it's OK.

nothing is being leaked :)



I hope I clarified a little how this works on s390x :)
feel free to poke me again if some things are still unclear


best regards,


Claudio Imbrenda



  reply	other threads:[~2020-04-14 16:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-06 13:25 [PATCH v4 0/2] add callbacks for inaccessible pages Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
2020-04-13 20:22   ` Dave Hansen
2020-04-14 16:03     ` Claudio Imbrenda [this message]
2020-04-14 18:50       ` Dave Hansen
2020-04-15  9:26         ` Claudio Imbrenda
2020-04-15 11:39           ` Janosch Frank
2020-04-15 21:52   ` Dave Hansen
2020-04-15 22:17     ` Peter Zijlstra
2020-04-15 23:34       ` Dave Hansen
2020-04-16 12:15         ` Claudio Imbrenda
2020-04-16 14:20           ` Dave Hansen
2020-04-16 14:59             ` Claudio Imbrenda
2020-04-16 15:36               ` Dave Hansen
2020-04-16 16:34                 ` Claudio Imbrenda
2020-04-16 19:02                   ` Dave Hansen
2020-04-21 21:31                     ` Dave Hansen
2020-04-28 19:43                       ` Dave Hansen
2020-04-28 20:02                         ` Christian Borntraeger
2020-04-28 23:39                         ` Claudio Imbrenda
2020-04-29  0:42                           ` Dave Hansen
2020-04-16 11:51     ` Claudio Imbrenda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200414180300.52640444@p-imbrenda \
    --to=imbrenda@linux.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=jack@suse.cz \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=sfr@canb.auug.org.au \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).