KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: David Hildenbrand <david@redhat.com>,
	Janosch Frank <frankja@linux.vnet.ibm.com>
Cc: KVM <kvm@vger.kernel.org>, Cornelia Huck <cohuck@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Michael Mueller <mimu@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
Date: Mon, 17 Feb 2020 12:28:58 +0100
Message-ID: <87742e00-9a8d-b7b4-ab96-05e8c9d39534@de.ibm.com> (raw)
In-Reply-To: <f5523486-ee76-e6c1-9563-658bca7f3b0d@redhat.com>



On 17.02.20 11:21, David Hildenbrand wrote:
>> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
>> index 85e944f04c70..4ebcf891ff3c 100644
>> --- a/arch/s390/include/asm/page.h
>> +++ b/arch/s390/include/asm/page.h
>> @@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
>>  #define HAVE_ARCH_FREE_PAGE
>>  #define HAVE_ARCH_ALLOC_PAGE
>>  
>> +#if IS_ENABLED(CONFIG_PGSTE)
>> +int arch_make_page_accessible(struct page *page);
>> +#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
>> +#endif
>> +
> 
> Feels like this should have been one of the (CONFIG_)ARCH_HAVE_XXX
> thingies defined via kconfig instead.
> 
> E.g., like (CONFIG_)HAVE_ARCH_TRANSPARENT_HUGEPAGE
> 
> [...]

This looks more or less like HAVE_ARCH_ALLOC_PAGE. You will find both
variants.
I think I will leave it that way for now until we need it to be a config
or the mm maintainers have a preference.


> 
>> +
>> +/*
>> + * Requests the Ultravisor to encrypt a guest page and make it
>> + * accessible to the host for paging (export).
>> + *
>> + * @paddr: Absolute host address of page to be exported
>> + */
>> +int uv_convert_from_secure(unsigned long paddr)
>> +{
>> +	struct uv_cb_cfs uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.paddr = paddr
>> +	};
>> +
>> +	if (uv_call(0, (u64)&uvcb))
>> +		return -EINVAL;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Calculate the expected ref_count for a page that would otherwise have no
>> + * further pins. This was cribbed from similar functions in other places in
>> + * the kernel, but with some slight modifications. We know that a secure
>> + * page can not be a huge page for example.
> 
> s/ca not cannot/

ack.


> 
>> + */
>> +static int expected_page_refs(struct page *page)
>> +{
>> +	int res;
>> +
>> +	res = page_mapcount(page);
>> +	if (PageSwapCache(page)) {
>> +		res++;
>> +	} else if (page_mapping(page)) {
>> +		res++;
>> +		if (page_has_private(page))
>> +			res++;
>> +	}
>> +	return res;
>> +}
>> +
>> +static int make_secure_pte(pte_t *ptep, unsigned long addr,
>> +			   struct page *exp_page, struct uv_cb_header *uvcb)
>> +{
>> +	pte_t entry = READ_ONCE(*ptep);
>> +	struct page *page;
>> +	int expected, rc = 0;
>> +
>> +	if (!pte_present(entry))
>> +		return -ENXIO;
>> +	if (pte_val(entry) & _PAGE_INVALID)
>> +		return -ENXIO;
>> +
>> +	page = pte_page(entry);
>> +	if (page != exp_page)
>> +		return -ENXIO;
>> +	if (PageWriteback(page))
>> +		return -EAGAIN;
>> +	expected = expected_page_refs(page);
>> +	if (!page_ref_freeze(page, expected))
>> +		return -EBUSY;
>> +	set_bit(PG_arch_1, &page->flags);
>> +	rc = uv_call(0, (u64)uvcb);
>> +	page_ref_unfreeze(page, expected);
>> +	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
>> +	if (rc)
>> +		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
>> +	return rc;
>> +}
>> +
>> +/*
>> + * Requests the Ultravisor to make a page accessible to a guest.
>> + * If it's brought in the first time, it will be cleared. If
>> + * it has been exported before, it will be decrypted and integrity
>> + * checked.
>> + */
>> +int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
>> +{
>> +	struct vm_area_struct *vma;
>> +	unsigned long uaddr;
>> +	struct page *page;
>> +	int rc, local_drain = 0;
> 
> local_drain could have been a bool.

ack
> 
>> +	spinlock_t *ptelock;
>> +	pte_t *ptep;
>> +
>> +again:
>> +	rc = -EFAULT;
>> +	down_read(&gmap->mm->mmap_sem);
>> +
>> +	uaddr = __gmap_translate(gmap, gaddr);
>> +	if (IS_ERR_VALUE(uaddr))
>> +		goto out;
>> +	vma = find_vma(gmap->mm, uaddr);
>> +	if (!vma)
>> +		goto out;
>> +	/*
>> +	 * Secure pages cannot be huge and userspace should not combine both.
>> +	 * In case userspace does it anyway this will result in an -EFAULT for
>> +	 * the unpack. The guest is thus never reaching secure mode. If
>> +	 * userspace is playing dirty tricky with mapping huge pages later
>> +	 * on this will result in a segmenation fault.
> 
> s/segmenation/segmentation/

ack.
> 
>> +	 */
>> +	if (is_vm_hugetlb_page(vma))
>> +		goto out;
>> +
>> +	rc = -ENXIO;
>> +	page = follow_page(vma, uaddr, FOLL_WRITE);
>> +	if (IS_ERR_OR_NULL(page))
>> +		goto out;
>> +
>> +	lock_page(page);
>> +	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
>> +	rc = make_secure_pte(ptep, uaddr, page, uvcb);
>> +	pte_unmap_unlock(ptep, ptelock);
>> +	unlock_page(page);
>> +out:
>> +	up_read(&gmap->mm->mmap_sem);
>> +
>> +	if (rc == -EAGAIN) {
>> +		wait_on_page_writeback(page);
>> +	} else if (rc == -EBUSY) {
>> +		/*
>> +		 * If we have tried a local drain and the page refcount
>> +		 * still does not match our expected safe value, try with a
>> +		 * system wide drain. This is needed if the pagevecs holding
>> +		 * the page are on a different CPU.
>> +		 */
>> +		if (local_drain) {
>> +			lru_add_drain_all();
> 
> I do wonder if that is valid to be called with all the locks at this point.

This function uses per cpu workers and needs no other locks. Also verified 
with lockdep. 

> 
>> +			/* We give up here, and let the caller try again */
>> +			return -EAGAIN;
>> +		}
>> +		/*
>> +		 * We are here if the page refcount does not match the
>> +		 * expected safe value. The main culprits are usually
>> +		 * pagevecs. With lru_add_drain() we drain the pagevecs
>> +		 * on the local CPU so that hopefully the refcount will
>> +		 * reach the expected safe value.
>> +		 */
>> +		lru_add_drain();
> 
> dito ...

dito. 

> 
>> +		local_drain = 1;
>> +		/* And now we try again immediately after draining */
>> +		goto again;
>> +	} else if (rc == -ENXIO) {
>> +		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
>> +			return -EFAULT;
>> +		return -EAGAIN;
>> +	}
>> +	return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(gmap_make_secure);
>> +
>> +int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
>> +{
>> +	struct uv_cb_cts uvcb = {
>> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
>> +		.header.len = sizeof(uvcb),
>> +		.guest_handle = gmap->guest_handle,
>> +		.gaddr = gaddr,
>> +	};
>> +
>> +	return gmap_make_secure(gmap, gaddr, &uvcb);
>> +}
>> +EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
>> +
>> +/**
>> + * To be called with the page locked or with an extra reference!
> 
> Can we have races here? (IOW, two callers concurrently for the same page)

That would be fine and is part of the design. The ultravisor calls will
either make the page accessible or will be a (mostly) no-op.
In fact, we allow for slight over-indication of "needs to be exported"

What about:

/*
 * To be called with the page locked or with an extra reference! This will
 * prevent gmap_make_secure from touching the page concurrently. Having 2
 * parallel make_page_accessible is fine, as the UV calls will become a 
 * no-op if the page is already exported.
 */


> 
>> + */
>> +int arch_make_page_accessible(struct page *page)
>> +{
>> +	int rc = 0;
>> +
>> +	/* Hugepage cannot be protected, so nothing to do */
>> +	if (PageHuge(page))
>> +		return 0;
>> +
>> +	/*
>> +	 * PG_arch_1 is used in 3 places:
>> +	 * 1. for kernel page tables during early boot
>> +	 * 2. for storage keys of huge pages and KVM
>> +	 * 3. As an indication that this page might be secure. This can
>> +	 *    overindicate, e.g. we set the bit before calling
>> +	 *    convert_to_secure.
>> +	 * As secure pages are never huge, all 3 variants can co-exists.
>> +	 */
>> +	if (!test_bit(PG_arch_1, &page->flags))
>> +		return 0;
>> +
>> +	rc = uv_pin_shared(page_to_phys(page));
>> +	if (!rc) {
>> +		clear_bit(PG_arch_1, &page->flags);
>> +		return 0;
>> +	}
> 
> Overall, looks sane to me. (I am mostly concerned about possible races,
> e.g., when two gmaps would be created for a single VM and nasty stuff be
> done with them). But yeah, I guess you guys thought about this ;)



  reply index

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-17  9:14   ` David Hildenbrand
2020-02-17 11:10     ` Christian Borntraeger
2020-02-18  8:27       ` David Hildenbrand
2020-02-18 15:46         ` Sean Christopherson
2020-02-18 16:02           ` Will Deacon
2020-02-18 16:15             ` Christian Borntraeger
2020-02-18 21:35               ` Sean Christopherson
2020-02-19  8:31                 ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 02/42] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-17  9:43   ` David Hildenbrand
2020-02-20 12:18     ` David Hildenbrand
2020-02-20 13:31     ` Christian Borntraeger
2020-02-20 13:34       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 03/42] s390/protvirt: introduce host side setup Christian Borntraeger
2020-02-17  9:53   ` David Hildenbrand
2020-02-17 11:11     ` Christian Borntraeger
2020-02-17 11:13       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 04/42] s390/protvirt: add ultravisor initialization Christian Borntraeger
2020-02-17  9:57   ` David Hildenbrand
2020-02-17 11:13     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-17 10:21   ` David Hildenbrand
2020-02-17 11:28     ` Christian Borntraeger [this message]
2020-02-17 12:07       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 07/42] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
2020-02-17 10:41   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 08/42] KVM: s390: add new variants of UV CALL Christian Borntraeger
2020-02-17 10:42   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 09/42] KVM: s390: protvirt: Add initial vm and cpu lifecycle handling Christian Borntraeger
2020-02-17 10:56   ` David Hildenbrand
2020-02-17 12:04     ` Christian Borntraeger
2020-02-17 12:09       ` David Hildenbrand
2020-02-17 14:53         ` [PATCH 0/2] example changes Christian Borntraeger
2020-02-17 14:53           ` [PATCH 1/2] lock changes Christian Borntraeger
2020-02-17 14:53           ` [PATCH 2/2] merge vm/cpu create Christian Borntraeger
2020-02-17 15:00             ` Janosch Frank
2020-02-17 15:02               ` Christian Borntraeger
2020-02-19 11:02               ` Christian Borntraeger
2020-02-17 19:18           ` [PATCH 0/2] example changes David Hildenbrand
2020-02-18  8:09     ` [PATCH v2 09/42] KVM: s390: protvirt: Add initial vm and cpu lifecycle handling Christian Borntraeger
2020-02-18  8:39   ` [PATCH v2.1] " Christian Borntraeger
2020-02-18  9:12     ` David Hildenbrand
2020-02-18 21:18       ` Christian Borntraeger
2020-02-19  8:32         ` David Hildenbrand
2020-02-19 11:01       ` Christian Borntraeger
2020-02-18  9:56     ` David Hildenbrand
2020-02-18 20:26       ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 10/42] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 11/42] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 12/42] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 13/42] KVM: s390: protvirt: Handle SE notification interceptions Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 14/42] KVM: s390: protvirt: Instruction emulation Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 15/42] KVM: s390: protvirt: Add interruption injection controls Christian Borntraeger
2020-02-17 10:59   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 16/42] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 17/42] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 18/42] KVM: s390: protvirt: Handle spec exception loops Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 19/42] KVM: s390: protvirt: Add new gprs location handling Christian Borntraeger
2020-02-17 11:01   ` David Hildenbrand
2020-02-17 11:33     ` Christian Borntraeger
2020-02-17 14:37     ` Janosch Frank
2020-02-14 22:26 ` [PATCH v2 20/42] KVM: S390: protvirt: Introduce instruction data area bounce buffer Christian Borntraeger
2020-02-17 11:08   ` David Hildenbrand
2020-02-17 14:47     ` Janosch Frank
2020-02-17 15:00       ` Christian Borntraeger
2020-02-17 15:38         ` Janosch Frank
2020-02-17 16:58           ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 21/42] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
2020-02-17 11:11   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 22/42] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-17 14:23   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 23/42] KVM: s390: protvirt: Write sthyi data to instruction data area Christian Borntraeger
2020-02-17 14:24   ` David Hildenbrand
2020-02-17 18:40     ` Christian Borntraeger
2020-02-17 19:16       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 24/42] KVM: s390: protvirt: STSI handling Christian Borntraeger
2020-02-18  8:35   ` David Hildenbrand
2020-02-18  8:44     ` Christian Borntraeger
2020-02-18  9:08       ` David Hildenbrand
2020-02-18  9:11         ` Christian Borntraeger
2020-02-18  9:13           ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 25/42] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
2020-02-18  8:40   ` David Hildenbrand
2020-02-18  8:57     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 26/42] KVM: s390: protvirt: Do only reset registers that are accessible Christian Borntraeger
2020-02-18  8:42   ` David Hildenbrand
2020-02-18  9:20     ` Christian Borntraeger
2020-02-18  9:28       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 27/42] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 28/42] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
2020-02-18  9:33   ` David Hildenbrand
2020-02-18  9:37     ` Christian Borntraeger
2020-02-18  9:39       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 29/42] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Christian Borntraeger
2020-02-18  9:38   ` David Hildenbrand
2020-02-19 12:45     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 30/42] KVM: s390: protvirt: UV calls in support of diag308 0, 1 Christian Borntraeger
2020-02-18  9:44   ` David Hildenbrand
2020-02-19 11:53     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 31/42] KVM: s390: protvirt: Report CPU state to Ultravisor Christian Borntraeger
2020-02-18  9:48   ` David Hildenbrand
2020-02-19 19:36     ` Christian Borntraeger
2020-02-19 19:46       ` Christian Borntraeger
2020-02-20 10:52         ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 32/42] KVM: s390: protvirt: Support cmd 5 operation state Christian Borntraeger
2020-02-18  9:50   ` David Hildenbrand
2020-02-19 11:06     ` Christian Borntraeger
2020-02-19 11:08       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 33/42] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
2020-02-18  9:53   ` David Hildenbrand
2020-02-18 10:02     ` David Hildenbrand
2020-02-18 10:05     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 34/42] KVM: s390: protvirt: do not inject interrupts after start Christian Borntraeger
2020-02-18  9:53   ` David Hildenbrand
2020-02-18 10:02     ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 35/42] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
2020-02-18  9:54   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 36/42] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 37/42] s390/uv: Fix handling of length extensions (already in s390 tree) Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 38/42] s390: protvirt: Add sysfs firmware interface for Ultravisor information Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
2020-02-18 16:25   ` Will Deacon
2020-02-18 16:30     ` Christian Borntraeger
2020-02-18 16:33       ` Will Deacon
2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
2020-02-17 14:15   ` Ulrich Weigand
2020-02-17 14:38     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 41/42] potential fixup for "s390/mm: provide memory management functions for protected KVM guests" Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 42/42] KVM: s390: rstify new ioctls in api.rst Christian Borntraeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87742e00-9a8d-b7b4-ab96-05e8c9d39534@de.ibm.com \
    --to=borntraeger@de.ibm.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.vnet.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mimu@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git