KVM Archive on lore.kernel.org
 help / color / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.vnet.ibm.com>
Cc: KVM <kvm@vger.kernel.org>, Cornelia Huck <cohuck@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Michael Mueller <mimu@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests
Date: Mon, 17 Feb 2020 11:21:44 +0100
Message-ID: <f5523486-ee76-e6c1-9563-658bca7f3b0d@redhat.com> (raw)
In-Reply-To: <20200214222658.12946-6-borntraeger@de.ibm.com>

> diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
> index 85e944f04c70..4ebcf891ff3c 100644
> --- a/arch/s390/include/asm/page.h
> +++ b/arch/s390/include/asm/page.h
> @@ -153,6 +153,11 @@ static inline int devmem_is_allowed(unsigned long pfn)
>  #define HAVE_ARCH_FREE_PAGE
>  #define HAVE_ARCH_ALLOC_PAGE
>  
> +#if IS_ENABLED(CONFIG_PGSTE)
> +int arch_make_page_accessible(struct page *page);
> +#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +#endif
> +

Feels like this should have been one of the (CONFIG_)ARCH_HAVE_XXX
thingies defined via kconfig instead.

E.g., like (CONFIG_)HAVE_ARCH_TRANSPARENT_HUGEPAGE

[...]

> +
> +/*
> + * Requests the Ultravisor to encrypt a guest page and make it
> + * accessible to the host for paging (export).
> + *
> + * @paddr: Absolute host address of page to be exported
> + */
> +int uv_convert_from_secure(unsigned long paddr)
> +{
> +	struct uv_cb_cfs uvcb = {
> +		.header.cmd = UVC_CMD_CONV_FROM_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.paddr = paddr
> +	};
> +
> +	if (uv_call(0, (u64)&uvcb))
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +/*
> + * Calculate the expected ref_count for a page that would otherwise have no
> + * further pins. This was cribbed from similar functions in other places in
> + * the kernel, but with some slight modifications. We know that a secure
> + * page can not be a huge page for example.

s/ca not cannot/

> + */
> +static int expected_page_refs(struct page *page)
> +{
> +	int res;
> +
> +	res = page_mapcount(page);
> +	if (PageSwapCache(page)) {
> +		res++;
> +	} else if (page_mapping(page)) {
> +		res++;
> +		if (page_has_private(page))
> +			res++;
> +	}
> +	return res;
> +}
> +
> +static int make_secure_pte(pte_t *ptep, unsigned long addr,
> +			   struct page *exp_page, struct uv_cb_header *uvcb)
> +{
> +	pte_t entry = READ_ONCE(*ptep);
> +	struct page *page;
> +	int expected, rc = 0;
> +
> +	if (!pte_present(entry))
> +		return -ENXIO;
> +	if (pte_val(entry) & _PAGE_INVALID)
> +		return -ENXIO;
> +
> +	page = pte_page(entry);
> +	if (page != exp_page)
> +		return -ENXIO;
> +	if (PageWriteback(page))
> +		return -EAGAIN;
> +	expected = expected_page_refs(page);
> +	if (!page_ref_freeze(page, expected))
> +		return -EBUSY;
> +	set_bit(PG_arch_1, &page->flags);
> +	rc = uv_call(0, (u64)uvcb);
> +	page_ref_unfreeze(page, expected);
> +	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
> +	if (rc)
> +		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
> +	return rc;
> +}
> +
> +/*
> + * Requests the Ultravisor to make a page accessible to a guest.
> + * If it's brought in the first time, it will be cleared. If
> + * it has been exported before, it will be decrypted and integrity
> + * checked.
> + */
> +int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned long uaddr;
> +	struct page *page;
> +	int rc, local_drain = 0;

local_drain could have been a bool.

> +	spinlock_t *ptelock;
> +	pte_t *ptep;
> +
> +again:
> +	rc = -EFAULT;
> +	down_read(&gmap->mm->mmap_sem);
> +
> +	uaddr = __gmap_translate(gmap, gaddr);
> +	if (IS_ERR_VALUE(uaddr))
> +		goto out;
> +	vma = find_vma(gmap->mm, uaddr);
> +	if (!vma)
> +		goto out;
> +	/*
> +	 * Secure pages cannot be huge and userspace should not combine both.
> +	 * In case userspace does it anyway this will result in an -EFAULT for
> +	 * the unpack. The guest is thus never reaching secure mode. If
> +	 * userspace is playing dirty tricky with mapping huge pages later
> +	 * on this will result in a segmenation fault.

s/segmenation/segmentation/

> +	 */
> +	if (is_vm_hugetlb_page(vma))
> +		goto out;
> +
> +	rc = -ENXIO;
> +	page = follow_page(vma, uaddr, FOLL_WRITE);
> +	if (IS_ERR_OR_NULL(page))
> +		goto out;
> +
> +	lock_page(page);
> +	ptep = get_locked_pte(gmap->mm, uaddr, &ptelock);
> +	rc = make_secure_pte(ptep, uaddr, page, uvcb);
> +	pte_unmap_unlock(ptep, ptelock);
> +	unlock_page(page);
> +out:
> +	up_read(&gmap->mm->mmap_sem);
> +
> +	if (rc == -EAGAIN) {
> +		wait_on_page_writeback(page);
> +	} else if (rc == -EBUSY) {
> +		/*
> +		 * If we have tried a local drain and the page refcount
> +		 * still does not match our expected safe value, try with a
> +		 * system wide drain. This is needed if the pagevecs holding
> +		 * the page are on a different CPU.
> +		 */
> +		if (local_drain) {
> +			lru_add_drain_all();

I do wonder if that is valid to be called with all the locks at this point.

> +			/* We give up here, and let the caller try again */
> +			return -EAGAIN;
> +		}
> +		/*
> +		 * We are here if the page refcount does not match the
> +		 * expected safe value. The main culprits are usually
> +		 * pagevecs. With lru_add_drain() we drain the pagevecs
> +		 * on the local CPU so that hopefully the refcount will
> +		 * reach the expected safe value.
> +		 */
> +		lru_add_drain();

dito ...

> +		local_drain = 1;
> +		/* And now we try again immediately after draining */
> +		goto again;
> +	} else if (rc == -ENXIO) {
> +		if (gmap_fault(gmap, gaddr, FAULT_FLAG_WRITE))
> +			return -EFAULT;
> +		return -EAGAIN;
> +	}
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(gmap_make_secure);
> +
> +int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr)
> +{
> +	struct uv_cb_cts uvcb = {
> +		.header.cmd = UVC_CMD_CONV_TO_SEC_STOR,
> +		.header.len = sizeof(uvcb),
> +		.guest_handle = gmap->guest_handle,
> +		.gaddr = gaddr,
> +	};
> +
> +	return gmap_make_secure(gmap, gaddr, &uvcb);
> +}
> +EXPORT_SYMBOL_GPL(gmap_convert_to_secure);
> +
> +/**
> + * To be called with the page locked or with an extra reference!

Can we have races here? (IOW, two callers concurrently for the same page)

> + */
> +int arch_make_page_accessible(struct page *page)
> +{
> +	int rc = 0;
> +
> +	/* Hugepage cannot be protected, so nothing to do */
> +	if (PageHuge(page))
> +		return 0;
> +
> +	/*
> +	 * PG_arch_1 is used in 3 places:
> +	 * 1. for kernel page tables during early boot
> +	 * 2. for storage keys of huge pages and KVM
> +	 * 3. As an indication that this page might be secure. This can
> +	 *    overindicate, e.g. we set the bit before calling
> +	 *    convert_to_secure.
> +	 * As secure pages are never huge, all 3 variants can co-exists.
> +	 */
> +	if (!test_bit(PG_arch_1, &page->flags))
> +		return 0;
> +
> +	rc = uv_pin_shared(page_to_phys(page));
> +	if (!rc) {
> +		clear_bit(PG_arch_1, &page->flags);
> +		return 0;
> +	}

Overall, looks sane to me. (I am mostly concerned about possible races,
e.g., when two gmaps would be created for a single VM and nasty stuff be
done with them). But yeah, I guess you guys thought about this ;)

-- 
Thanks,

David / dhildenb


  reply index

Thread overview: 132+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 22:26 [PATCH v2 00/42] KVM: s390: Add support for protected VMs Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 01/42] mm:gup/writeback: add callbacks for inaccessible pages Christian Borntraeger
2020-02-17  9:14   ` David Hildenbrand
2020-02-17 11:10     ` Christian Borntraeger
2020-02-18  8:27       ` David Hildenbrand
2020-02-18 15:46         ` Sean Christopherson
2020-02-18 16:02           ` Will Deacon
2020-02-18 16:15             ` Christian Borntraeger
2020-02-18 21:35               ` Sean Christopherson
2020-02-19  8:31                 ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 02/42] KVM: s390/interrupt: do not pin adapter interrupt pages Christian Borntraeger
2020-02-17  9:43   ` David Hildenbrand
2020-02-20 12:18     ` David Hildenbrand
2020-02-20 13:31     ` Christian Borntraeger
2020-02-20 13:34       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 03/42] s390/protvirt: introduce host side setup Christian Borntraeger
2020-02-17  9:53   ` David Hildenbrand
2020-02-17 11:11     ` Christian Borntraeger
2020-02-17 11:13       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 04/42] s390/protvirt: add ultravisor initialization Christian Borntraeger
2020-02-17  9:57   ` David Hildenbrand
2020-02-17 11:13     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 05/42] s390/mm: provide memory management functions for protected KVM guests Christian Borntraeger
2020-02-17 10:21   ` David Hildenbrand [this message]
2020-02-17 11:28     ` Christian Borntraeger
2020-02-17 12:07       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 06/42] s390/mm: add (non)secure page access exceptions handlers Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 07/42] KVM: s390: protvirt: Add UV debug trace Christian Borntraeger
2020-02-17 10:41   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 08/42] KVM: s390: add new variants of UV CALL Christian Borntraeger
2020-02-17 10:42   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 09/42] KVM: s390: protvirt: Add initial vm and cpu lifecycle handling Christian Borntraeger
2020-02-17 10:56   ` David Hildenbrand
2020-02-17 12:04     ` Christian Borntraeger
2020-02-17 12:09       ` David Hildenbrand
2020-02-17 14:53         ` [PATCH 0/2] example changes Christian Borntraeger
2020-02-17 14:53           ` [PATCH 1/2] lock changes Christian Borntraeger
2020-02-17 14:53           ` [PATCH 2/2] merge vm/cpu create Christian Borntraeger
2020-02-17 15:00             ` Janosch Frank
2020-02-17 15:02               ` Christian Borntraeger
2020-02-19 11:02               ` Christian Borntraeger
2020-02-17 19:18           ` [PATCH 0/2] example changes David Hildenbrand
2020-02-18  8:09     ` [PATCH v2 09/42] KVM: s390: protvirt: Add initial vm and cpu lifecycle handling Christian Borntraeger
2020-02-18  8:39   ` [PATCH v2.1] " Christian Borntraeger
2020-02-18  9:12     ` David Hildenbrand
2020-02-18 21:18       ` Christian Borntraeger
2020-02-19  8:32         ` David Hildenbrand
2020-02-19 11:01       ` Christian Borntraeger
2020-02-18  9:56     ` David Hildenbrand
2020-02-18 20:26       ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 10/42] KVM: s390: protvirt: Add KVM api documentation Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 11/42] KVM: s390: protvirt: Secure memory is not mergeable Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 12/42] KVM: s390/mm: Make pages accessible before destroying the guest Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 13/42] KVM: s390: protvirt: Handle SE notification interceptions Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 14/42] KVM: s390: protvirt: Instruction emulation Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 15/42] KVM: s390: protvirt: Add interruption injection controls Christian Borntraeger
2020-02-17 10:59   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 16/42] KVM: s390: protvirt: Implement interruption injection Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 17/42] KVM: s390: protvirt: Add SCLP interrupt handling Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 18/42] KVM: s390: protvirt: Handle spec exception loops Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 19/42] KVM: s390: protvirt: Add new gprs location handling Christian Borntraeger
2020-02-17 11:01   ` David Hildenbrand
2020-02-17 11:33     ` Christian Borntraeger
2020-02-17 14:37     ` Janosch Frank
2020-02-14 22:26 ` [PATCH v2 20/42] KVM: S390: protvirt: Introduce instruction data area bounce buffer Christian Borntraeger
2020-02-17 11:08   ` David Hildenbrand
2020-02-17 14:47     ` Janosch Frank
2020-02-17 15:00       ` Christian Borntraeger
2020-02-17 15:38         ` Janosch Frank
2020-02-17 16:58           ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 21/42] KVM: s390: protvirt: handle secure guest prefix pages Christian Borntraeger
2020-02-17 11:11   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 22/42] KVM: s390/mm: handle guest unpin events Christian Borntraeger
2020-02-17 14:23   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 23/42] KVM: s390: protvirt: Write sthyi data to instruction data area Christian Borntraeger
2020-02-17 14:24   ` David Hildenbrand
2020-02-17 18:40     ` Christian Borntraeger
2020-02-17 19:16       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 24/42] KVM: s390: protvirt: STSI handling Christian Borntraeger
2020-02-18  8:35   ` David Hildenbrand
2020-02-18  8:44     ` Christian Borntraeger
2020-02-18  9:08       ` David Hildenbrand
2020-02-18  9:11         ` Christian Borntraeger
2020-02-18  9:13           ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 25/42] KVM: s390: protvirt: disallow one_reg Christian Borntraeger
2020-02-18  8:40   ` David Hildenbrand
2020-02-18  8:57     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 26/42] KVM: s390: protvirt: Do only reset registers that are accessible Christian Borntraeger
2020-02-18  8:42   ` David Hildenbrand
2020-02-18  9:20     ` Christian Borntraeger
2020-02-18  9:28       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 27/42] KVM: s390: protvirt: Only sync fmt4 registers Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 28/42] KVM: s390: protvirt: Add program exception injection Christian Borntraeger
2020-02-18  9:33   ` David Hildenbrand
2020-02-18  9:37     ` Christian Borntraeger
2020-02-18  9:39       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 29/42] KVM: s390: protvirt: Add diag 308 subcode 8 - 10 handling Christian Borntraeger
2020-02-18  9:38   ` David Hildenbrand
2020-02-19 12:45     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 30/42] KVM: s390: protvirt: UV calls in support of diag308 0, 1 Christian Borntraeger
2020-02-18  9:44   ` David Hildenbrand
2020-02-19 11:53     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 31/42] KVM: s390: protvirt: Report CPU state to Ultravisor Christian Borntraeger
2020-02-18  9:48   ` David Hildenbrand
2020-02-19 19:36     ` Christian Borntraeger
2020-02-19 19:46       ` Christian Borntraeger
2020-02-20 10:52         ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 32/42] KVM: s390: protvirt: Support cmd 5 operation state Christian Borntraeger
2020-02-18  9:50   ` David Hildenbrand
2020-02-19 11:06     ` Christian Borntraeger
2020-02-19 11:08       ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 33/42] KVM: s390: protvirt: Mask PSW interrupt bits for interception 104 and 112 Christian Borntraeger
2020-02-18  9:53   ` David Hildenbrand
2020-02-18 10:02     ` David Hildenbrand
2020-02-18 10:05     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 34/42] KVM: s390: protvirt: do not inject interrupts after start Christian Borntraeger
2020-02-18  9:53   ` David Hildenbrand
2020-02-18 10:02     ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 35/42] KVM: s390: protvirt: Add UV cpu reset calls Christian Borntraeger
2020-02-18  9:54   ` David Hildenbrand
2020-02-14 22:26 ` [PATCH v2 36/42] DOCUMENTATION: Protected virtual machine introduction and IPL Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 37/42] s390/uv: Fix handling of length extensions (already in s390 tree) Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 38/42] s390: protvirt: Add sysfs firmware interface for Ultravisor information Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 39/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: error cases Christian Borntraeger
2020-02-18 16:25   ` Will Deacon
2020-02-18 16:30     ` Christian Borntraeger
2020-02-18 16:33       ` Will Deacon
2020-02-14 22:26 ` [PATCH v2 40/42] example for future extension: mm:gup/writeback: add callbacks for inaccessible pages: source indication Christian Borntraeger
2020-02-17 14:15   ` Ulrich Weigand
2020-02-17 14:38     ` Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 41/42] potential fixup for "s390/mm: provide memory management functions for protected KVM guests" Christian Borntraeger
2020-02-14 22:26 ` [PATCH v2 42/42] KVM: s390: rstify new ioctls in api.rst Christian Borntraeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5523486-ee76-e6c1-9563-658bca7f3b0d@redhat.com \
    --to=david@redhat.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=frankja@linux.vnet.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mimu@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git