linux-kernel.vger.kernel.org archive mirror
From: Michael Kelley <mikelley@microsoft.com>
To: Nuno Das Neves <nunodasneves@linux.microsoft.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"viremana@linux.microsoft.com" <viremana@linux.microsoft.com>,
	Sunil Muthuswamy <sunilmut@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Lillian Grassin-Drake <Lillian.GrassinDrake@microsoft.com>,
	KY Srinivasan <kys@microsoft.com>
Subject: RE: [RFC PATCH 08/18] virt/mshv: map and unmap guest memory
Date: Mon, 8 Mar 2021 19:30:00 +0000
Message-ID: <MWHPR21MB15934FDC8DBE4088E8227AAFD7939@MWHPR21MB1593.namprd21.prod.outlook.com>
In-Reply-To: <d63330fa-de83-85de-c8ec-74cc90d680e3@linux.microsoft.com>

From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Monday, March 8, 2021 11:14 AM
> 
> On 2/8/2021 11:45 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, November 20, 2020 4:30 PM
> >>

[snip]

> >> @@ -245,16 +249,318 @@ hv_call_delete_partition(u64 partition_id)
> >>  	return -hv_status_to_errno(status);
> >>  }
> >>
> >> +static int
> >> +hv_call_map_gpa_pages(u64 partition_id,
> >> +		      u64 gpa_target,
> >> +		      u64 page_count, u32 flags,
> >> +		      struct page **pages)
> >> +{
> >> +	struct hv_map_gpa_pages *input_page;
> >> +	int status;
> >> +	int i;
> >> +	struct page **p;
> >> +	u32 completed = 0;
> >> +	u64 hypercall_status;
> >> +	unsigned long remaining = page_count;
> >> +	int rep_count;
> >> +	unsigned long irq_flags;
> >> +	int ret = 0;
> >> +
> >> +	while (remaining) {
> >> +
> >> +		rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> >> +
> >> +		local_irq_save(irq_flags);
> >> +		input_page = (struct hv_map_gpa_pages *)(*this_cpu_ptr(
> >> +			hyperv_pcpu_input_arg));
> >> +
> >> +		input_page->target_partition_id = partition_id;
> >> +		input_page->target_gpa_base = gpa_target;
> >> +		input_page->map_flags = flags;
> >> +
> >> +		for (i = 0, p = pages; i < rep_count; i++, p++)
> >> +			input_page->source_gpa_page_list[i] =
> >> +				page_to_pfn(*p) & HV_MAP_GPA_MASK;
> >
> > The masking seems a bit weird.  The mask allows for up to 64G page frames,
> > which is 256 Tbytes of total physical memory, which is probably the current
> > Hyper-V limit on memory size (48 bit physical address space, though 52 bit
> > physical address spaces are coming).  So the masking shouldn't ever be doing
> > anything.   And if it was doing something, that probably should be treated as
> > an error rather than simply dropping the high bits.
> 
> Good point - It looks like the mask isn't needed.
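
To make the arithmetic concrete, and to show what "treat it as an error"
could look like, here is a minimal user-space sketch. The 36-bit width is
an assumption inferred from the "64G page frames" figure above, not taken
from the definition of HV_MAP_GPA_MASK, and every EXAMPLE_/example_ name
is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: a 36-bit PFN field, matching the "64G page frames" figure. */
#define EXAMPLE_GPA_PFN_BITS  36
#define EXAMPLE_GPA_PFN_MASK  ((UINT64_C(1) << EXAMPLE_GPA_PFN_BITS) - 1)
#define EXAMPLE_HV_PAGE_SHIFT 12	/* 4K hypervisor pages */

/* Bytes reachable through the PFN field: 2^36 frames * 2^12 bytes = 2^48. */
static inline uint64_t example_addressable_bytes(void)
{
	return (EXAMPLE_GPA_PFN_MASK + 1) << EXAMPLE_HV_PAGE_SHIFT;
}

/* Reject an out-of-range PFN instead of silently dropping its high bits. */
static inline int example_check_pfn(uint64_t pfn)
{
	return (pfn & ~EXAMPLE_GPA_PFN_MASK) ? -EINVAL : 0;
}
```

With a check of this kind, a PFN that does not fit is reported to the
caller rather than being mapped to a different, truncated frame.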
> 
> >
> > Note that this code does not handle the case where PAGE_SIZE !=
> > HV_HYP_PAGE_SIZE.  But maybe we'll never run the root partition with a
> > page size other than 4K.
> >
> 
> For now on x86 it won't happen, but maybe on ARM?
> It shouldn't be hard to support this case, especially since
> PAGE_SIZE >= HV_HYP_PAGE_SIZE. Do you think we need it in this patch set?

No, from my perspective, this case does not need to be handled in 
this patch set.
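
If the case is picked up later, one possible shape is to expand each
kernel page into the 4K hypervisor frames it covers. This is only a rough
user-space sketch: the 64K page size is an assumption chosen for
illustration, and the EXAMPLE_/example_ names are hypothetical and do not
come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT    16	/* assumed 64K kernel page size */
#define EXAMPLE_HV_PAGE_SHIFT 12	/* 4K hypervisor page size */
#define EXAMPLE_HV_PAGES_PER_PAGE \
	(1u << (EXAMPLE_PAGE_SHIFT - EXAMPLE_HV_PAGE_SHIFT))

/*
 * Expand kernel PFNs into hypervisor PFNs. 'out' must have room for
 * count * EXAMPLE_HV_PAGES_PER_PAGE entries. Returns entries written.
 */
static size_t example_fill_hv_pfns(const uint64_t *pfns, size_t count,
				   uint64_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		/* First 4K frame inside this kernel page. */
		uint64_t base = pfns[i] <<
			(EXAMPLE_PAGE_SHIFT - EXAMPLE_HV_PAGE_SHIFT);

		for (unsigned int j = 0; j < EXAMPLE_HV_PAGES_PER_PAGE; j++)
			out[n++] = base + j;
	}
	return n;
}
```

The rep count passed to the hypercall would then be counted in
hypervisor pages, so the batch limit applies to the expanded list.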

> 
> >> +		hypercall_status = hv_do_rep_hypercall(
> >> +			HVCALL_MAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> >> +		local_irq_restore(irq_flags);
> >> +
> >> +		status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> >> +		completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> >> +				HV_HYPERCALL_REP_COMP_OFFSET;
> >> +
> >> +		if (status == HV_STATUS_INSUFFICIENT_MEMORY) {
> >> +			ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> +						    partition_id, 256);
> >
> > Why adding 256 pages?  I'm just contrasting with other places that add
> > 1 page at a time.  Maybe a comment to explain ....
> >
> 
> Empirically determined. I'll add a #define and comment.
> 
> >> +			if (ret)
> >> +				break;
> >> +		} else if (status != HV_STATUS_SUCCESS) {
> >> +			pr_err("%s: completed %llu out of %llu, %s\n",
> >> +			       __func__,
> >> +			       page_count - remaining, page_count,
> >> +			       hv_status_to_string(status));
> >> +			ret = -hv_status_to_errno(status);
> >> +			break;
> >> +		}
> >> +
> >> +		pages += completed;
> >> +		remaining -= completed;
> >> +		gpa_target += completed;
> >> +	}
> >> +
> >> +	if (ret && completed) {
> >
> > Is the above the right test?  Completed could be zero from the most
> > recent iteration, but still could be partially succeeded based on a previous
> > successful iteration.   I think this needs to check whether remaining equals
> > page_count.
> >
> 
> You're right; I'll change it to (ret && remaining < page_count)
> 
> >> +		pr_err("%s: Partially succeeded; mapped regions may be in invalid state",
> >> +		       __func__);
> >> +		ret = -EBADFD;
> >> +	}
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static int
> >> +hv_call_unmap_gpa_pages(u64 partition_id,
> >> +			u64 gpa_target,
> >> +			u64 page_count, u32 flags)
> >> +{
> >> +	struct hv_unmap_gpa_pages *input_page;
> >> +	int status;
> >> +	int ret = 0;
> >> +	u32 completed = 0;
> >> +	u64 hypercall_status;
> >> +	unsigned long remaining = page_count;
> >> +	int rep_count;
> >> +	unsigned long irq_flags;
> >> +
> >> +	local_irq_save(irq_flags);
> >> +	input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr(
> >> +		hyperv_pcpu_input_arg));
> >> +
> >> +	input_page->target_partition_id = partition_id;
> >> +	input_page->target_gpa_base = gpa_target;
> >> +	input_page->unmap_flags = flags;
> >> +
> >> +	while (remaining) {
> >> +		rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> >> +		hypercall_status = hv_do_rep_hypercall(
> >> +			HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> >
> > Similarly, this code doesn't handle PAGE_SIZE != HV_HYP_PAGE_SIZE.
> >
> 
> As above - do we need this for this patch set? This won't happen on x86.

Again, not needed from my perspective.

> 
> >> +		status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> >> +		completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> >> +				HV_HYPERCALL_REP_COMP_OFFSET;
> >> +		if (status != HV_STATUS_SUCCESS) {
> >> +			pr_err("%s: completed %llu out of %llu, %s\n",
> >> +			       __func__,
> >> +			       page_count - remaining, page_count,
> >> +			       hv_status_to_string(status));
> >> +			ret = -hv_status_to_errno(status);
> >> +			break;
> >> +		}
> >> +
> >> +		remaining -= completed;
> >> +		gpa_target += completed;
> >> +		input_page->target_gpa_base = gpa_target;
> >> +	}
> >> +	local_irq_restore(irq_flags);
> >
> > I have some concern about holding interrupts disabled for this long.
> >
> 
> How about I move the interrupt enabling/disabling inside the loop? i.e.:
>         while (remaining) {
>                 local_irq_save(irq_flags);
>                 input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr(
>                         hyperv_pcpu_input_arg));
> 
>                 input_page->target_partition_id = partition_id;
>                 input_page->target_gpa_base = gpa_target;
>                 input_page->unmap_flags = flags;
>                 rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
>                 hypercall_status = hv_do_rep_hypercall(
>                         HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL);
>                 local_irq_restore(irq_flags);
> 
>                 status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
>                 completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
>                                 HV_HYPERCALL_REP_COMP_OFFSET;
>                 if (status != HV_STATUS_SUCCESS) {
>                         pr_err("%s: completed %llu out of %llu, %s\n",
>                                __func__,
>                                page_count - remaining, page_count,
>                                hv_status_to_string(status));
>                         ret = -hv_status_to_errno(status);
>                         break;
>                 }
> 
>                 remaining -= completed;
>                 gpa_target += completed;
>         }
> 
> 

Yes, that would help.
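
For illustration, the per-batch structure together with the
"remaining < page_count" partial-success test can be modeled in user
space as below. This is a sketch under assumptions, not the driver code:
the field layout (result in bits 0-15, reps completed in bits 32-43) is
what I believe the HV_HYPERCALL_* masks encode, the batch size of 512 is
made up, and ex_fake_rep_hypercall() is a hypothetical stand-in for
hv_do_rep_hypercall(). The raw return value is held in a 64-bit variable
because the rep count sits above bit 31.

```c
#include <stdint.h>

#define EX_RESULT_MASK     UINT64_C(0xFFFF)	/* assumed layout */
#define EX_REP_COMP_OFFSET 32
#define EX_REP_COMP_MASK   (UINT64_C(0xFFF) << EX_REP_COMP_OFFSET)
#define EX_BATCH_SIZE      512u			/* hypothetical batch limit */

enum ex_result { EX_OK, EX_FAILED, EX_PARTIAL };

/* Fake hypervisor: accepts 'budget' more pages, then starts failing. */
struct ex_hv { uint64_t budget; };

static uint64_t ex_fake_rep_hypercall(struct ex_hv *hv, uint32_t rep_count)
{
	uint64_t done = rep_count < hv->budget ? rep_count : hv->budget;
	uint64_t result = (done == rep_count) ? 0 : 5; /* 5: arbitrary error */

	hv->budget -= done;
	return (done << EX_REP_COMP_OFFSET) | result;
}

/* Extract the reps-completed field from a raw 64-bit return value. */
static uint32_t ex_completed(uint64_t raw)
{
	return (uint32_t)((raw & EX_REP_COMP_MASK) >> EX_REP_COMP_OFFSET);
}

static enum ex_result ex_unmap(struct ex_hv *hv, uint64_t page_count)
{
	uint64_t remaining = page_count;
	enum ex_result ret = EX_OK;

	while (remaining) {
		uint32_t rep_count = remaining < EX_BATCH_SIZE ?
				     (uint32_t)remaining : EX_BATCH_SIZE;
		/* Kernel code would disable interrupts only around this call. */
		uint64_t raw = ex_fake_rep_hypercall(hv, rep_count);

		/* Count reps that finished, including those before a failure. */
		remaining -= ex_completed(raw);
		if ((raw & EX_RESULT_MASK) != 0) {
			ret = EX_FAILED;
			break;
		}
	}

	/* Partial success if any batch, not just the last, made progress. */
	if (ret != EX_OK && remaining < page_count)
		ret = EX_PARTIAL;
	return ret;
}
```

Subtracting the completed count before checking the result is a choice
made in this sketch; it lets the partial-success test also see progress
made inside the failing batch.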

> >> +
> >> +	if (ret && completed) {
> >
> > Same comment as before.
> >
> 
> Ditto as above.
> 
> >> +		pr_err("%s: Partially succeeded; mapped regions may be in invalid state",
> >> +		       __func__);
> >> +		ret = -EBADFD;
> >> +	}
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +static long
> >> +mshv_partition_ioctl_map_memory(struct mshv_partition *partition,
> >> +				struct mshv_user_mem_region __user *user_mem)
> >> +{
> >> +	struct mshv_user_mem_region mem;
> >> +	struct mshv_mem_region *region;
> >> +	int completed;
> >> +	unsigned long remaining, batch_size;
> >> +	int i;
> >> +	struct page **pages;
> >> +	u64 page_count, user_start, user_end, gpfn_start, gpfn_end;
> >> +	u64 region_page_count, region_user_start, region_user_end;
> >> +	u64 region_gpfn_start, region_gpfn_end;
> >> +	long ret = 0;
> >> +
> >> +	/* Check we have enough slots*/
> >> +	if (partition->regions.count == MSHV_MAX_MEM_REGIONS) {
> >> +		pr_err("%s: not enough memory region slots\n", __func__);
> >> +		return -ENOSPC;
> >> +	}
> >> +
> >> +	if (copy_from_user(&mem, user_mem, sizeof(mem)))
> >> +		return -EFAULT;
> >> +
> >> +	if (!mem.size ||
> >> +	    mem.size & (PAGE_SIZE - 1) ||
> >> +	    mem.userspace_addr & (PAGE_SIZE - 1) ||
> >
> > There's a PAGE_ALIGNED macro that expresses exactly what
> > each of the previous two tests is doing.
> >
> 
> Since these need to be HV_HYP_PAGE_SIZE aligned, I will add a
> HV_HYP_PAGE_ALIGNED macro for this.

I was thinking that PAGE_SIZE and PAGE_ALIGNED are correct.   If
this code were running on an ARM64 system with a 64K page
size, the 64K alignment would be fine and would make sense from
the user space perspective.   You don't want to be mapping part
of a user space page.  And 64K alignment will certainly satisfy
Hyper-V's requirement for 4K alignment.  The real requirement
from Hyper-V's standpoint is that the alignment not be smaller
than 4K.  But maybe I'm misunderstanding.
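
A small user-space sketch of the point, assuming a 64K kernel page size.
The EX_/ex_ names are hypothetical; ex_aligned() performs the same
power-of-two test that PAGE_ALIGNED() expresses, parameterized on the
page size.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_HV_HYP_PAGE_SIZE UINT64_C(4096)	/* Hyper-V page size */
#define EX_PAGE_SIZE        UINT64_C(65536)	/* assumed 64K kernel page */

/* True if 'value' is a multiple of 'page_size' (a power of two). */
static inline bool ex_aligned(uint64_t value, uint64_t page_size)
{
	return (value & (page_size - 1)) == 0;
}
```

Any value that passes the 64K test also passes the 4K test, for example
0x30000. The reverse does not hold: 0x31000 is 4K aligned but would map
part of a 64K user space page.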

Michael

Thread overview: 53+ messages
2020-11-21  0:30 [RFC PATCH 00/18] Microsoft Hypervisor root partition ioctl interface Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 01/18] x86/hyperv: convert hyperv statuses to linux error codes Nuno Das Neves
2021-02-09 13:04   ` Vitaly Kuznetsov
2021-03-04 18:24     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 02/18] asm-generic/hyperv: convert hyperv statuses to strings Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 03/18] virt/mshv: minimal mshv module (/dev/mshv/) Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 04/18] virt/mshv: request version ioctl Nuno Das Neves
2021-02-08 19:41   ` Michael Kelley
2021-03-04 21:35     ` Nuno Das Neves
2021-02-09 13:11   ` Vitaly Kuznetsov
2021-03-04 18:43     ` Nuno Das Neves
2021-03-05  9:18       ` Vitaly Kuznetsov
2021-04-07  0:21         ` Nuno Das Neves
2021-04-07  7:38           ` Vitaly Kuznetsov
2021-04-07 13:43             ` Wei Liu
2021-04-07 14:02               ` Vitaly Kuznetsov
2021-04-07 14:19                 ` Wei Liu
2020-11-21  0:30 ` [RFC PATCH 05/18] virt/mshv: create partition ioctl Nuno Das Neves
2021-02-09 13:15   ` Vitaly Kuznetsov
2021-03-04 18:44     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 06/18] virt/mshv: create, initialize, finalize, delete partition hypercalls Nuno Das Neves
2021-02-08 19:42   ` Michael Kelley
2021-03-04 23:49     ` Nuno Das Neves
2021-03-04 23:58       ` Michael Kelley
2020-11-21  0:30 ` [RFC PATCH 07/18] virt/mshv: withdraw memory hypercall Nuno Das Neves
2021-02-08 19:44   ` Michael Kelley
2021-03-05 21:01     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 08/18] virt/mshv: map and unmap guest memory Nuno Das Neves
2021-02-08 19:45   ` Michael Kelley
2021-03-08 19:14     ` Nuno Das Neves
2021-03-08 19:30       ` Michael Kelley [this message]
2020-11-21  0:30 ` [RFC PATCH 09/18] virt/mshv: create vcpu ioctl Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 10/18] virt/mshv: get and set vcpu registers ioctls Nuno Das Neves
2021-02-08 19:47   ` Michael Kelley
2021-03-09  1:39     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 11/18] virt/mshv: set up synic pages for intercept messages Nuno Das Neves
2021-02-08 19:47   ` Michael Kelley
2021-03-11 19:37     ` Nuno Das Neves
2021-03-11 20:45       ` Michael Kelley
2020-11-21  0:30 ` [RFC PATCH 12/18] virt/mshv: run vp ioctl and isr Nuno Das Neves
2020-11-24 16:15   ` Wei Liu
2020-11-21  0:30 ` [RFC PATCH 13/18] virt/mshv: install intercept ioctl Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 14/18] virt/mshv: assert interrupt ioctl Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 15/18] virt/mshv: get and set vp state ioctls Nuno Das Neves
2021-02-08 19:48   ` Michael Kelley
2021-03-11 23:38     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 16/18] virt/mshv: mmap vp register page Nuno Das Neves
2021-02-08 19:49   ` Michael Kelley
2021-03-25 17:36     ` Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 17/18] virt/mshv: get and set partition property ioctls Nuno Das Neves
2020-11-21  0:30 ` [RFC PATCH 18/18] virt/mshv: Add enlightenment bits to create partition ioctl Nuno Das Neves
2020-11-24 16:18 ` [RFC PATCH 00/18] Microsoft Hypervisor root partition ioctl interface Wei Liu
2021-02-08 19:40 ` Michael Kelley
