From: Michael Kelley <mikelley@microsoft.com>
To: Nuno Das Neves <nunodasneves@linux.microsoft.com>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
Cc: "virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"viremana@linux.microsoft.com" <viremana@linux.microsoft.com>,
Sunil Muthuswamy <sunilmut@microsoft.com>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
Lillian Grassin-Drake <Lillian.GrassinDrake@microsoft.com>,
KY Srinivasan <kys@microsoft.com>
Subject: RE: [RFC PATCH 08/18] virt/mshv: map and unmap guest memory
Date: Mon, 8 Mar 2021 19:30:00 +0000
Message-ID: <MWHPR21MB15934FDC8DBE4088E8227AAFD7939@MWHPR21MB1593.namprd21.prod.outlook.com>
In-Reply-To: <d63330fa-de83-85de-c8ec-74cc90d680e3@linux.microsoft.com>
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Monday, March 8, 2021 11:14 AM
>
> On 2/8/2021 11:45 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, November 20, 2020 4:30 PM
> >>
[snip]
> >> @@ -245,16 +249,318 @@ hv_call_delete_partition(u64 partition_id)
> >> return -hv_status_to_errno(status);
> >> }
> >>
> >> +static int
> >> +hv_call_map_gpa_pages(u64 partition_id,
> >> + u64 gpa_target,
> >> + u64 page_count, u32 flags,
> >> + struct page **pages)
> >> +{
> >> + struct hv_map_gpa_pages *input_page;
> >> + int status;
> >> + int i;
> >> + struct page **p;
> >> + u32 completed = 0;
> >> + u64 hypercall_status;
> >> + unsigned long remaining = page_count;
> >> + int rep_count;
> >> + unsigned long irq_flags;
> >> + int ret = 0;
> >> +
> >> + while (remaining) {
> >> +
> >> + rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> >> +
> >> + local_irq_save(irq_flags);
> >> + input_page = (struct hv_map_gpa_pages *)(*this_cpu_ptr(
> >> + hyperv_pcpu_input_arg));
> >> +
> >> + input_page->target_partition_id = partition_id;
> >> + input_page->target_gpa_base = gpa_target;
> >> + input_page->map_flags = flags;
> >> +
> >> + for (i = 0, p = pages; i < rep_count; i++, p++)
> >> + input_page->source_gpa_page_list[i] =
> >> + page_to_pfn(*p) & HV_MAP_GPA_MASK;
> >
> > The masking seems a bit weird. The mask allows for up to 64G page frames,
> > which is 256 Tbytes of total physical memory, which is probably the current
> > Hyper-V limit on memory size (48 bit physical address space, though 52 bit
> > physical address spaces are coming). So the masking shouldn't ever be doing
> > anything. And if it was doing something, that probably should be treated as
> > an error rather than simply dropping the high bits.
>
> Good point - It looks like the mask isn't needed.
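To illustrate treating an out-of-range frame number as an error rather than
silently masking it, here is a minimal userspace sketch. EXAMPLE_PFN_MASK and
checked_pfn() are made-up names for illustration; the 36-bit width is only
inferred from the "64G page frames" figure above and is not taken from the
patch.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: 36 significant PFN bits (64G frames of 4K each = 256 TiB). */
#define EXAMPLE_PFN_MASK ((UINT64_C(1) << 36) - 1)

/*
 * Hypothetical helper: if pfn fits within the mask, store it in *out and
 * return 0. Otherwise leave *out untouched and return -EINVAL, so the
 * caller fails loudly instead of mapping the wrong frame.
 */
int checked_pfn(uint64_t pfn, uint64_t *out)
{
	if (pfn & ~EXAMPLE_PFN_MASK)
		return -EINVAL;
	*out = pfn;
	return 0;
}
```

If the mask is dropped entirely, as suggested, a check like this would only be
needed if the hypervisor does not already reject such frames itself.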
>
> >
> > Note that this code does not handle the case where PAGE_SIZE !=
> > HV_HYP_PAGE_SIZE. But maybe we'll never run the root partition with a
> > page size other than 4K.
> >
>
> For now on x86 it won't happen, but maybe on ARM?
> It shouldn't be hard to support this case, especially since
> PAGE_SIZE >= HV_HYP_PAGE_SIZE. Do you think we need it in this patch set?
No, from my perspective, this case does not need to be handled in
this patch set.
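For reference, if this is picked up later: when PAGE_SIZE is a multiple of
HV_HYP_PAGE_SIZE, each kernel page would have to be expanded into several
consecutive hypervisor frame numbers. A rough userspace sketch follows;
EX_PAGE_SHIFT, EX_HV_PAGE_SHIFT, EX_HV_PFNS_PER_PAGE and expand_pfn() are
hypothetical names, and the 64K kernel page size is just an assumed example.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SHIFT		16	/* assumed 64K kernel page size */
#define EX_HV_PAGE_SHIFT	12	/* hypervisor pages are 4K */
#define EX_HV_PFNS_PER_PAGE	(1u << (EX_PAGE_SHIFT - EX_HV_PAGE_SHIFT))

/*
 * Hypothetical helper: write the hypervisor PFNs that back one kernel page
 * into list[], which must have room for EX_HV_PFNS_PER_PAGE entries.
 * Returns the number of entries written.
 */
size_t expand_pfn(uint64_t pfn, uint64_t *list)
{
	/* First 4K frame covered by this kernel page. */
	uint64_t hv_pfn = pfn << (EX_PAGE_SHIFT - EX_HV_PAGE_SHIFT);
	size_t i;

	for (i = 0; i < EX_HV_PFNS_PER_PAGE; i++)
		list[i] = hv_pfn + i;

	return EX_HV_PFNS_PER_PAGE;
}
```

The rep count and the batch size would need to be scaled by the same factor,
since the hypercall counts 4K pages.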
>
> >> + hypercall_status = hv_do_rep_hypercall(
> >> + HVCALL_MAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> >> + local_irq_restore(irq_flags);
> >> +
> >> + status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> >> + completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> >> + HV_HYPERCALL_REP_COMP_OFFSET;
> >> +
> >> + if (status == HV_STATUS_INSUFFICIENT_MEMORY) {
> >> + ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> + partition_id, 256);
> >
> > Why adding 256 pages? I'm just contrasting with other places that add
> > 1 page at a time. Maybe a comment to explain ....
> >
>
> Empirically determined. I'll add a #define and comment.
>
> >> + if (ret)
> >> + break;
> >> + } else if (status != HV_STATUS_SUCCESS) {
> >> + pr_err("%s: completed %llu out of %llu, %s\n",
> >> + __func__,
> >> + page_count - remaining, page_count,
> >> + hv_status_to_string(status));
> >> + ret = -hv_status_to_errno(status);
> >> + break;
> >> + }
> >> +
> >> + pages += completed;
> >> + remaining -= completed;
> >> + gpa_target += completed;
> >> + }
> >> +
> >> + if (ret && completed) {
> >
> > Is the above the right test? Completed could be zero from the most
> > recent iteration, but still could be partially succeeded based on a previous
> > successful iteration. I think this needs to check whether remaining equals
> > page_count.
> >
>
> You're right; I'll change it to (ret && remaining < page_count)
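To make the difference between the two tests concrete, here is a small
userspace simulation of the batching loop. Nothing here is from the patch:
simulate_map(), struct partial_result and the "fail once fail_at pages are
mapped" model are assumptions made up for the example.

```c
#include <stdint.h>

/* What each version of the "partially succeeded" test concludes. */
struct partial_result {
	int old_test;	/* ret && completed */
	int new_test;	/* ret && remaining < page_count */
};

/*
 * Simulate the loop: each call maps up to `batch` pages, until `fail_at`
 * pages have been mapped in total; the next call then fails having
 * completed nothing.
 */
struct partial_result simulate_map(uint64_t page_count, uint64_t batch,
				   uint64_t fail_at)
{
	uint64_t remaining = page_count;
	uint32_t completed = 0;
	int ret = 0;
	struct partial_result r;

	while (remaining) {
		if (page_count - remaining >= fail_at) {
			completed = 0;	/* failing call made no progress */
			ret = -1;
			break;
		}
		completed = (uint32_t)(remaining < batch ? remaining : batch);
		remaining -= completed;
	}

	r.old_test = ret && completed;
	r.new_test = ret && remaining < page_count;
	return r;
}
```

With page_count = 1024, batch = 512 and fail_at = 512, the first call maps 512
pages and the second fails with completed == 0. The old test reports no
partial success, while the remaining < page_count test catches it.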
>
> >> + pr_err("%s: Partially succeeded; mapped regions may be in invalid state",
> >> + __func__);
> >> + ret = -EBADFD;
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int
> >> +hv_call_unmap_gpa_pages(u64 partition_id,
> >> + u64 gpa_target,
> >> + u64 page_count, u32 flags)
> >> +{
> >> + struct hv_unmap_gpa_pages *input_page;
> >> + int status;
> >> + int ret = 0;
> >> + u32 completed = 0;
> >> + u64 hypercall_status;
> >> + unsigned long remaining = page_count;
> >> + int rep_count;
> >> + unsigned long irq_flags;
> >> +
> >> + local_irq_save(irq_flags);
> >> + input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr(
> >> + hyperv_pcpu_input_arg));
> >> +
> >> + input_page->target_partition_id = partition_id;
> >> + input_page->target_gpa_base = gpa_target;
> >> + input_page->unmap_flags = flags;
> >> +
> >> + while (remaining) {
> >> + rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> >> + hypercall_status = hv_do_rep_hypercall(
> >> + HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> >
> > Similarly, this code doesn't handle PAGE_SIZE != HV_HYP_PAGE_SIZE.
> >
>
> As above - do we need this for this patch set? This won't happen on x86.
Again, not needed from my perspective.
>
> >> + status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> >> + completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> >> + HV_HYPERCALL_REP_COMP_OFFSET;
> >> + if (status != HV_STATUS_SUCCESS) {
> >> + pr_err("%s: completed %llu out of %llu, %s\n",
> >> + __func__,
> >> + page_count - remaining, page_count,
> >> + hv_status_to_string(status));
> >> + ret = -hv_status_to_errno(status);
> >> + break;
> >> + }
> >> +
> >> + remaining -= completed;
> >> + gpa_target += completed;
> >> + input_page->target_gpa_base = gpa_target;
> >> + }
> >> + local_irq_restore(irq_flags);
> >
> > I have some concern about holding interrupts disabled for this long.
> >
>
> How about I move the interrupt enabling/disabling inside the loop? i.e.:
> 	while (remaining) {
> 		local_irq_save(irq_flags);
> 		input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr(
> 			hyperv_pcpu_input_arg));
>
> 		input_page->target_partition_id = partition_id;
> 		input_page->target_gpa_base = gpa_target;
> 		input_page->unmap_flags = flags;
> 		rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> 		status = hv_do_rep_hypercall(
> 			HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> 		local_irq_restore(irq_flags);
>
> 		completed = (status & HV_HYPERCALL_REP_COMP_MASK) >>
> 				HV_HYPERCALL_REP_COMP_OFFSET;
> 		status &= HV_HYPERCALL_RESULT_MASK;
> 		if (status != HV_STATUS_SUCCESS) {
> 			pr_err("%s: completed %llu out of %llu, %s\n",
> 				__func__,
> 				page_count - remaining, page_count,
> 				hv_status_to_string(status));
> 			ret = hv_status_to_errno(status);
> 			break;
> 		}
>
> 		remaining -= completed;
> 		gpa_target += completed;
> 	}
>
>
Yes, that would help.
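One detail to watch in the reworked loop: it stores the raw hypercall return
value in 'status' before extracting the completed rep count. If 'status' stays
declared as int, as in the original patch, the rep-complete field would be
lost, so the raw value should live in a u64. A minimal userspace sketch of the
decoding; the EX_ constants mirror the mask values as I recall them from
hyperv-tlfs.h and should be treated as assumptions, and ex_result() and
ex_reps_completed() are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed layout: result code in bits 15:0, reps completed in bits 43:32. */
#define EX_RESULT_MASK		UINT64_C(0xffff)
#define EX_REP_COMP_OFFSET	32
#define EX_REP_COMP_MASK	(UINT64_C(0xfff) << EX_REP_COMP_OFFSET)

/* Extract the hypervisor status code from the raw 64-bit return value. */
uint16_t ex_result(uint64_t hypercall_status)
{
	return (uint16_t)(hypercall_status & EX_RESULT_MASK);
}

/* Extract how many reps the hypervisor completed before returning. */
uint32_t ex_reps_completed(uint64_t hypercall_status)
{
	return (uint32_t)((hypercall_status & EX_REP_COMP_MASK) >>
			  EX_REP_COMP_OFFSET);
}
```

The proposed loop also assigns 'ret' without the leading minus used in the
original code; that may be intentional if hv_status_to_errno() changes sign in
the next version, but it is worth double-checking.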
> >> +
> >> + if (ret && completed) {
> >
> > Same comment as before.
> >
>
> Ditto as above.
>
> >> + pr_err("%s: Partially succeeded; mapped regions may be in invalid state",
> >> + __func__);
> >> + ret = -EBADFD;
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static long
> >> +mshv_partition_ioctl_map_memory(struct mshv_partition *partition,
> >> + struct mshv_user_mem_region __user *user_mem)
> >> +{
> >> + struct mshv_user_mem_region mem;
> >> + struct mshv_mem_region *region;
> >> + int completed;
> >> + unsigned long remaining, batch_size;
> >> + int i;
> >> + struct page **pages;
> >> + u64 page_count, user_start, user_end, gpfn_start, gpfn_end;
> >> + u64 region_page_count, region_user_start, region_user_end;
> >> + u64 region_gpfn_start, region_gpfn_end;
> >> + long ret = 0;
> >> +
> >> + /* Check we have enough slots*/
> >> + if (partition->regions.count == MSHV_MAX_MEM_REGIONS) {
> >> + pr_err("%s: not enough memory region slots\n", __func__);
> >> + return -ENOSPC;
> >> + }
> >> +
> >> + if (copy_from_user(&mem, user_mem, sizeof(mem)))
> >> + return -EFAULT;
> >> +
> >> + if (!mem.size ||
> >> + mem.size & (PAGE_SIZE - 1) ||
> >> + mem.userspace_addr & (PAGE_SIZE - 1) ||
> >
> > There's a PAGE_ALIGNED macro that expresses exactly what
> > each of the previous two tests is doing.
> >
>
> Since these need to be HV_HYP_PAGE_SIZE aligned, I will add a
> HV_HYP_PAGE_ALIGNED macro for this.
I was thinking that PAGE_SIZE and PAGE_ALIGNED are correct. If
this code were running on an ARM64 system with a 64K page
size, 64K alignment would be fine and would make sense from
the user space perspective: you don't want to map only part
of a user space page. And 64K alignment will certainly satisfy
Hyper-V's requirement for 4K alignment. The real requirement
from Hyper-V's standpoint is that the alignment not be smaller
than 4K. But maybe I'm misunderstanding.
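A quick way to see the alignment argument, as a userspace sketch. The names
ex_is_aligned(), EX_HV_HYP_PAGE_SIZE and EX_PAGE_SIZE_64K are made up for the
example, and the 64K value is only an assumed ARM64 configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_HV_HYP_PAGE_SIZE	UINT64_C(0x1000)	/* 4K hypervisor page */
#define EX_PAGE_SIZE_64K	UINT64_C(0x10000)	/* assumed 64K kernel page */

/* True if x is a multiple of align; align must be a power of two. */
bool ex_is_aligned(uint64_t x, uint64_t align)
{
	return (x & (align - 1)) == 0;
}
```

Any address that passes the 64K check also passes the 4K check, because 64K is
a multiple of 4K. The reverse does not hold: 0x3000 is 4K aligned but not 64K
aligned, which is exactly the partial-user-page case a PAGE_SIZE-based test
would reject.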
Michael