All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: ARM fixes for my improved privcmd patch.
       [not found]           ` <50D074F5.6060202@citrix.com>
@ 2012-12-18 16:07             ` Konrad Rzeszutek Wilk
  2012-12-18 19:34               ` Mats Petersson
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-12-18 16:07 UTC (permalink / raw)
  To: Mats Petersson; +Cc: xen-devel, Ian Campbell

On Tue, Dec 18, 2012 at 01:51:49PM +0000, Mats Petersson wrote:
> On 18/12/12 11:40, Ian Campbell wrote:
> >On Tue, 2012-12-18 at 11:28 +0000, Mats Petersson wrote:
> >>On 18/12/12 11:17, Ian Campbell wrote:
> >>>On Mon, 2012-12-17 at 17:38 +0000, Mats Petersson wrote:
> >>>>On 17/12/12 16:57, Ian Campbell wrote:
> >>>>>On Fri, 2012-12-14 at 17:00 +0000, Mats Petersson wrote:
> >>>>>>Ian, Konrad:
> >>>>>>I took Konrad's latest version [I think] and applied my patch (which
> >>>>>>needed some adjustments as there are other "post 3.7" changes to same
> >>>>>>source code - including losing the xen_flush_tlb_all() ??)
> >>>>>>
> >>>>>>Attached are the patches:
> >>>>>>arm-enlighten.patch, which updates the ARM code.
> >>>>>>improve-pricmd.patch, which updates the privcmd code.
> >>>>>>
> >>>>>>Ian, can you have a look at the ARM code - which I quickly hacked
> >>>>>>together, I haven't compiled it, and I certainly haven't tested it,
> >>>>>There are a lot of build errors as you might expect (patch below, a few
> >>>>>warnings remain). You can find a cross compiler at
> >>>>>http://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/
> >>>>>
> >>>>>or you can use
> >>>>>drall:/home/ianc/devel/cross/x86_64-gcc-4.6.0-nolibc_arm-unknown-linux-gnueabi.tar.bz2
> >>>>>
> >>>>>which is an older version from the same place.
> >>>>>
> >>>>>Anyway, the patch...
> >>>>>>and
> >>>>>>it needs further changes to make my changes actually make it more
> >>>>>>efficient.
> >>>>>Right, the benefit on PVH or ARM would be in batching the
> >>>>>XENMEM_add_to_physmap_range calls. The batching of the
> >>>>>apply_to_page_range which this patch adds isn't useful because there is
> >>>>>no HYPERVISOR_mmu_update call to batch in this case. So basically this
> >>>>>patch as it stands does a lot of needless work for no gain I'm afraid.
> >>>>So, basically, what is an improvement on x86 isn't anything useful on
> >>>>ARM, and you'd prefer to loop around in privcmd.c calling into
> >>>>xen_remap_domain_mfn_range() a lot of times?
> >>>Not at all. ARM (and PVH) still benefits from the interface change but
> >>>the implementation of the benefit is totally different.
> >>>
> >>>For normal x86 PV you want to batch the HYPERVISOR_mmu_update.
> >>>
> >>>For both x86 PVH and ARM this hypercall  doesn't exist but instead there
> >>>is a call to HYPERVISOR_memory_op XENMEM_add_to_physmap_range which is
> >>>something which would benefit from batching.
> >>So, you want me to fix that up?
> >If you want to sure, yes please.
> >
> >But feel free to just make the existing code work with the interface,
> >without adding any batching. That should be a much smaller change than
> >what you proposed.
> >
> >(aside; I do wonder how much of this x86/arm code could be made generic)
> I think, once it goes to PVH everywhere, quite a bit (as I believe
> the hypercalls should be the same by then, right?)
> 
> In the PVOPS kernel, it's probably a bit more of a job. I'm sure it can
> be done, but with a bit more work.
> 
> I think I'll do the minimal patch first, then, if I find some spare
> time, work on the "batching" variant.

OK. The batching is IMHO just using the multicall variant.

> >
> >>To make xentrace not work until it is fixed wouldn't be a terrible
> >>thing, would it?
> >On ARM I think it is fine (I doubt this is the only thing stopping
> >xentrace from working). I suspect people would be less impressed with
> >breaking xentrace on x86. For PVH it probably is a requirement for it to
> >keep working, I'm not sure though.
> Ok, ENOSYS it is for remap_range() then.
> >
> >>  Then we can remove that old gunk from x86 as well
> >>(eventually).
> >>Thanks. I was starting to wonder if I'd been teleported back to the time
> >>when I struggled with pointers...
> >>Maybe it needs a better comment.
> >The other thing I had missed was that this was a pure increment and not
> >taking the value at the same time, which also confused me.
> >
> >Splitting the increment out from the dereference usually makes these
> >things clearer, I was obviously just being a bit hard of thinking
> >yesterday!
> No worries. I will see about making a more readable comment (and for
> ARM, I can remove the whole if/else and just do the one increment
> (based on above discussion), so should make the code better.

You can use the v3.8 tree as your base - it has the required PVH and ARM
patches. There is one bug (where dom0 crashes) - and I just sent
a git pull request for that to Linus's tree:

 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-18 16:07             ` ARM fixes for my improved privcmd patch Konrad Rzeszutek Wilk
@ 2012-12-18 19:34               ` Mats Petersson
  2012-12-19 10:59                 ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Mats Petersson @ 2012-12-18 19:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Ian Campbell

[-- Attachment #1: Type: text/plain, Size: 5584 bytes --]

On 18/12/12 16:07, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 18, 2012 at 01:51:49PM +0000, Mats Petersson wrote:
>> On 18/12/12 11:40, Ian Campbell wrote:
>>> On Tue, 2012-12-18 at 11:28 +0000, Mats Petersson wrote:
>>>> On 18/12/12 11:17, Ian Campbell wrote:
>>>>> On Mon, 2012-12-17 at 17:38 +0000, Mats Petersson wrote:
>>>>>> On 17/12/12 16:57, Ian Campbell wrote:
>>>>>>> On Fri, 2012-12-14 at 17:00 +0000, Mats Petersson wrote:
>>>>>>>> Ian, Konrad:
>>>>>>>> I took Konrad's latest version [I think] and applied my patch (which
>>>>>>>> needed some adjustments as there are other "post 3.7" changes to same
>>>>>>>> source code - including losing the xen_flush_tlb_all() ??)
>>>>>>>>
>>>>>>>> Attached are the patches:
>>>>>>>> arm-enlighten.patch, which updates the ARM code.
>>>>>>>> improve-pricmd.patch, which updates the privcmd code.
>>>>>>>>
>>>>>>>> Ian, can you have a look at the ARM code - which I quickly hacked
>>>>>>>> together, I haven't compiled it, and I certainly haven't tested it,
>>>>>>> There are a lot of build errors as you might expect (patch below, a few
>>>>>>> warnings remain). You can find a cross compiler at
>>>>>>> http://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/
>>>>>>>
>>>>>>> or you can use
>>>>>>> drall:/home/ianc/devel/cross/x86_64-gcc-4.6.0-nolibc_arm-unknown-linux-gnueabi.tar.bz2
>>>>>>>
>>>>>>> which is an older version from the same place.
>>>>>>>
>>>>>>> Anyway, the patch...
>>>>>>>> and
>>>>>>>> it needs further changes to make my changes actually make it more
>>>>>>>> efficient.
>>>>>>> Right, the benefit on PVH or ARM would be in batching the
>>>>>>> XENMEM_add_to_physmap_range calls. The batching of the
>>>>>>> apply_to_page_range which this patch adds isn't useful because there is
>>>>>>> no HYPERVISOR_mmu_update call to batch in this case. So basically this
>>>>>>> patch as it stands does a lot of needless work for no gain I'm afraid.
>>>>>> So, basically, what is an improvement on x86 isn't anything useful on
>>>>>> ARM, and you'd prefer to loop around in privcmd.c calling into
>>>>>> xen_remap_domain_mfn_range() a lot of times?
>>>>> Not at all. ARM (and PVH) still benefits from the interface change but
>>>>> the implementation of the benefit is totally different.
>>>>>
>>>>> For normal x86 PV you want to batch the HYPERVISOR_mmu_update.
>>>>>
>>>>> For both x86 PVH and ARM this hypercall  doesn't exist but instead there
>>>>> is a call to HYPERVISOR_memory_op XENMEM_add_to_physmap_range which is
>>>>> something which would benefit from batching.
>>>> So, you want me to fix that up?
>>> If you want to sure, yes please.
>>>
>>> But feel free to just make the existing code work with the interface,
>>> without adding any batching. That should be a much smaller change than
>>> what you proposed.
>>>
>>> (aside; I do wonder how much of this x86/arm code could be made generic)
>> I think, once it goes to PVH everywhere, quite a bit (as I believe
>> the hypercalls should be the same by then, right?)
>>
>> In the PVOPS kernel, it's probably a bit more of a job. I'm sure it can
>> be done, but with a bit more work.
>>
>> I think I'll do the minimal patch first, then, if I find some spare
>> time, work on the "batching" variant.
> OK. The batching is IMHO just using the multicall variant.
>
>>>> To make xentrace not work until it is fixed wouldn't be a terrible
>>>> thing, would it?
>>> On ARM I think it is fine (I doubt this is the only thing stopping
>>> xentrace from working). I suspect people would be less impressed with
>>> breaking xentrace on x86. For PVH it probably is a requirement for it to
>>> keep working, I'm not sure though.
>> Ok, ENOSYS it is for remap_range() then.
>>>>   Then we can remove that old gunk from x86 as well
>>>> (eventually).
>>>> Thanks. I was starting to wonder if I'd been teleported back to the time
>>>> when I struggled with pointers...
>>>> Maybe it needs a better comment.
>>> The other thing I had missed was that this was a pure increment and not
>>> taking the value at the same time, which also confused me.
>>>
>>> Splitting the increment out from the dereference usually makes these
>>> things clearer, I was obviously just being a bit hard of thinking
>>> yesterday!
>> No worries. I will see about making a more readable comment (and for
>> ARM, I can remove the whole if/else and just do the one increment
>> (based on above discussion), so should make the code better.
> You can use the v3.8 tree as your base - it has the required PVH and ARM
> patches. There is one bug (where dom0 crashes) - and I just sent
> a git pull request for that to Linus's tree:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-linus-3.8
Thanks. I have attached a patch for x86 (+generic changes) and one for arm.
Both compile and the x86 has been tested with localhost migration.

Ian: Can you review the code? I'm fairly sure it does the right thing, 
without too much "extra" code, and achieves "what we want". It's not 
optimized, but if the hypercalls are batched by multicall, it shouldn't 
be horrible. In fact I expect it will be marginally better than before, 
because it saves one level of calls with 7 or so arguments per mapped 
page, since we've pushed the loop down a level into do_remap_mfn(). 
I've kept the structure similar to x86, should we decide to merge the 
code into something common in the future. No doubt it will diverge, but 
starting with something similar gives it a little chance... ;)

Note: the purpose of this post is mainly to confirm that the ARM 
solution is "on track".

--
Mats

[-- Attachment #2: xen-privcmd-remap-array-arm.patch --]
[-- Type: text/x-patch, Size: 5276 bytes --]

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 7a32976..dc50a53 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -73,7 +73,7 @@ static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
 }
 
 struct remap_data {
-	xen_pfn_t fgmfn; /* foreign domain's gmfn */
+	xen_pfn_t *fgmfn; /* foreign domain's gmfn */
 	pgprot_t prot;
 	domid_t  domid;
 	struct vm_area_struct *vma;
@@ -90,38 +90,120 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
 	unsigned long pfn = page_to_pfn(page);
 	pte_t pte = pfn_pte(pfn, info->prot);
 
-	if (map_foreign_page(pfn, info->fgmfn, info->domid))
+	// TODO: We should really batch these updates
+	if (map_foreign_page(pfn, *info->fgmfn, info->domid))
 		return -EFAULT;
 	set_pte_at(info->vma->vm_mm, addr, ptep, pte);
+	info->fgmfn++;
 
 	return 0;
 }
 
-int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
-			       unsigned long addr,
-			       xen_pfn_t mfn, int nr,
-			       pgprot_t prot, unsigned domid,
-			       struct page **pages)
+/* do_remap_mfn() - helper function to map foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number of entries in the MFN array
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ * @pages:   page information (for PVH, not used currently)
+ *
+ * This function takes an array of mfns and maps nr pages from that into
+ * this kernel's memory. The owner of the pages is defined by domid. Where the
+ * pages are mapped is determined by addr, and vma is used for "accounting" of
+ * the pages.
+ *
+ * Return value is zero for success, negative for failure.
+ */
+static int do_remap_mfn(struct vm_area_struct *vma,
+			unsigned long addr,
+			xen_pfn_t *mfn, int nr,
+			pgprot_t prot, unsigned domid,
+			struct page **pages)
 {
 	int err;
 	struct remap_data data;
 
-	/* TBD: Batching, current sole caller only does page at a time */
-	if (nr > 1)
-		return -EINVAL;
+	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
 	data.fgmfn = mfn;
-	data.prot = prot;
+	data.prot  = prot;
 	data.domid = domid;
-	data.vma = vma;
-	data.index = 0;
+	data.vma   = vma;
 	data.pages = pages;
-	err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
-				  remap_pte_fn, &data);
-	return err;
+	data.index = 0;
+
+	while (nr) {
+		unsigned long range = 1 << PAGE_SHIFT;
+
+		err = apply_to_page_range(vma->vm_mm, addr, range,
+					  remap_pte_fn, &data);
+		/* Warning: We do probably need to care about what error we
+		   get here. However, currently, the remap_area_mfn_pte_fn is
+		   only likely to return EFAULT or some other "things are very
+		   bad" error code, which the rest of the calling code won't
+		   be able to fix up. So we just exit with the error we got.
+		*/
+		if (err)
+			return err;
+
+		nr--;
+		addr += range;
+	}
+	return 0;
+}
+
+/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number of entries in the MFN array
+ * @err_ptr: pointer to array of integers, one per MFN, for an error
+ *           value for each page. The err_ptr must not be NULL.
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ * @pages:   page information (for PVH, not used currently)
+ *
+ * This function takes an array of mfns and maps nr pages from that into this
+ * kernel's memory. The owner of the pages is defined by domid. Where the pages
+ * are mapped is determined by addr, and vma is used for "accounting" of the
+ * pages. The err_ptr array is filled in on any page that is not sucessfully
+ * mapped in.
+ *
+ * Return value is zero for success, negative ERRNO value for failure.
+ * Note that the error value -ENOENT is considered a "retry", so when this
+ * error code is seen, another call should be made with the list of pages that
+ * are marked as -ENOENT in the err_ptr array.
+ */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       xen_pfn_t *mfn, int nr,
+			       int *err_ptr, pgprot_t prot,
+			       unsigned domid,
+			       struct page **pages)
+{
+	/* We BUG_ON because it's a programmer error to pass a NULL err_ptr;
+	 * otherwise it is quite hard later to detect the actual cause of
+	 * "wrong memory was mapped in".
+	 * Note: This variant doesn't actually use err_ptr at the moment.
+	 */
+	BUG_ON(err_ptr == NULL);
+	return do_remap_mfn(vma, addr, mfn, nr, prot, domid, pages);
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
+/* Not used in ARM. Use xen_remap_domain_mfn_array(). */
+int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       xen_pfn_t mfn, int nr,
+			       pgprot_t prot, unsigned domid,
+			       struct page **pages)
+{
+	return -ENOSYS;
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
 
+
 int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
 			       int nr, struct page **pages)
 {

[-- Attachment #3: xen-privcmd-remap-array-x86.patch --]
[-- Type: text/x-patch, Size: 9470 bytes --]

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index dcf5f2d..a67774f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2477,7 +2477,8 @@ void __init xen_hvm_init_mmu_ops(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-	unsigned long mfn;
+	unsigned long *mfn;
+	bool contiguous;
 	pgprot_t prot;
 	struct mmu_update *mmu_update;
 };
@@ -2486,7 +2487,13 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
 				 unsigned long addr, void *data)
 {
 	struct remap_data *rmd = data;
-	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+	pte_t pte = pte_mkspecial(pfn_pte(*rmd->mfn, rmd->prot));
+	/* If we have a contiguous range, just update the mfn itself,
+	   else update pointer to be "next mfn". */
+	if (rmd->contiguous)
+		(*rmd->mfn)++;
+	else
+		rmd->mfn++;
 
 	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
 	rmd->mmu_update->val = pte_val_ma(pte);
@@ -2495,16 +2502,34 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
 	return 0;
 }
 
-int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
-			       unsigned long addr,
-			       unsigned long mfn, int nr,
-			       pgprot_t prot, unsigned domid)
-{
+/* do_remap_mfn() - helper function to map foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number of entries in the MFN array
+ * @err_ptr: pointer to array
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ *
+ * This function takes an array of mfns and maps nr pages from that into
+ * this kernel's memory. The owner of the pages is defined by domid. Where the
+ * pages are mapped is determined by addr, and vma is used for "accounting" of
+ * the pages.
+ *
+ * Return value is zero for success, negative for failure.
+ *
+ * Note that err_ptr is used to indicate whether *mfn
+ * is a list or a "first mfn of a contiguous range". */
+static int do_remap_mfn(struct vm_area_struct *vma,
+			unsigned long addr,
+			unsigned long *mfn, int nr,
+			int *err_ptr, pgprot_t prot,
+			unsigned domid)
+{
+	int err, last_err = 0;
 	struct remap_data rmd;
 	struct mmu_update mmu_update[REMAP_BATCH_SIZE];
-	int batch;
 	unsigned long range;
-	int err = 0;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return -EINVAL;
@@ -2515,9 +2540,15 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 
 	rmd.mfn = mfn;
 	rmd.prot = prot;
+	/* We use err_ptr to indicate whether we are doing a contiguous
+	 * mapping or a discontiguous mapping. */
+	rmd.contiguous = !err_ptr;
 
 	while (nr) {
-		batch = min(REMAP_BATCH_SIZE, nr);
+		int index = 0;
+		int done = 0;
+		int batch = min(REMAP_BATCH_SIZE, nr);
+		int batch_left = batch;
 		range = (unsigned long)batch << PAGE_SHIFT;
 
 		rmd.mmu_update = mmu_update;
@@ -2526,19 +2557,92 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 		if (err)
 			goto out;
 
-		err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
-		if (err < 0)
-			goto out;
+		/* We record the error for each page that gives an error, but
+		 * continue mapping until the whole set is done */
+		do {
+			err = HYPERVISOR_mmu_update(&mmu_update[index],
+						    batch_left, &done, domid);
+			if (err < 0) {
+				if (!err_ptr)
+					goto out;
+				/* increment done so we skip the error item */
+				done++;
+				last_err = err_ptr[index] = err;
+			}
+			batch_left -= done;
+			index += done;
+		} while (batch_left);
 
 		nr -= batch;
 		addr += range;
+		if (err_ptr)
+			err_ptr += batch;
 	}
 
-	err = 0;
+	err = last_err;
 out:
 
 	xen_flush_tlb_all();
 
 	return err;
 }
+
+/* xen_remap_domain_mfn_range() - Used to map foreign pages
+ * @vma:   the vma for the pages to be mapped into
+ * @addr:  the address at which to map the pages
+ * @mfn:   the first MFN to map
+ * @nr:    the number of consecutive mfns to map
+ * @prot:  page protection mask
+ * @domid: id of the domain that we are mapping from
+ *
+ * This function takes a mfn and maps nr pages on from that into this kernel's
+ * memory. The owner of the pages is defined by domid. Where the pages are
+ * mapped is determined by addr, and vma is used for "accounting" of the
+ * pages.
+ *
+ * Return value is zero for success, negative ERRNO value for failure.
+ */
+int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       unsigned long mfn, int nr,
+			       pgprot_t prot, unsigned domid)
+{
+	return do_remap_mfn(vma, addr, &mfn, nr, NULL, prot, domid);
+}
 EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
+
+/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number of entries in the MFN array
+ * @err_ptr: pointer to array of integers, one per MFN, for an error
+ *           value for each page. The err_ptr must not be NULL.
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ *
+ * This function takes an array of mfns and maps nr pages from that into this
+ * kernel's memory. The owner of the pages is defined by domid. Where the pages
+ * are mapped is determined by addr, and vma is used for "accounting" of the
+ * pages. The err_ptr array is filled in on any page that is not successfully
+ * mapped in.
+ *
+ * Return value is zero for success, negative ERRNO value for failure.
+ * Note that the error value -ENOENT is considered a "retry", so when this
+ * error code is seen, another call should be made with the list of pages that
+ * are marked as -ENOENT in the err_ptr array.
+ */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       unsigned long *mfn, int nr,
+			       int *err_ptr, pgprot_t prot,
+			       unsigned domid)
+{
+	/* We BUG_ON because it's a programmer error to pass a NULL err_ptr;
+	 * otherwise it is quite hard later to detect the actual cause of
+	 * "wrong memory was mapped in".
+	 */
+	BUG_ON(err_ptr == NULL);
+	return do_remap_mfn(vma, addr, mfn, nr, err_ptr, prot, domid);
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 71f5c45..75f6e86 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -151,6 +151,40 @@ static int traverse_pages(unsigned nelem, size_t size,
 	return ret;
 }
 
+/*
+ * Similar to traverse_pages, but use each page as a "block" of
+ * data to be processed as one unit.
+ */
+static int traverse_pages_block(unsigned nelem, size_t size,
+				struct list_head *pos,
+				int (*fn)(void *data, int nr, void *state),
+				void *state)
+{
+	void *pagedata;
+	unsigned pageidx;
+	int ret = 0;
+
+	BUG_ON(size > PAGE_SIZE);
+
+	pageidx = PAGE_SIZE;
+
+	while (nelem) {
+		int nr = (PAGE_SIZE/size);
+		struct page *page;
+		if (nr > nelem)
+			nr = nelem;
+		pos = pos->next;
+		page = list_entry(pos, struct page, lru);
+		pagedata = page_address(page);
+		ret = (*fn)(pagedata, nr, state);
+		if (ret)
+			break;
+		nelem -= nr;
+	}
+
+	return ret;
+}
+
 struct mmap_mfn_state {
 	unsigned long va;
 	struct vm_area_struct *vma;
@@ -250,7 +284,7 @@ struct mmap_batch_state {
 	 *      0 for no errors
 	 *      1 if at least one error has happened (and no
 	 *          -ENOENT errors have happened)
-	 *      -ENOENT if at least 1 -ENOENT has happened.
+	 *      -ENOENT if at least one -ENOENT has happened.
 	 */
 	int global_error;
 	/* An array for individual errors */
@@ -260,17 +294,20 @@ struct mmap_batch_state {
 	xen_pfn_t __user *user_mfn;
 };
 
-static int mmap_batch_fn(void *data, void *state)
+static int mmap_batch_fn(void *data, int nr, void *state)
 {
 	xen_pfn_t *mfnp = data;
 	struct mmap_batch_state *st = state;
 	int ret;
 
-	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
-					 st->vma->vm_page_prot, st->domain);
+	BUG_ON(nr < 0);
 
-	/* Store error code for second pass. */
-	*(st->err++) = ret;
+	ret = xen_remap_domain_mfn_array(st->vma,
+					 st->va & PAGE_MASK,
+					 mfnp, nr,
+					 st->err,
+					 st->vma->vm_page_prot,
+					 st->domain);
 
 	/* And see if it affects the global_error. */
 	if (ret < 0) {
@@ -282,7 +319,7 @@ static int mmap_batch_fn(void *data, void *state)
 				st->global_error = 1;
 		}
 	}
-	st->va += PAGE_SIZE;
+	st->va += PAGE_SIZE * nr;
 
 	return 0;
 }
@@ -378,8 +415,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
 	state.err           = err_array;
 
 	/* mmap_batch_fn guarantees ret == 0 */
-	BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
-			     &pagelist, mmap_batch_fn, &state));
+	BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
+				    &pagelist, mmap_batch_fn, &state));
 
 	up_write(&mm->mmap_sem);
 
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 6a198e4..22cad75 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -29,4 +29,9 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 			       unsigned long mfn, int nr,
 			       pgprot_t prot, unsigned domid);
 
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       unsigned long *mfn, int nr,
+			       int *err_ptr, pgprot_t prot,
+			       unsigned domid);
 #endif /* INCLUDE_XEN_OPS_H */

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-18 19:34               ` Mats Petersson
@ 2012-12-19 10:59                 ` Ian Campbell
  2012-12-19 12:10                   ` Mats Petersson
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-12-19 10:59 UTC (permalink / raw)
  To: Mats Petersson; +Cc: xen-devel, Konrad Rzeszutek Wilk

On Tue, 2012-12-18 at 19:34 +0000, Mats Petersson wrote:

> >> I think I'll do the minimal patch first, then, if I find some spare
> >> time, work on the "batching" variant.
> > OK. The batching is IMHO just using the multicall variant.

The XENMEM_add_to_physmap_range hypercall is itself batched (it contains
ranges of mfns etc), so we don't just want to batch the actual hypercalls;
we really want to make sure each hypercall processes a batch, as this will
let us optimise the flushes in the hypervisor.

I don't know if the multicall infrastructure allows for that, but it
would be pretty trivial to do it explicitly.

I expect these patches will need to be folded into one to avoid a
bisectability hazard?

xen-privcmd-remap-array-arm.patch:
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 7a32976..dc50a53 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -73,7 +73,7 @@ static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
>  }
>  
>  struct remap_data {
> -       xen_pfn_t fgmfn; /* foreign domain's gmfn */
> +       xen_pfn_t *fgmfn; /* foreign domain's gmfn */
>         pgprot_t prot;
>         domid_t  domid;
>         struct vm_area_struct *vma;
> @@ -90,38 +90,120 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>         unsigned long pfn = page_to_pfn(page);
>         pte_t pte = pfn_pte(pfn, info->prot);
>  
> -       if (map_foreign_page(pfn, info->fgmfn, info->domid))
> +       // TODO: We should really batch these updates
> +       if (map_foreign_page(pfn, *info->fgmfn, info->domid))
>                 return -EFAULT;
>         set_pte_at(info->vma->vm_mm, addr, ptep, pte);
> +       info->fgmfn++;

Looks good.
 
>         return 0;
>  }
>  
> -int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> -                              unsigned long addr,
> -                              xen_pfn_t mfn, int nr,
> -                              pgprot_t prot, unsigned domid,
> -                              struct page **pages)
> +/* do_remap_mfn() - helper function to map foreign pages
> + * @vma:     the vma for the pages to be mapped into
> + * @addr:    the address at which to map the pages
> + * @mfn:     pointer to array of MFNs to map
> + * @nr:      the number of entries in the MFN array
> + * @prot:    page protection mask
> + * @domid:   id of the domain that we are mapping from
> + * @pages:   page information (for PVH, not used currently)
> + *
> + * This function takes an array of mfns and maps nr pages from that into
> + * this kernel's memory. The owner of the pages is defined by domid. Where the
> + * pages are mapped is determined by addr, and vma is used for "accounting" of
> + * the pages.
> + *
> + * Return value is zero for success, negative for failure.
> + */
> +static int do_remap_mfn(struct vm_area_struct *vma,
> +                       unsigned long addr,
> +                       xen_pfn_t *mfn, int nr,
> +                       pgprot_t prot, unsigned domid,
> +                       struct page **pages)

Since xen_remap_domain_mfn_range isn't implemented on ARM the only
caller is xen_remap_domain_mfn_array so you might as well just call this
function ..._array.

> 
>  {
>         int err;
>         struct remap_data data;
>  
> -       /* TBD: Batching, current sole caller only does page at a time */
> -       if (nr > 1)
> -               return -EINVAL;
> +       BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));

Where does this restriction come from?

I think it is true of X86 PV MMU (which has certain requirements about
how and when PTEs are changed) but I don't think ARM or PVH need it
because they use two level paging so the PTEs are just normal native
ones with no extra restrictions.

Maybe it is useful to enforce similarity between PV and PVH/ARM though?


>         data.fgmfn = mfn;
> -       data.prot = prot;
> +       data.prot  = prot;
>         data.domid = domid;
> -       data.vma = vma;
> -       data.index = 0;
> +       data.vma   = vma;
> 
> 
>         data.pages = pages;
> -       err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
> -                                 remap_pte_fn, &data);
> -       return err;
> +       data.index = 0;
> +
> +       while (nr) {
> +               unsigned long range = 1 << PAGE_SHIFT;
> +
> +               err = apply_to_page_range(vma->vm_mm, addr, range,
> +                                         remap_pte_fn, &data);
> +               /* Warning: We do probably need to care about what error we
> +                  get here. However, currently, the remap_area_mfn_pte_fn is

                                                       ^ this isn't the name of the fn
> 
> +                  only likely to return EFAULT or some other "things are very
> +                  bad" error code, which the rest of the calling code won't
> +                  be able to fix up. So we just exit with the error we got.

I expect it is more important to accumulate the individual errors from
remap_pte_fn into err_ptr.

> +               */
> +               if (err)
> +                       return err;
> +
> +               nr--;
> +               addr += range;
> +       }
> 

This while loop (and therefore this change) is unnecessary. The single
call to apply_to_page_range is sufficient and as your TODO notes any
batching should happen in remap_pte_fn (which can handle both
accumulation and flushing when the batch is large enough).

> +       return 0;
> +}
> +
> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
> + * @vma:     the vma for the pages to be mapped into
> + * @addr:    the address at which to map the pages

physical address, right?

> + * @mfn:     pointer to array of MFNs to map
> + * @nr:      the number entries in the MFN array
> + * @err_ptr: pointer to array of integers, one per MFN, for an error
> + *           value for each page. The err_ptr must not be NULL.

Nothing seems to be filling this in?

> + * @prot:    page protection mask
> + * @domid:   id of the domain that we are mapping from
> + * @pages:   page information (for PVH, not used currently)

No such thing as PVH on ARM. Also pages is used by this code.

> 
> + *
> + * This function takes an array of mfns and maps nr pages from that into this
> + * kernel's memory. The owner of the pages is defined by domid. Where the pages
> + * are mapped is determined by addr, and vma is used for "accounting" of the
> + * pages. The err_ptr array is filled in on any page that is not sucessfully
> 
>                                                                   successfully
> 
> + * mapped in.
> + *
> + * Return value is zero for success, negative ERRNO value for failure.
> + * Note that the error value -ENOENT is considered a "retry", so when this
> + * error code is seen, another call should be made with the list of pages that
> + * are marked as -ENOENT in the err_ptr array.
> + */
> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
> +                              unsigned long addr,
> +                              xen_pfn_t *mfn, int nr,
> +                              int *err_ptr, pgprot_t prot,
> +                              unsigned domid,
> +                              struct page **pages)
> +{
> +       /* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
> +        * and it is otherwise quite hard to detect later that the actual
> +        * cause of a failure was "wrong memory was mapped in".
> +        * Note: This variant doesn't actually use err_ptr at the moment.

True ;-)

> +        */
> +       BUG_ON(err_ptr == NULL);
> +       return do_remap_mfn(vma, addr, mfn, nr, prot, domid, pages);
> +}
> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
> +
> +/* Not used in ARM. Use xen_remap_domain_mfn_array(). */
> +int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> +                              unsigned long addr,
> +                              xen_pfn_t mfn, int nr,
> +                              pgprot_t prot, unsigned domid,
> +                              struct page **pages)
> +{
> +       return -ENOSYS;
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>  
> +
>  int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>                                int nr, struct page **pages)
>  {
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 10:59                 ` Ian Campbell
@ 2012-12-19 12:10                   ` Mats Petersson
  2012-12-19 12:22                     ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Mats Petersson @ 2012-12-19 12:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 11393 bytes --]

New patch attached.

I haven't done the relevant spelling fixes etc, as I had a little mishap 
with git, and need to fix up a few things. I thought you might want to 
have a look over the ARM side meanwhile, and if this is OK, I will post an 
"official" V5 patch to the list.

Further comments below.

--
Mats
On 19/12/12 10:59, Ian Campbell wrote:
> On Tue, 2012-12-18 at 19:34 +0000, Mats Petersson wrote:
>
>>>> I think I'll do the minimal patch first, then, if I find some spare
>>>> time, work on the "batching" variant.
>>> OK. The batching is IMHO just using the multicall variant.
> The XENMEM_add_to_physmap_range is itself batched (it contains ranges of
> mfns etc), so we don't just want to batch the actual hypercalls we
> really want to make sure each hypercall processes a batch; this will
> let us optimise the flushes in the hypervisor.
>
> I don't know if the multicall infrastructure allows for that, but it
> would be pretty trivial to do it explicitly.
Yes, I'm sure it is. I would prefer to do that AFTER my x86 patch has 
gone in, if that's possible. (Or that someone who actually understands 
the ARM architecture and can test it on actual ARM hardware does it...)
>
> I expect these patches will need to be folded into one to avoid a
> bisectability hazard?
Yes, that is certainly my plan. I just made two patches for ease of 
"what is ARM and what isn't" - but final submit should be as one patch.
>
> xen-privcmd-remap-array-arm.patch:
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index 7a32976..dc50a53 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -73,7 +73,7 @@ static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
>>   }
>>   
>>   struct remap_data {
>> -       xen_pfn_t fgmfn; /* foreign domain's gmfn */
>> +       xen_pfn_t *fgmfn; /* foreign domain's gmfn */
>>          pgprot_t prot;
>>          domid_t  domid;
>>          struct vm_area_struct *vma;
>> @@ -90,38 +90,120 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>>          unsigned long pfn = page_to_pfn(page);
>>          pte_t pte = pfn_pte(pfn, info->prot);
>>   
>> -       if (map_foreign_page(pfn, info->fgmfn, info->domid))
>> +       // TODO: We should really batch these updates
>> +       if (map_foreign_page(pfn, *info->fgmfn, info->domid))
>>                  return -EFAULT;
>>          set_pte_at(info->vma->vm_mm, addr, ptep, pte);
>> +       info->fgmfn++;
> Looks good.
>   
>>          return 0;
>>   }
>>   
>> -int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>> -                              unsigned long addr,
>> -                              xen_pfn_t mfn, int nr,
>> -                              pgprot_t prot, unsigned domid,
>> -                              struct page **pages)
>> +/* do_remap_mfn() - helper function to map foreign pages
>> + * @vma:     the vma for the pages to be mapped into
>> + * @addr:    the address at which to map the pages
>> + * @mfn:     pointer to array of MFNs to map
>> + * @nr:      the number entries in the MFN array
>> + * @prot:    page protection mask
>> + * @domid:   id of the domain that we are mapping from
>> + * @pages:   page information (for PVH, not used currently)
>> + *
>> + * This function takes an array of mfns and maps nr pages from that into
>> + * this kernel's memory. The owner of the pages is defined by domid. Where the
>> + * pages are mapped is determined by addr, and vma is used for "accounting" of
>> + * the pages.
>> + *
>> + * Return value is zero for success, negative for failure.
>> + */
>> +static int do_remap_mfn(struct vm_area_struct *vma,
>> +                       unsigned long addr,
>> +                       xen_pfn_t *mfn, int nr,
>> +                       pgprot_t prot, unsigned domid,
>> +                       struct page **pages)
> Since xen_remap_domain_mfn_range isn't implemented on ARM the only
> caller is xen_remap_domain_mfn_array so you might as well just call this
> function ..._array.
Yes, could do. As I stated in the commentary text, I tried to keep the 
code similar in structure to x86.
[Actually, one iteration of my code had two API functions, one of which 
called the other, but it was considered a better solution to make one 
common function, and have the two x86 functions call that one].
>>   {
>>          int err;
>>          struct remap_data data;
>>   
>> -       /* TBD: Batching, current sole caller only does page at a time */
>> -       if (nr > 1)
>> -               return -EINVAL;
>> +       BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
> Where does this restriction come from?
>
> I think it is true of X86 PV MMU (which has certain requirements about
> how and when PTEs are changed) but I don't think ARM or PVH need it
> because they use two level paging so the PTEs are just normal native
> ones with no extra restrictions.
>
> Maybe it is useful to enforce similarity between PV and PVH/ARM though?
I don't know if it's useful or not. I'm sure removing it will make 
little difference, but since the VM flags are set by the calling code, 
the privcmd.c or higher level code would have to be correct for whatever 
architecture they are on. Not sure if it is "helpful" to allow certain 
combinations in one arch, when it's not in another. My choice would be 
to keep the restriction until there is a good reason to remove it, but 
if you feel it is beneficial to remove it, feel free to say so.
[Perhaps the ARM code should have a comment to the effect of "not needed 
on PVH or ARM, kept for compatibility with PVOPS on x86" - so when PVOPS 
is "retired" in some years' time, it can be removed]
>
>
>>          data.fgmfn = mfn;
>> -       data.prot = prot;
>> +       data.prot  = prot;
>>          data.domid = domid;
>> -       data.vma = vma;
>> -       data.index = 0;
>> +       data.vma   = vma;
>>
>>
>>          data.pages = pages;
>> -       err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
>> -                                 remap_pte_fn, &data);
>> -       return err;
>> +       data.index = 0;
>> +
>> +       while (nr) {
>> +               unsigned long range = 1 << PAGE_SHIFT;
>> +
>> +               err = apply_to_page_range(vma->vm_mm, addr, range,
>> +                                         remap_pte_fn, &data);
>> +               /* Warning: We do probably need to care about what error we
>> +                  get here. However, currently, the remap_area_mfn_pte_fn is
>                                                         ^ this isn't the name of the fn
Fixed.
>> +                  only likely to return EFAULT or some other "things are very
>> +                  bad" error code, which the rest of the calling code won't
>> +                  be able to fix up. So we just exit with the error we got.
> I expect it is more important to accumulate the individual errors from
> remap_pte_fn into err_ptr.
Yes, but since that exits on error with EFAULT, the calling code won't 
"accept" the errors, and thus the whole house of cards falls apart at 
this point.

There should probably be a task to fix this up properly, hence the 
comment. But right now, any error besides ENOENT is "unacceptable" by 
the callers of this code. If you want me to add this to the comment, I'm 
happy to. But as long as remap_pte_fn returns EFAULT on error, nothing 
will work after an error.
>
>> +               */
>> +               if (err)
>> +                       return err;
>> +
>> +               nr--;
>> +               addr += range;
>> +       }
>>
> This while loop (and therefore this change) is unnecessary. The single
> call to apply_to_page_range is sufficient and as your TODO notes any
> batching should happen in remap_pte_fn (which can handle both
> accumulation and flushing when the batch is large enough).
Ah, I see how you mean. I have updated the code accordingly.
>
>> +       return 0;
>> +}
>> +
>> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
>> + * @vma:     the vma for the pages to be mapped into
>> + * @addr:    the address at which to map the pages
> physical address, right?
Virtual, at least if we assume that in " st->va & PAGE_MASK,"  'va' 
actually means virtual address - it would be rather devious to keep using 
the name va for a physical address - although I have seen such things in 
the past.

I shall amend the comments to say "virtual address", to be clearer. [Not 
done in the attached patch; I just realized this when re-reading my 
comments that I probably should...]
>
>> + * @mfn:     pointer to array of MFNs to map
>> + * @nr:      the number entries in the MFN array
>> + * @err_ptr: pointer to array of integers, one per MFN, for an error
>> + *           value for each page. The err_ptr must not be NULL.
> Nothing seems to be filling this in?
As discussed above (and below).
>
>> + * @prot:    page protection mask
>> + * @domid:   id of the domain that we are mapping from
>> + * @pages:   page information (for PVH, not used currently)
> No such thing as PVH on ARM. Also pages is used by this code.
Removed part in ().
>
>> + *
>> + * This function takes an array of mfns and maps nr pages from that into this
>> + * kernel's memory. The owner of the pages is defined by domid. Where the pages
>> + * are mapped is determined by addr, and vma is used for "accounting" of the
>> + * pages. The err_ptr array is filled in on any page that is not sucessfully
>>
>>                                                                    successfully
Thanks...
>>
>> + * mapped in.
>> + *
>> + * Return value is zero for success, negative ERRNO value for failure.
>> + * Note that the error value -ENOENT is considered a "retry", so when this
>> + * error code is seen, another call should be made with the list of pages that
>> + * are marked as -ENOENT in the err_ptr array.
>> + */
>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
>> +                              unsigned long addr,
>> +                              xen_pfn_t *mfn, int nr,
>> +                              int *err_ptr, pgprot_t prot,
>> +                              unsigned domid,
>> +                              struct page **pages)
>> +{
>> +       /* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
>> +        * and it is otherwise quite hard to detect later that the actual
>> +        * cause of a failure was "wrong memory was mapped in".
>> +        * Note: This variant doesn't actually use err_ptr at the moment.
> True ;-)
Do you prefer the "not used" comment moved up a bit?

--
Mats
>
>> +        */
>> +       BUG_ON(err_ptr == NULL);
>> +       return do_remap_mfn(vma, addr, mfn, nr, prot, domid, pages);
>> +}
>> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
>> +
>> +/* Not used in ARM. Use xen_remap_domain_mfn_array(). */
>> +int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>> +                              unsigned long addr,
>> +                              xen_pfn_t mfn, int nr,
>> +                              pgprot_t prot, unsigned domid,
>> +                              struct page **pages)
>> +{
>> +       return -ENOSYS;
>>   }
>>   EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>>   
>> +
>>   int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>>                                 int nr, struct page **pages)
>>   {
>>


[-- Attachment #2: 0001-Fixed-up-after-IanC-s-comments.patch --]
[-- Type: text/x-patch, Size: 5479 bytes --]

>From 7ec4da2e1a40963d459dec6c61e810e5badd390a Mon Sep 17 00:00:00 2001
From: Mats Petersson <mats.petersson@citrix.com>
Date: Wed, 19 Dec 2012 11:58:23 +0000
Subject: [PATCH] Fixed up after IanC's comments.

---
 arch/arm/xen/enlighten.c |  104 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 90 insertions(+), 14 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 7a32976..2bf8556 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -73,7 +73,7 @@ static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
 }
 
 struct remap_data {
-	xen_pfn_t fgmfn; /* foreign domain's gmfn */
+	xen_pfn_t *fgmfn; /* foreign domain's gmfn */
 	pgprot_t prot;
 	domid_t  domid;
 	struct vm_area_struct *vma;
@@ -90,38 +90,114 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
 	unsigned long pfn = page_to_pfn(page);
 	pte_t pte = pfn_pte(pfn, info->prot);
 
-	if (map_foreign_page(pfn, info->fgmfn, info->domid))
+	// TODO: We should really batch these updates
+	if (map_foreign_page(pfn, *info->fgmfn, info->domid))
 		return -EFAULT;
 	set_pte_at(info->vma->vm_mm, addr, ptep, pte);
+	info->fgmfn++;
 
 	return 0;
 }
 
-int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
-			       unsigned long addr,
-			       xen_pfn_t mfn, int nr,
-			       pgprot_t prot, unsigned domid,
-			       struct page **pages)
+/* do_remap_mfn() - helper function to map foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number entries in the MFN array
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ * @pages:   page information.
+ *
+ * This function takes an array of mfns and maps nr pages from that into
+ * this kernel's memory. The owner of the pages is defined by domid. Where the
+ * pages are mapped is determined by addr, and vma is used for "accounting" of
+ * the pages.
+ *
+ * Return value is zero for success, negative for failure.
+ */
+static int do_remap_mfn(struct vm_area_struct *vma,
+			unsigned long addr,
+			xen_pfn_t *mfn, int nr,
+			pgprot_t prot, unsigned domid,
+			struct page **pages)
 {
 	int err;
 	struct remap_data data;
 
-	/* TBD: Batching, current sole caller only does page at a time */
-	if (nr > 1)
-		return -EINVAL;
+	/* Not needed on ARM; kept for consistency with the x86 PV code. */
+	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
 	data.fgmfn = mfn;
-	data.prot = prot;
+	data.prot  = prot;
 	data.domid = domid;
-	data.vma = vma;
-	data.index = 0;
+	data.vma   = vma;
 	data.pages = pages;
-	err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
+	data.index = 0;
+
+	unsigned long range = nr << PAGE_SHIFT;
+
+	err = apply_to_page_range(vma->vm_mm, addr, range,
 				  remap_pte_fn, &data);
+	/* Warning: We probably do need to care about what error we
+	   get here. However, currently, remap_pte_fn is only
+	   likely to return EFAULT or some other "things are very
+	   bad" error code, which the rest of the calling code won't
+	   be able to fix up. So we just exit with the error we got.
+	*/
 	return err;
 }
+
+/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
+ * @vma:     the vma for the pages to be mapped into
+ * @addr:    the address at which to map the pages
+ * @mfn:     pointer to array of MFNs to map
+ * @nr:      the number entries in the MFN array
+ * @err_ptr: pointer to array of integers, one per MFN, for an error
+ *           value for each page. The err_ptr must not be NULL.
+ * @prot:    page protection mask
+ * @domid:   id of the domain that we are mapping from
+ * @pages:   page information
+ *
+ * This function takes an array of mfns and maps nr pages from that into this
+ * kernel's memory. The owner of the pages is defined by domid. Where the pages
+ * are mapped is determined by addr, and vma is used for "accounting" of the
+ * pages. The err_ptr array is filled in on any page that is not successfully
+ * mapped in.
+ *
+ * Return value is zero for success, negative ERRNO value for failure.
+ * Note that the error value -ENOENT is considered a "retry", so when this
+ * error code is seen, another call should be made with the list of pages that
+ * are marked as -ENOENT in the err_ptr array.
+ */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       xen_pfn_t *mfn, int nr,
+			       int *err_ptr, pgprot_t prot,
+			       unsigned domid,
+			       struct page **pages)
+{
+	/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
+	 * and it is otherwise quite hard to detect later that the actual
+	 * cause of a failure was "wrong memory was mapped in".
+	 * Note: This variant doesn't actually use err_ptr at the moment.
+	 */
+	BUG_ON(err_ptr == NULL);
+	return do_remap_mfn(vma, addr, mfn, nr, prot, domid, pages);
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
+/* Not used in ARM. Use xen_remap_domain_mfn_array(). */
+int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       xen_pfn_t mfn, int nr,
+			       pgprot_t prot, unsigned domid,
+			       struct page **pages)
+{
+	return -ENOSYS;
+}
 EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
 
+
 int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
 			       int nr, struct page **pages)
 {
-- 
1.7.9.5


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 12:10                   ` Mats Petersson
@ 2012-12-19 12:22                     ` Ian Campbell
  2012-12-19 15:08                       ` Mats Petersson
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-12-19 12:22 UTC (permalink / raw)
  To: Mats Petersson; +Cc: xen-devel, Konrad Rzeszutek Wilk

On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:

> >> +                  only likely to return EFAULT or some other "things are very
> >> +                  bad" error code, which the rest of the calling code won't
> >> +                  be able to fix up. So we just exit with the error we got.
> > I expect it is more important to accumulate the individual errors from
> > remap_pte_fn into err_ptr.
> Yes, but since that exits on error with EFAULT, the calling code won't 
> "accept" the errors, and thus the whole house of cards fall apart at 
> this point.
> 
> There should probably be a task to fix this up properly, hence the 
> comment. But right now, any error besides ENOENT is "unacceptable" by 
> the callers of this code. If you want me to add this to the comment, I'm 
> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing 
> will work after an error.

Are you sure? privcmd.c has some special casing for ENOENT but it looks
like it should just pass through other errors back to userspace.

In any case surely this needs fixing?

On the X86 side err_ptr is the result of the mmupdate hypercall which
can already be other than ENOENT, including EFAULT, ESRCH etc.

Ian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 12:22                     ` Ian Campbell
@ 2012-12-19 15:08                       ` Mats Petersson
  2012-12-19 15:45                         ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Mats Petersson @ 2012-12-19 15:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 19/12/12 12:22, Ian Campbell wrote:
> On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
>
>>>> +                  only likely to return EFAULT or some other "things are very
>>>> +                  bad" error code, which the rest of the calling code won't
>>>> +                  be able to fix up. So we just exit with the error we got.
>>> I expect it is more important to accumulate the individual errors from
>>> remap_pte_fn into err_ptr.
>> Yes, but since that exits on error with EFAULT, the calling code won't
>> "accept" the errors, and thus the whole house of cards falls apart at
>> this point.
>>
>> There should probably be a task to fix this up properly, hence the
>> comment. But right now, any error besides ENOENT is "unacceptable" by
>> the callers of this code. If you want me to add this to the comment, I'm
>> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing
>> will work after an error.
> Are you sure? privcmd.c has some special casing for ENOENT but it looks
> like it should just pass through other errors back to userspace.
>
> In any case surely this needs fixing?
>
> On the X86 side err_ptr is the result of the mmupdate hypercall which
> can already be other than ENOENT, including EFAULT, ESRCH etc.
Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the 
calling code revoking the mapping and returning an error) is ENOENT.

Or at least, that's how I believe it SHOULD be - since only 
ENOENT is a "success" error code, anything else pretty much means that 
the operation requested didn't work properly. If you are aware of any 
use-case where EFAULT, ESRCH or other error codes would still result in 
a valid, usable memory mapping, please point it out. I have a fair 
understanding of the xc_* code, and although I may not know every piece 
of that code, I'm fairly certain that is the expected behaviour.

--
Mats
>
> Ian.
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 15:08                       ` Mats Petersson
@ 2012-12-19 15:45                         ` Ian Campbell
  2012-12-19 15:47                           ` Mats Petersson
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-12-19 15:45 UTC (permalink / raw)
  To: Mats Petersson; +Cc: xen-devel, Konrad Rzeszutek Wilk

On Wed, 2012-12-19 at 15:08 +0000, Mats Petersson wrote:
> On 19/12/12 12:22, Ian Campbell wrote:
> > On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
> >
> >>>> +                  only likely to return EFAULT or some other "things are very
> >>>> +                  bad" error code, which the rest of the calling code won't
> >>>> +                  be able to fix up. So we just exit with the error we got.
> >>> I expect it is more important to accumulate the individual errors from
> >>> remap_pte_fn into err_ptr.
> >> Yes, but since that exits on error with EFAULT, the calling code won't
> >> "accept" the errors, and thus the whole house of cards falls apart at
> >> this point.
> >>
> >> There should probably be a task to fix this up properly, hence the
> >> comment. But right now, any error besides ENOENT is "unacceptable" by
> >> the callers of this code. If you want me to add this to the comment, I'm
> >> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing
> >> will work after an error.
> > Are you sure? privcmd.c has some special casing for ENOENT but it looks
> > like it should just pass through other errors back to userspace.
> >
> > In any case surely this needs fixing?
> >
> > On the X86 side err_ptr is the result of the mmupdate hypercall which
> > can already be other than ENOENT, including EFAULT, ESRCH etc.
> Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the 
> calling code revoking the mapping and returning an error) is ENOENT.

Hm, probably the right thing is for map_foreign_page to propagate the
result of XENMEM_add_to_physmap_range and for remap_pte_fn to store it
in err_ptr. That -EFAULT thing just looks wrong to me.

> 
> Or at least, that's how I believe it SHOULD be - since only 
> ENOENT is a "success" error code, anything else pretty much means that 
> the operation requested didn't work properly. If you are aware of any 
> use-case where EFAULT, ESRCH or other error codes would still result in 
> a valid, usable memory mapping - I have a fair understanding of the xc_* 
> code, and although I may not know every piece of that code, I'm fairly 
> certain that is the expected behaviour.
> 
> --
> Mats
> >
> > Ian.
> >
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 15:45                         ` Ian Campbell
@ 2012-12-19 15:47                           ` Mats Petersson
  2012-12-19 15:51                             ` Ian Campbell
  0 siblings, 1 reply; 10+ messages in thread
From: Mats Petersson @ 2012-12-19 15:47 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 19/12/12 15:45, Ian Campbell wrote:
> On Wed, 2012-12-19 at 15:08 +0000, Mats Petersson wrote:
>> On 19/12/12 12:22, Ian Campbell wrote:
>>> On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
>>>
>>>>>> +                  only likely to return EFAULT or some other "things are very
>>>>>> +                  bad" error code, which the rest of the calling code won't
>>>>>> +                  be able to fix up. So we just exit with the error we got.
>>>>> I expect it is more important to accumulate the individual errors from
>>>>> remap_pte_fn into err_ptr.
>>>> Yes, but since that exits on error with EFAULT, the calling code won't
>>>> "accept" the errors, and thus the whole house of cards falls apart at
>>>> this point.
>>>>
>>>> There should probably be a task to fix this up properly, hence the
>>>> comment. But right now, any error besides ENOENT is "unacceptable" by
>>>> the callers of this code. If you want me to add this to the comment, I'm
>>>> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing
>>>> will work after an error.
>>> Are you sure? privcmd.c has some special casing for ENOENT but it looks
>>> like it should just pass through other errors back to userspace.
>>>
>>> In any case surely this needs fixing?
>>>
>>> On the X86 side err_ptr is the result of the mmupdate hypercall which
>>> can already be other than ENOENT, including EFAULT, ESRCH etc.
>> Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the
>> calling code revoking the mapping and returning an error) is ENOENT.
> Hm, probably the right thing is for map_foreign_page to propagate the
> result of XENMEM_add_to_physmap_range and for remap_pte_fn to store it
> in err_ptr. That -EFAULT thing just looks wrong to me.
Ok, so you want me to fix that up, I suppose? I mean, I just copied the 
behaviour that was already there - just massaged the code around a bit...

--
Mats
>
>> Or at least, that's how I believe it SHOULD be - since only
>> ENOENT is a "success" error code, anything else pretty much means that
>> the operation requested didn't work properly. If you are aware of any
>> use-case where EFAULT, ESRCH or other error codes would still result in
>> a valid, usable memory mapping - I have a fair understanding of the xc_*
>> code, and although I may not know every piece of that code, I'm fairly
>> certain that is the expected behaviour.
>>
>> --
>> Mats
>>> Ian.
>>>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 15:47                           ` Mats Petersson
@ 2012-12-19 15:51                             ` Ian Campbell
  2012-12-19 15:59                               ` Mats Petersson
  0 siblings, 1 reply; 10+ messages in thread
From: Ian Campbell @ 2012-12-19 15:51 UTC (permalink / raw)
  To: Mats Petersson; +Cc: xen-devel, Konrad Rzeszutek Wilk

On Wed, 2012-12-19 at 15:47 +0000, Mats Petersson wrote:
> On 19/12/12 15:45, Ian Campbell wrote:
> > On Wed, 2012-12-19 at 15:08 +0000, Mats Petersson wrote:
> >> On 19/12/12 12:22, Ian Campbell wrote:
> >>> On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
> >>>
> >>>>>> +                  only likely to return EFAULT or some other "things are very
> >>>>>> +                  bad" error code, which the rest of the calling code won't
> >>>>>> +                  be able to fix up. So we just exit with the error we got.
> >>>>> I expect it is more important to accumulate the individual errors from
> >>>>> remap_pte_fn into err_ptr.
> >>>> Yes, but since that exits on error with EFAULT, the calling code won't
> >>>> "accept" the errors, and thus the whole house of cards falls apart at
> >>>> this point.
> >>>>
> >>>> There should probably be a task to fix this up properly, hence the
> >>>> comment. But right now, any error besides ENOENT is "unacceptable" by
> >>>> the callers of this code. If you want me to add this to the comment, I'm
> >>>> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing
> >>>> will work after an error.
> >>> Are you sure? privcmd.c has some special casing for ENOENT but it looks
> >>> like it should just pass through other errors back to userspace.
> >>>
> >>> In any case surely this needs fixing?
> >>>
> >>> On the X86 side err_ptr is the result of the mmupdate hypercall which
> >>> can already be other than ENOENT, including EFAULT, ESRCH etc.
> >> Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the
> >> calling code revoking the mapping and returning an error) is ENOENT.
> > Hm, probably the right thing is for map_foreign_page to propagate the
> > result of XENMEM_add_to_physmap_range and for remap_pte_fn to store it
> > in err_ptr. That -EFAULT thing just looks wrong to me.
> Ok, so you want me to fix that up, I suppose? I mean, I just copied the 
> behaviour that was already there - just massaged the code around a bit...

Yes please, it didn't really matter before but I think it matters after
your changes.

> 
> --
> Mats
> >
> >> Or at least, that's how I believe it SHOULD be - since only
> >> ENOENT is a "success" error code, anything else pretty much means that
> >> the operation requested didn't work properly. If you are aware of any
> >> use-case where EFAULT, ESRCH or other error codes would still result in
> >> a valid, usable memory mapping, I'd like to hear about it - I have a
> >> fair understanding of the xc_* code, and although I may not know every
> >> piece of that code, I'm fairly certain that this is the expected
> >> behaviour.
> >>
> >> --
> >> Mats
> >>> Ian.
> >>>
> >
> 
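For what it's worth, the scheme agreed on above can be sketched roughly as follows. This is an illustrative stand-in, not the actual kernel code: `map_range`, `range_usable` and `map_page_fn` are hypothetical names standing in for the real remap_pte_fn / XENMEM_add_to_physmap_range path. The point is the error-handling shape: record each page's result in `err_ptr` instead of bailing out with -EFAULT on the first failure, and let the caller treat only -ENOENT as a tolerable per-page error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-page mapping operation, standing in for whatever
 * XENMEM_add_to_physmap_range reports back for a single frame. */
typedef int (*map_page_fn)(unsigned long gfn);

/* Instead of returning -EFAULT on the first failure (which makes the
 * caller tear down the whole mapping), record each page's result in
 * err_ptr and keep going, remembering any "hard" error for the final
 * return value. */
static int map_range(map_page_fn map_one, const unsigned long *gfns,
                     int *err_ptr, size_t nr)
{
    int ret = 0;
    for (size_t i = 0; i < nr; i++) {
        err_ptr[i] = map_one(gfns[i]);
        if (err_ptr[i] && err_ptr[i] != -ENOENT)
            ret = err_ptr[i];   /* hard error: remember it, but continue */
    }
    return ret;
}

/* Caller-side policy from the thread: -ENOENT is the only per-page
 * error that leaves the mapping usable (e.g. a paged-out frame the
 * caller can retry); anything else means the operation failed. */
static int range_usable(const int *err_ptr, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (err_ptr[i] && err_ptr[i] != -ENOENT)
            return 0;
    return 1;
}
```

Under this shape, userspace (privcmd's caller) still sees every individual error code, but only ENOENT is special-cased as recoverable, which matches how the x86 mmu_update path already fills err_ptr.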


* Re: ARM fixes for my improved privcmd patch.
  2012-12-19 15:51                             ` Ian Campbell
@ 2012-12-19 15:59                               ` Mats Petersson
  0 siblings, 0 replies; 10+ messages in thread
From: Mats Petersson @ 2012-12-19 15:59 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 19/12/12 15:51, Ian Campbell wrote:
> On Wed, 2012-12-19 at 15:47 +0000, Mats Petersson wrote:
>> On 19/12/12 15:45, Ian Campbell wrote:
>>> On Wed, 2012-12-19 at 15:08 +0000, Mats Petersson wrote:
>>>> On 19/12/12 12:22, Ian Campbell wrote:
>>>>> On Wed, 2012-12-19 at 12:10 +0000, Mats Petersson wrote:
>>>>>
>>>>>>>> +                  only likely to return EFAULT or some other "things are very
>>>>>>>> +                  bad" error code, which the rest of the calling code won't
>>>>>>>> +                  be able to fix up. So we just exit with the error we got.
>>>>>>> I expect it is more important to accumulate the individual errors from
>>>>>>> remap_pte_fn into err_ptr.
>>>>>> Yes, but since that exits on error with EFAULT, the calling code won't
>>>>>> "accept" the errors, and thus the whole house of cards falls apart at
>>>>>> this point.
>>>>>>
>>>>>> There should probably be a task to fix this up properly, hence the
>>>>> comment. But right now, any error besides ENOENT is "unacceptable" to
>>>>>> the callers of this code. If you want me to add this to the comment, I'm
>>>>>> happy to. But as long as remap_pte_fn returns EFAULT on error, nothing
>>>>>> will work after an error.
>>>>> Are you sure? privcmd.c has some special casing for ENOENT but it looks
>>>>> like it should just pass through other errors back to userspace.
>>>>>
>>>>> In any case surely this needs fixing?
>>>>>
>>>>> On the X86 side err_ptr is the result of the mmupdate hypercall which
>>>>> can already be other than ENOENT, including EFAULT, ESRCH etc.
>>>> Yes, but the ONLY error that is "acceptable" (as in, doesn't lead to the
>>>> calling code revoking the mapping and returning an error) is ENOENT.
>>> Hm, probably the right thing is for map_foreign_page to propagate the
>>> result of XENMEM_add_to_physmap_range and for remap_pte_fn to store it
>>> in err_ptr. That -EFAULT thing just looks wrong to me.
>> Ok, so you want me to fix that up, I suppose? I mean, I just copied the
>> behaviour that was already there - just massaged the code around a bit...
> Yes please, it didn't really matter before but I think it matters after
> your changes.
Ok, I will try to fix it. But since I can't test it, it will still be "does 
it compile" testing... (It would be nice to understand what has changed - 
as far as I can see, the old code was just as broken as the new code.)

--
Mats
>
>> --
>> Mats
> >>>> Or at least, that's how I believe it SHOULD be - since only
> >>>> ENOENT is a "success" error code, anything else pretty much means that
> >>>> the operation requested didn't work properly. If you are aware of any
> >>>> use-case where EFAULT, ESRCH or other error codes would still result in
> >>>> a valid, usable memory mapping, I'd like to hear about it - I have a
> >>>> fair understanding of the xc_* code, and although I may not know every
> >>>> piece of that code, I'm fairly certain that this is the expected
> >>>> behaviour.
>>>>
>>>> --
>>>> Mats
>>>>> Ian.
>>>>>
>


end of thread, other threads:[~2012-12-19 15:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <50CB5B32.50406@citrix.com>
     [not found] ` <1355763448.14620.111.camel@zakaz.uk.xensource.com>
     [not found]   ` <50CF587B.5090602@citrix.com>
     [not found]     ` <1355829451.14620.188.camel@zakaz.uk.xensource.com>
     [not found]       ` <50D05358.30303@citrix.com>
     [not found]         ` <1355830856.14620.206.camel@zakaz.uk.xensource.com>
     [not found]           ` <50D074F5.6060202@citrix.com>
2012-12-18 16:07             ` ARM fixes for my improved privcmd patch Konrad Rzeszutek Wilk
2012-12-18 19:34               ` Mats Petersson
2012-12-19 10:59                 ` Ian Campbell
2012-12-19 12:10                   ` Mats Petersson
2012-12-19 12:22                     ` Ian Campbell
2012-12-19 15:08                       ` Mats Petersson
2012-12-19 15:45                         ` Ian Campbell
2012-12-19 15:47                           ` Mats Petersson
2012-12-19 15:51                             ` Ian Campbell
2012-12-19 15:59                               ` Mats Petersson
