All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-01 11:51 David Vrabel
  2011-09-01 16:11   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 54+ messages in thread
From: David Vrabel @ 2011-09-01 11:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Jeremy Fitzhardinge, David Vrabel, Konrad Rzeszutek Wilk

From: David Vrabel <david.vrabel@citrix.com>

Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm.  The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.

This would work on some systems depending on what else was using
vmalloc.

Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
(vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
comment to explain why it's needed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 mm/vmalloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ef0903..5016f19 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}
 
+	/*
+	 * If the allocated address space is passed to a hypercall
+	 * before being used then we cannot rely on a page fault to
+	 * trigger an update of the page tables.  So sync all the page
+	 * tables here.
+	 */
+	vmalloc_sync_all();
+
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-01 11:51 [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area() David Vrabel
@ 2011-09-01 16:11   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-01 16:11 UTC (permalink / raw)
  To: David Vrabel, linux-kernel, akpm, namhyung, rientjes, linux-mm
  Cc: xen-devel, Jeremy Fitzhardinge, paulmck

On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>

Andrew,

I was wondering if you would be Ok with this patch for 3.1.

It is a revert (I can prepare a proper revert if you would like
that instead of this patch).

The users of this particular function (alloc_vm_area) are just
Xen. There are no others.

> 
> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  mm/vmalloc.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7ef0903..5016f19 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
>  		return NULL;
>  	}
>  
> +	/*
> +	 * If the allocated address space is passed to a hypercall
> +	 * before being used then we cannot rely on a page fault to
> +	 * trigger an update of the page tables.  So sync all the page
> +	 * tables here.
> +	 */
> +	vmalloc_sync_all();
> +
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-01 16:11   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-01 16:11 UTC (permalink / raw)
  To: David Vrabel, linux-kernel, akpm, namhyung, rientjes, linux-mm
  Cc: xen-devel, Jeremy Fitzhardinge, paulmck

On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>

Andrew,

I was wondering if you would be Ok with this patch for 3.1.

It is a revert (I can prepare a proper revert if you would like
that instead of this patch).

The users of this particular function (alloc_vm_area) are just
Xen. There are no others.

> 
> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  mm/vmalloc.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7ef0903..5016f19 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
>  		return NULL;
>  	}
>  
> +	/*
> +	 * If the allocated address space is passed to a hypercall
> +	 * before being used then we cannot rely on a page fault to
> +	 * trigger an update of the page tables.  So sync all the page
> +	 * tables here.
> +	 */
> +	vmalloc_sync_all();
> +
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.2.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-01 16:11   ` Konrad Rzeszutek Wilk
@ 2011-09-01 20:37     ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 54+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-01 20:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, linux-kernel, akpm, namhyung, rientjes, linux-mm,
	xen-devel, paulmck

On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
> Andrew,
>
> I was wondering if you would be Ok with this patch for 3.1.
>
> It is a revert (I can prepare a proper revert if you would like
> that instead of this patch).
>
> The users of this particular function (alloc_vm_area) are just
> Xen. There are no others.

I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
necessary, and ultimately try to work out ways of avoiding it altogether
(like have some hypercall wrapper which touches the arg memory to make
sure its mapped?).

    J

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-01 20:37     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 54+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-01 20:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, linux-kernel, akpm, namhyung, rientjes, linux-mm,
	xen-devel, paulmck

On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
> Andrew,
>
> I was wondering if you would be Ok with this patch for 3.1.
>
> It is a revert (I can prepare a proper revert if you would like
> that instead of this patch).
>
> The users of this particular function (alloc_vm_area) are just
> Xen. There are no others.

I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
necessary, and ultimately try to work out ways of avoiding it altogether
(like have some hypercall wrapper which touches the arg memory to make
sure its mapped?).

    J

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-01 20:37     ` Jeremy Fitzhardinge
@ 2011-09-01 21:17       ` Andrew Morton
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-09-01 21:17 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, David Vrabel, linux-kernel, namhyung,
	rientjes, linux-mm, xen-devel, paulmck

On Thu, 01 Sep 2011 13:37:46 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).

David's patch looks better than a straight reversion.

Problem is, I can't find David's original email anywhere.  Someone's
been playing games with To: headers?

> > The users of this particular function (alloc_vm_area) are just
> > Xen. There are no others.
> 
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary,

What would that patch look like?  Bear in mind that we'll need something
suitable for 3.1 and for a 3.0 backport.

> and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-01 21:17       ` Andrew Morton
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-09-01 21:17 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, David Vrabel, linux-kernel, namhyung,
	rientjes, linux-mm, xen-devel, paulmck

On Thu, 01 Sep 2011 13:37:46 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).

David's patch looks better than a straight reversion.

Problem is, I can't find David's original email anywhere.  Someone's
been playing games with To: headers?

> > The users of this particular function (alloc_vm_area) are just
> > Xen. There are no others.
> 
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary,

What would that patch look like?  Bear in mind that we'll need something
suitable for 3.1 and for a 3.0 backport.

> and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-01 20:37     ` Jeremy Fitzhardinge
@ 2011-09-02  7:22       ` Ian Campbell
  -1 siblings, 0 replies; 54+ messages in thread
From: Ian Campbell @ 2011-09-02  7:22 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, xen-devel, namhyung, linux-kernel,
	linux-mm, David Vrabel, rientjes, akpm, paulmck

On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).
> >
> > The users of this particular function (alloc_vm_area) are just
> > Xen. There are no others.
> 
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary, and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).

That only syncs the current pagetable though. If that is sufficient (and
it could well be) then perhaps just doing a vmalloc_sync_one on the
current page tables directly would be better than faulting to do it?

It's the sort of thing you could hide inside the gnttab_set_map_op type
helpers I guess?

Ian.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02  7:22       ` Ian Campbell
  0 siblings, 0 replies; 54+ messages in thread
From: Ian Campbell @ 2011-09-02  7:22 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Konrad Rzeszutek Wilk, xen-devel, namhyung, linux-kernel,
	linux-mm, David Vrabel, rientjes, akpm, paulmck

On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> > Andrew,
> >
> > I was wondering if you would be Ok with this patch for 3.1.
> >
> > It is a revert (I can prepare a proper revert if you would like
> > that instead of this patch).
> >
> > The users of this particular function (alloc_vm_area) are just
> > Xen. There are no others.
> 
> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
> necessary, and ultimately try to work out ways of avoiding it altogether
> (like have some hypercall wrapper which touches the arg memory to make
> sure its mapped?).

That only syncs the current pagetable though. If that is sufficient (and
it could well be) then perhaps just doing a vmalloc_sync_one on the
current page tables directly would be better than faulting to do it?

It's the sort of thing you could hide inside the gnttab_set_map_op type
helpers I guess?

Ian.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-02  7:22       ` Ian Campbell
@ 2011-09-02  7:31         ` Keir Fraser
  -1 siblings, 0 replies; 54+ messages in thread
From: Keir Fraser @ 2011-09-02  7:31 UTC (permalink / raw)
  To: Ian Campbell, Jeremy Fitzhardinge
  Cc: xen-devel, Konrad Rzeszutek Wilk, namhyung, linux-kernel,
	linux-mm, David Vrabel, rientjes, akpm, paulmck

On 02/09/2011 08:22, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

> On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>> 
>>> I was wondering if you would be Ok with this patch for 3.1.
>>> 
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
>>> 
>>> The users of this particular function (alloc_vm_area) are just
>>> Xen. There are no others.
>> 
>> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
>> necessary, and ultimately try to work out ways of avoiding it altogether
>> (like have some hypercall wrapper which touches the arg memory to make
>> sure its mapped?).
> 
> That only syncs the current pagetable though. If that is sufficient (and
> it could well be) then perhaps just doing a vmalloc_sync_one on the
> current page tables directly would be better than faulting to do it?

It's probably sufficient unless the kernel has non-voluntary preemption, and
we are not preempt_disable()d.

 -- Keir

> It's the sort of thing you could hide inside the gnttab_set_map_op type
> helpers I guess?
> 
> Ian.
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02  7:31         ` Keir Fraser
  0 siblings, 0 replies; 54+ messages in thread
From: Keir Fraser @ 2011-09-02  7:31 UTC (permalink / raw)
  To: Ian Campbell, Jeremy Fitzhardinge
  Cc: xen-devel, Konrad Rzeszutek Wilk, namhyung, linux-kernel,
	linux-mm, David Vrabel, rientjes, akpm, paulmck

On 02/09/2011 08:22, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

> On Thu, 2011-09-01 at 21:37 +0100, Jeremy Fitzhardinge wrote:
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>> 
>>> I was wondering if you would be Ok with this patch for 3.1.
>>> 
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
>>> 
>>> The users of this particular function (alloc_vm_area) are just
>>> Xen. There are no others.
>> 
>> I'd prefer to put explicit vmalloc_sync_all()s in the callsites where
>> necessary, and ultimately try to work out ways of avoiding it altogether
>> (like have some hypercall wrapper which touches the arg memory to make
>> sure its mapped?).
> 
> That only syncs the current pagetable though. If that is sufficient (and
> it could well be) then perhaps just doing a vmalloc_sync_one on the
> current page tables directly would be better than faulting to do it?

It's probably sufficient unless the kernel has non-voluntary preemption, and
we are not preempt_disable()d.

 -- Keir

> It's the sort of thing you could hide inside the gnttab_set_map_op type
> helpers I guess?
> 
> Ian.
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-01 21:17       ` Andrew Morton
  (?)
@ 2011-09-02 11:39         ` David Vrabel
  -1 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2011-09-02 11:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, linux-kernel,
	namhyung, rientjes, linux-mm, xen-devel, paulmck

On 01/09/11 22:17, Andrew Morton wrote:
> On Thu, 01 Sep 2011 13:37:46 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>>
>>> I was wondering if you would be Ok with this patch for 3.1.
>>>
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
> 
> David's patch looks better than a straight reversion.
> 
> Problem is, I can't find David's original email anywhere.  Someone's
> been playing games with To: headers?

Sorry, I should have Cc'd linux-kernel and others on the original patch.

>From 6844ca07140e08f29454ca7b3fa459571c7ba428 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 1 Sep 2011 11:42:50 +0100
Subject: [PATCH] mm: sync vmalloc address space page tables in
alloc_vm_area()

Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm.  The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.

This would work on some systems depending on what else was using
vmalloc.

Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
(vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
comment to explain why it's needed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 mm/vmalloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ef0903..5016f19 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}

+	/*
+	 * If the allocated address space is passed to a hypercall
+	 * before being used then we cannot rely on a page fault to
+	 * trigger an update of the page tables.  So sync all the page
+	 * tables here.
+	 */
+	vmalloc_sync_all();
+
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02 11:39         ` David Vrabel
  0 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2011-09-02 11:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, linux-kernel,
	namhyung, rientjes, linux-mm, xen-devel, paulmck

On 01/09/11 22:17, Andrew Morton wrote:
> On Thu, 01 Sep 2011 13:37:46 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>>
>>> I was wondering if you would be Ok with this patch for 3.1.
>>>
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
> 
> David's patch looks better than a straight reversion.
> 
> Problem is, I can't find David's original email anywhere.  Someone's
> been playing games with To: headers?

Sorry, I should have Cc'd linux-kernel and others on the original patch.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02 11:39         ` David Vrabel
  0 siblings, 0 replies; 54+ messages in thread
From: David Vrabel @ 2011-09-02 11:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, linux-kernel,
	namhyung, rientjes, linux-mm, xen-devel, paulmck

On 01/09/11 22:17, Andrew Morton wrote:
> On Thu, 01 Sep 2011 13:37:46 -0700
> Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> 
>> On 09/01/2011 09:11 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Sep 01, 2011 at 12:51:03PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>> Andrew,
>>>
>>> I was wondering if you would be Ok with this patch for 3.1.
>>>
>>> It is a revert (I can prepare a proper revert if you would like
>>> that instead of this patch).
> 
> David's patch looks better than a straight reversion.
> 
> Problem is, I can't find David's original email anywhere.  Someone's
> been playing games with To: headers?

Sorry, I should have Cc'd linux-kernel and others on the original patch.

>From 6844ca07140e08f29454ca7b3fa459571c7ba428 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 1 Sep 2011 11:42:50 +0100
Subject: [PATCH] mm: sync vmalloc address space page tables in
alloc_vm_area()

Xen backend drivers (e.g., blkback and netback) would sometimes fail
to map grant pages into the vmalloc address space allocated with
alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
could not find the page (in the L2 table) containing the PTEs it
needed to update.

(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000

netback and blkback were making the hypercall from a kernel thread
where task->active_mm != &init_mm and alloc_vm_area() was only
updating the page tables for init_mm.  The usual method of deferring
the update to the page tables of other processes (i.e., after taking a
fault) doesn't work as a fault cannot occur during the hypercall.

This would work on some systems depending on what else was using
vmalloc.

Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
(vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
comment to explain why it's needed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 mm/vmalloc.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ef0903..5016f19 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2140,6 +2140,14 @@ struct vm_struct *alloc_vm_area(size_t size)
 		return NULL;
 	}

+	/*
+	 * If the allocated address space is passed to a hypercall
+	 * before being used then we cannot rely on a page fault to
+	 * trigger an update of the page tables.  So sync all the page
+	 * tables here.
+	 */
+	vmalloc_sync_all();
+
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.2.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-02 11:39         ` David Vrabel
@ 2011-09-02 22:32           ` Andrew Morton
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-09-02 22:32 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, linux-kernel,
	namhyung, rientjes, linux-mm, xen-devel, paulmck

On Fri, 2 Sep 2011 12:39:19 +0100
David Vrabel <david.vrabel@citrix.com> wrote:

> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.

oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
*think* that's the outcome of this discussion, for the short-term?


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02 22:32           ` Andrew Morton
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-09-02 22:32 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, linux-kernel,
	namhyung, rientjes, linux-mm, xen-devel, paulmck

On Fri, 2 Sep 2011 12:39:19 +0100
David Vrabel <david.vrabel@citrix.com> wrote:

> Xen backend drivers (e.g., blkback and netback) would sometimes fail
> to map grant pages into the vmalloc address space allocated with
> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> could not find the page (in the L2 table) containing the PTEs it
> needed to update.
> 
> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> 
> netback and blkback were making the hypercall from a kernel thread
> where task->active_mm != &init_mm and alloc_vm_area() was only
> updating the page tables for init_mm.  The usual method of deferring
> the update to the page tables of other processes (i.e., after taking a
> fault) doesn't work as a fault cannot occur during the hypercall.
> 
> This would work on some systems depending on what else was using
> vmalloc.
> 
> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> comment to explain why it's needed.

oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
*think* that's the outcome of this discussion, for the short-term?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-02 22:32           ` Andrew Morton
@ 2011-09-02 23:04             ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 54+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-02 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, namhyung,
	rientjes, linux-mm, xen-devel, paulmck

On 09/02/2011 03:32 PM, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
>
>> Xen backend drivers (e.g., blkback and netback) would sometimes fail
>> to map grant pages into the vmalloc address space allocated with
>> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
>> could not find the page (in the L2 table) containing the PTEs it
>> needed to update.
>>
>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>
>> netback and blkback were making the hypercall from a kernel thread
>> where task->active_mm != &init_mm and alloc_vm_area() was only
>> updating the page tables for init_mm.  The usual method of deferring
>> the update to the page tables of other processes (i.e., after taking a
>> fault) doesn't work as a fault cannot occur during the hypercall.
>>
>> This would work on some systems depending on what else was using
>> vmalloc.
>>
>> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
>> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
>> comment to explain why it's needed.
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?


Sure, that will get things going for now.

Thanks,
    J


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-02 23:04             ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 54+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-02 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, namhyung,
	rientjes, linux-mm, xen-devel, paulmck

On 09/02/2011 03:32 PM, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
>
>> Xen backend drivers (e.g., blkback and netback) would sometimes fail
>> to map grant pages into the vmalloc address space allocated with
>> alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
>> could not find the page (in the L2 table) containing the PTEs it
>> needed to update.
>>
>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>
>> netback and blkback were making the hypercall from a kernel thread
>> where task->active_mm != &init_mm and alloc_vm_area() was only
>> updating the page tables for init_mm.  The usual method of deferring
>> the update to the page tables of other processes (i.e., after taking a
>> fault) doesn't work as a fault cannot occur during the hypercall.
>>
>> This would work on some systems depending on what else was using
>> vmalloc.
>>
>> Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
>> (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
>> comment to explain why it's needed.
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?


Sure, that will get things going for now.

Thanks,
    J

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-02 22:32           ` Andrew Morton
@ 2011-09-06 16:35             ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-06 16:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
> 
> > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > to map grant pages into the vmalloc address space allocated with
> > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > could not find the page (in the L2 table) containing the PTEs it
> > needed to update.
> > 
> > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > 
> > netback and blkback were making the hypercall from a kernel thread
> > where task->active_mm != &init_mm and alloc_vm_area() was only
> > updating the page tables for init_mm.  The usual method of deferring
> > the update to the page tables of other processes (i.e., after taking a
> > fault) doesn't work as a fault cannot occur during the hypercall.
> > 
> > This would work on some systems depending on what else was using
> > vmalloc.
> > 
> > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > comment to explain why it's needed.
> 
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?

<nods> Yup. Thanks!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-09-06 16:35             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-06 16:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> On Fri, 2 Sep 2011 12:39:19 +0100
> David Vrabel <david.vrabel@citrix.com> wrote:
> 
> > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > to map grant pages into the vmalloc address space allocated with
> > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > could not find the page (in the L2 table) containing the PTEs it
> > needed to update.
> > 
> > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > 
> > netback and blkback were making the hypercall from a kernel thread
> > where task->active_mm != &init_mm and alloc_vm_area() was only
> > updating the page tables for init_mm.  The usual method of deferring
> > the update to the page tables of other processes (i.e., after taking a
> > fault) doesn't work as a fault cannot occur during the hypercall.
> > 
> > This would work on some systems depending on what else was using
> > vmalloc.
> > 
> > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > comment to explain why it's needed.
> 
> oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> *think* that's the outcome of this discussion, for the short-term?

<nods> Yup. Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-09-06 16:35             ` Konrad Rzeszutek Wilk
  (?)
@ 2011-11-05 13:38               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-05 13:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Tue, Sep 06, 2011 at 12:35:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> > On Fri, 2 Sep 2011 12:39:19 +0100
> > David Vrabel <david.vrabel@citrix.com> wrote:
> > 
> > > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > > to map grant pages into the vmalloc address space allocated with
> > > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > > could not find the page (in the L2 table) containing the PTEs it
> > > needed to update.
> > > 
> > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > > 
> > > netback and blkback were making the hypercall from a kernel thread
> > > where task->active_mm != &init_mm and alloc_vm_area() was only
> > > updating the page tables for init_mm.  The usual method of deferring
> > > the update to the page tables of other processes (i.e., after taking a
> > > fault) doesn't work as a fault cannot occur during the hypercall.
> > > 
> > > This would work on some systems depending on what else was using
> > > vmalloc.
> > > 
> > > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > > comment to explain why it's needed.
> > 
> > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > *think* that's the outcome of this discussion, for the short-term?
> 
> <nods> Yup. Thanks!

Hey Andrew,

The long term outcome is the patchset that David worked on. I've sent
a GIT PULL to Linus to pick up the Xen related patches that switch over
the users of the right API:

 (xen) stable/vmalloc-3.2 for Linux 3.2-rc0

(https://lkml.org/lkml/2011/10/29/82)

And then on top of that use this patch:
[Note, I am still waiting for Linus to pull that patchset above.. so not
sure on the outcome. perhaps a better way would be for you to pull
all patches in your tree?]

Also, not sure what you thought of this patch below?

>From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 29 Sep 2011 16:53:32 +0100
Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
 directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c         |    2 +-
 drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
 include/linux/vmalloc.h            |    2 +-
 mm/vmalloc.c                       |   27 +++++++++++++--------------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 6bbfd7a..5a40d24 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
 
 	if (shared == NULL) {
 		struct vm_struct *area =
-			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
+			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
 		BUG_ON(area == NULL);
 		shared = area->addr;
 		*__shared = shared;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 229d3ad..52bc57f 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -34,6 +34,7 @@
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 #include <xen/events.h>
@@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
 {
 	struct gnttab_map_grant_ref op = {
-		.flags = GNTMAP_host_map,
+		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
 		.ref   = gnt_ref,
 		.dom   = dev->otherend_id,
 	};
 	struct vm_struct *area;
+	pte_t *pte;
 
 	*vaddr = NULL;
 
-	area = alloc_vm_area(PAGE_SIZE);
+	area = alloc_vm_area(PAGE_SIZE, &pte);
 	if (!area)
 		return -ENOMEM;
 
-	op.host_addr = (unsigned long)area->addr;
+	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	struct gnttab_unmap_grant_ref op = {
 		.host_addr = (unsigned long)vaddr,
 	};
+	unsigned int level;
 
 	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
 	 * method so that we don't have to muck with vmalloc internals here.
@@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	}
 
 	op.handle = (grant_handle_t)area->phys_addr;
+	op.host_addr = arbitrary_virt_to_machine(
+		lookup_address((unsigned long)vaddr, &level)).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
 		BUG();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9332e52..1a77252 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
 #endif
 
 /* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size);
+extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5016f19..b5deec6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2105,23 +2105,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
 {
-	/* apply_to_page_range() does all the hard work. */
+	pte_t ***p = data;
+
+	if (p) {
+		*(*p) = pte;
+		(*p)++;
+	}
 	return 0;
 }
 
 /**
  *	alloc_vm_area - allocate a range of kernel address space
  *	@size:		size of the area
+ *	@ptes:		returns the PTEs for the address space
  *
  *	Returns:	NULL on failure, vm_struct on success
  *
  *	This function reserves a range of kernel address space, and
  *	allocates pagetables to map that range.  No actual mappings
- *	are created.  If the kernel address space is not shared
- *	between processes, it syncs the pagetable across all
- *	processes.
+ *	are created.
+ *
+ *	If @ptes is non-NULL, pointers to the PTEs (in init_mm)
+ *	allocated for the VM area are returned.
  */
-struct vm_struct *alloc_vm_area(size_t size)
+struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 {
 	struct vm_struct *area;
 
@@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
 	 * of kernel virtual address space and mapped into init_mm.
 	 */
 	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
-				area->size, f, NULL)) {
+				size, f, ptes ? &ptes : NULL)) {
 		free_vm_area(area);
 		return NULL;
 	}
 
-	/*
-	 * If the allocated address space is passed to a hypercall
-	 * before being used then we cannot rely on a page fault to
-	 * trigger an update of the page tables.  So sync all the page
-	 * tables here.
-	 */
-	vmalloc_sync_all();
-
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.7.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-05 13:38               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-05 13:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Tue, Sep 06, 2011 at 12:35:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> > On Fri, 2 Sep 2011 12:39:19 +0100
> > David Vrabel <david.vrabel@citrix.com> wrote:
> > 
> > > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > > to map grant pages into the vmalloc address space allocated with
> > > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > > could not find the page (in the L2 table) containing the PTEs it
> > > needed to update.
> > > 
> > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > > 
> > > netback and blkback were making the hypercall from a kernel thread
> > > where task->active_mm != &init_mm and alloc_vm_area() was only
> > > updating the page tables for init_mm.  The usual method of deferring
> > > the update to the page tables of other processes (i.e., after taking a
> > > fault) doesn't work as a fault cannot occur during the hypercall.
> > > 
> > > This would work on some systems depending on what else was using
> > > vmalloc.
> > > 
> > > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > > comment to explain why it's needed.
> > 
> > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > *think* that's the outcome of this discussion, for the short-term?
> 
> <nods> Yup. Thanks!

Hey Andrew,

The long term outcome is the patchset that David worked on. I've sent
a GIT PULL to Linus to pick up the Xen related patches that switch over
the users of the right API:

 (xen) stable/vmalloc-3.2 for Linux 3.2-rc0

(https://lkml.org/lkml/2011/10/29/82)

And then on top of that use this patch:
[Note, I am still waiting for Linus to pull that patchset above.. so not
sure on the outcome. perhaps a better way would be for you to pull
all patches in your tree?]

Also, not sure what you thought of this patch below?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-05 13:38               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-05 13:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Tue, Sep 06, 2011 at 12:35:53PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 02, 2011 at 03:32:04PM -0700, Andrew Morton wrote:
> > On Fri, 2 Sep 2011 12:39:19 +0100
> > David Vrabel <david.vrabel@citrix.com> wrote:
> > 
> > > Xen backend drivers (e.g., blkback and netback) would sometimes fail
> > > to map grant pages into the vmalloc address space allocated with
> > > alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen
> > > could not find the page (in the L2 table) containing the PTEs it
> > > needed to update.
> > > 
> > > (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
> > > 
> > > netback and blkback were making the hypercall from a kernel thread
> > > where task->active_mm != &init_mm and alloc_vm_area() was only
> > > updating the page tables for init_mm.  The usual method of deferring
> > > the update to the page tables of other processes (i.e., after taking a
> > > fault) doesn't work as a fault cannot occur during the hypercall.
> > > 
> > > This would work on some systems depending on what else was using
> > > vmalloc.
> > > 
> > > Fix this by reverting ef691947d8a3d479e67652312783aedcf629320a
> > > (vmalloc: remove vmalloc_sync_all() from alloc_vm_area()) and add a
> > > comment to explain why it's needed.
> > 
> > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > *think* that's the outcome of this discussion, for the short-term?
> 
> <nods> Yup. Thanks!

Hey Andrew,

The long term outcome is the patchset that David worked on. I've sent
a GIT PULL to Linus to pick up the Xen related patches that switch over
the users of the right API:

 (xen) stable/vmalloc-3.2 for Linux 3.2-rc0

(https://lkml.org/lkml/2011/10/29/82)

And then on top of that use this patch:
[Note, I am still waiting for Linus to pull that patchset above.. so not
sure on the outcome. perhaps a better way would be for you to pull
all patches in your tree?]

Also, not sure what you thought of this patch below?

>From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 29 Sep 2011 16:53:32 +0100
Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
 directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c         |    2 +-
 drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
 include/linux/vmalloc.h            |    2 +-
 mm/vmalloc.c                       |   27 +++++++++++++--------------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 6bbfd7a..5a40d24 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
 
 	if (shared == NULL) {
 		struct vm_struct *area =
-			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
+			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
 		BUG_ON(area == NULL);
 		shared = area->addr;
 		*__shared = shared;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 229d3ad..52bc57f 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -34,6 +34,7 @@
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 #include <xen/events.h>
@@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
 {
 	struct gnttab_map_grant_ref op = {
-		.flags = GNTMAP_host_map,
+		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
 		.ref   = gnt_ref,
 		.dom   = dev->otherend_id,
 	};
 	struct vm_struct *area;
+	pte_t *pte;
 
 	*vaddr = NULL;
 
-	area = alloc_vm_area(PAGE_SIZE);
+	area = alloc_vm_area(PAGE_SIZE, &pte);
 	if (!area)
 		return -ENOMEM;
 
-	op.host_addr = (unsigned long)area->addr;
+	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	struct gnttab_unmap_grant_ref op = {
 		.host_addr = (unsigned long)vaddr,
 	};
+	unsigned int level;
 
 	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
 	 * method so that we don't have to muck with vmalloc internals here.
@@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	}
 
 	op.handle = (grant_handle_t)area->phys_addr;
+	op.host_addr = arbitrary_virt_to_machine(
+		lookup_address((unsigned long)vaddr, &level)).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
 		BUG();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9332e52..1a77252 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
 #endif
 
 /* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size);
+extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5016f19..b5deec6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2105,23 +2105,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
 {
-	/* apply_to_page_range() does all the hard work. */
+	pte_t ***p = data;
+
+	if (p) {
+		*(*p) = pte;
+		(*p)++;
+	}
 	return 0;
 }
 
 /**
  *	alloc_vm_area - allocate a range of kernel address space
  *	@size:		size of the area
+ *	@ptes:		returns the PTEs for the address space
  *
  *	Returns:	NULL on failure, vm_struct on success
  *
  *	This function reserves a range of kernel address space, and
  *	allocates pagetables to map that range.  No actual mappings
- *	are created.  If the kernel address space is not shared
- *	between processes, it syncs the pagetable across all
- *	processes.
+ *	are created.
+ *
+ *	If @ptes is non-NULL, pointers to the PTEs (in init_mm)
+ *	allocated for the VM area are returned.
  */
-struct vm_struct *alloc_vm_area(size_t size)
+struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 {
 	struct vm_struct *area;
 
@@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
 	 * of kernel virtual address space and mapped into init_mm.
 	 */
 	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
-				area->size, f, NULL)) {
+				size, f, ptes ? &ptes : NULL)) {
 		free_vm_area(area);
 		return NULL;
 	}
 
-	/*
-	 * If the allocated address space is passed to a hypercall
-	 * before being used then we cannot rely on a page fault to
-	 * trigger an update of the page tables.  So sync all the page
-	 * tables here.
-	 */
-	vmalloc_sync_all();
-
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-05 13:38               ` Konrad Rzeszutek Wilk
@ 2011-11-07 20:36                 ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-07 20:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

[-- Attachment #1: Type: text/plain, Size: 6489 bytes --]

> > > 
> > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > *think* that's the outcome of this discussion, for the short-term?
> > 
> > <nods> Yup. Thanks!
> 
> Hey Andrew,
> 
> The long term outcome is the patchset that David worked on. I've sent
> a GIT PULL to Linus to pick up the Xen related patches that switch over
> the users of the right API:
> 
>  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
> 
> (https://lkml.org/lkml/2011/10/29/82)

And Linus picked it up.
.. snip..
> 
> Also, not sure what you thought of this patch below?

Patch included as attachment for easier review..
> 
> From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
> From: David Vrabel <david.vrabel@citrix.com>
> Date: Thu, 29 Sep 2011 16:53:32 +0100
> Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
>  directly
> 
> When mapping a foreign page with xenbus_map_ring_valloc() with the
> GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
> pass a pointer to the PTE (in init_mm).
> 
> After the page is mapped, the usual fault mechanism can be used to
> update additional MMs.  This allows the vmalloc_sync_all() to be
> removed from alloc_vm_area().
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/grant-table.c         |    2 +-
>  drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
>  include/linux/vmalloc.h            |    2 +-
>  mm/vmalloc.c                       |   27 +++++++++++++--------------
>  4 files changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 6bbfd7a..5a40d24 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
>  
>  	if (shared == NULL) {
>  		struct vm_struct *area =
> -			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
> +			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
>  		BUG_ON(area == NULL);
>  		shared = area->addr;
>  		*__shared = shared;
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 229d3ad..52bc57f 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -34,6 +34,7 @@
>  #include <linux/types.h>
>  #include <linux/vmalloc.h>
>  #include <asm/xen/hypervisor.h>
> +#include <asm/xen/page.h>
>  #include <xen/interface/xen.h>
>  #include <xen/interface/event_channel.h>
>  #include <xen/events.h>
> @@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
>  int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
>  {
>  	struct gnttab_map_grant_ref op = {
> -		.flags = GNTMAP_host_map,
> +		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
>  		.ref   = gnt_ref,
>  		.dom   = dev->otherend_id,
>  	};
>  	struct vm_struct *area;
> +	pte_t *pte;
>  
>  	*vaddr = NULL;
>  
> -	area = alloc_vm_area(PAGE_SIZE);
> +	area = alloc_vm_area(PAGE_SIZE, &pte);
>  	if (!area)
>  		return -ENOMEM;
>  
> -	op.host_addr = (unsigned long)area->addr;
> +	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
>  		BUG();
> @@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	struct gnttab_unmap_grant_ref op = {
>  		.host_addr = (unsigned long)vaddr,
>  	};
> +	unsigned int level;
>  
>  	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
>  	 * method so that we don't have to muck with vmalloc internals here.
> @@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	}
>  
>  	op.handle = (grant_handle_t)area->phys_addr;
> +	op.host_addr = arbitrary_virt_to_machine(
> +		lookup_address((unsigned long)vaddr, &level)).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
>  		BUG();
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 9332e52..1a77252 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
>  #endif
>  
>  /* Allocate/destroy a 'vmalloc' VM area. */
> -extern struct vm_struct *alloc_vm_area(size_t size);
> +extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
>  extern void free_vm_area(struct vm_struct *area);
>  
>  /* for /dev/kmem */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5016f19..b5deec6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2105,23 +2105,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
>  
>  static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
>  {
> -	/* apply_to_page_range() does all the hard work. */
> +	pte_t ***p = data;
> +
> +	if (p) {
> +		*(*p) = pte;
> +		(*p)++;
> +	}
>  	return 0;
>  }
>  
>  /**
>   *	alloc_vm_area - allocate a range of kernel address space
>   *	@size:		size of the area
> + *	@ptes:		returns the PTEs for the address space
>   *
>   *	Returns:	NULL on failure, vm_struct on success
>   *
>   *	This function reserves a range of kernel address space, and
>   *	allocates pagetables to map that range.  No actual mappings
> - *	are created.  If the kernel address space is not shared
> - *	between processes, it syncs the pagetable across all
> - *	processes.
> + *	are created.
> + *
> + *	If @ptes is non-NULL, pointers to the PTEs (in init_mm)
> + *	allocated for the VM area are returned.
>   */
> -struct vm_struct *alloc_vm_area(size_t size)
> +struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
>  {
>  	struct vm_struct *area;
>  
> @@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
>  	 * of kernel virtual address space and mapped into init_mm.
>  	 */
>  	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
> -				area->size, f, NULL)) {
> +				size, f, ptes ? &ptes : NULL)) {
>  		free_vm_area(area);
>  		return NULL;
>  	}
>  
> -	/*
> -	 * If the allocated address space is passed to a hypercall
> -	 * before being used then we cannot rely on a page fault to
> -	 * trigger an update of the page tables.  So sync all the page
> -	 * tables here.
> -	 */
> -	vmalloc_sync_all();
> -
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.7.1
> 

[-- Attachment #2: 0001-xen-map-foreign-pages-for-shared-rings-by-updating-t.patch --]
[-- Type: text/plain, Size: 5537 bytes --]

>From 21396dbbb3f976c8b4deb9f696994458dbe00db0 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@citrix.com>
Date: Thu, 29 Sep 2011 16:53:32 +0100
Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
 directly

When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).

After the page is mapped, the usual fault mechanism can be used to
update additional MMs.  This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c         |    2 +-
 drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
 include/linux/vmalloc.h            |    2 +-
 mm/vmalloc.c                       |   27 +++++++++++++--------------
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 6bbfd7a..5a40d24 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
 
 	if (shared == NULL) {
 		struct vm_struct *area =
-			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
+			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
 		BUG_ON(area == NULL);
 		shared = area->addr;
 		*__shared = shared;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 81c3ce6..1906125 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -35,6 +35,7 @@
 #include <linux/vmalloc.h>
 #include <linux/export.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 #include <xen/events.h>
@@ -436,19 +437,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
 {
 	struct gnttab_map_grant_ref op = {
-		.flags = GNTMAP_host_map,
+		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
 		.ref   = gnt_ref,
 		.dom   = dev->otherend_id,
 	};
 	struct vm_struct *area;
+	pte_t *pte;
 
 	*vaddr = NULL;
 
-	area = alloc_vm_area(PAGE_SIZE);
+	area = alloc_vm_area(PAGE_SIZE, &pte);
 	if (!area)
 		return -ENOMEM;
 
-	op.host_addr = (unsigned long)area->addr;
+	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -527,6 +529,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	struct gnttab_unmap_grant_ref op = {
 		.host_addr = (unsigned long)vaddr,
 	};
+	unsigned int level;
 
 	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
 	 * method so that we don't have to muck with vmalloc internals here.
@@ -548,6 +551,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 	}
 
 	op.handle = (grant_handle_t)area->phys_addr;
+	op.host_addr = arbitrary_virt_to_machine(
+		lookup_address((unsigned long)vaddr, &level)).maddr;
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
 		BUG();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 687fb11..4bde182 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -119,7 +119,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
 #endif
 
 /* Allocate/destroy a 'vmalloc' VM area. */
-extern struct vm_struct *alloc_vm_area(size_t size);
+extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b669aa6..3231bf3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2141,23 +2141,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
 {
-	/* apply_to_page_range() does all the hard work. */
+	pte_t ***p = data;
+
+	if (p) {
+		*(*p) = pte;
+		(*p)++;
+	}
 	return 0;
 }
 
 /**
  *	alloc_vm_area - allocate a range of kernel address space
  *	@size:		size of the area
+ *	@ptes:		returns the PTEs for the address space
  *
  *	Returns:	NULL on failure, vm_struct on success
  *
  *	This function reserves a range of kernel address space, and
  *	allocates pagetables to map that range.  No actual mappings
- *	are created.  If the kernel address space is not shared
- *	between processes, it syncs the pagetable across all
- *	processes.
+ *	are created.
+ *
+ *	If @ptes is non-NULL, pointers to the PTEs (in init_mm)
+ *	allocated for the VM area are returned.
  */
-struct vm_struct *alloc_vm_area(size_t size)
+struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 {
 	struct vm_struct *area;
 
@@ -2171,19 +2178,11 @@ struct vm_struct *alloc_vm_area(size_t size)
 	 * of kernel virtual address space and mapped into init_mm.
 	 */
 	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
-				area->size, f, NULL)) {
+				size, f, ptes ? &ptes : NULL)) {
 		free_vm_area(area);
 		return NULL;
 	}
 
-	/*
-	 * If the allocated address space is passed to a hypercall
-	 * before being used then we cannot rely on a page fault to
-	 * trigger an update of the page tables.  So sync all the page
-	 * tables here.
-	 */
-	vmalloc_sync_all();
-
 	return area;
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
-- 
1.7.7.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-07 20:36                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-07 20:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

[-- Attachment #1: Type: text/plain, Size: 6489 bytes --]

> > > 
> > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > *think* that's the outcome of this discussion, for the short-term?
> > 
> > <nods> Yup. Thanks!
> 
> Hey Andrew,
> 
> The long term outcome is the patchset that David worked on. I've sent
> a GIT PULL to Linus to pick up the Xen related patches that switch over
> the users of the right API:
> 
>  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
> 
> (https://lkml.org/lkml/2011/10/29/82)

And Linus picked it up.
.. snip..
> 
> Also, not sure what you thought of this patch below?

Patch included as attachment for easier review..
> 
> From b9acd3abc12972be0d938d7bc2466d899023e757 Mon Sep 17 00:00:00 2001
> From: David Vrabel <david.vrabel@citrix.com>
> Date: Thu, 29 Sep 2011 16:53:32 +0100
> Subject: [PATCH] xen: map foreign pages for shared rings by updating the PTEs
>  directly
> 
> When mapping a foreign page with xenbus_map_ring_valloc() with the
> GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
> pass a pointer to the PTE (in init_mm).
> 
> After the page is mapped, the usual fault mechanism can be used to
> update additional MMs.  This allows the vmalloc_sync_all() to be
> removed from alloc_vm_area().
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/grant-table.c         |    2 +-
>  drivers/xen/xenbus/xenbus_client.c |   11 ++++++++---
>  include/linux/vmalloc.h            |    2 +-
>  mm/vmalloc.c                       |   27 +++++++++++++--------------
>  4 files changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 6bbfd7a..5a40d24 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -71,7 +71,7 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
>  
>  	if (shared == NULL) {
>  		struct vm_struct *area =
> -			alloc_vm_area(PAGE_SIZE * max_nr_gframes);
> +			alloc_vm_area(PAGE_SIZE * max_nr_gframes, NULL);
>  		BUG_ON(area == NULL);
>  		shared = area->addr;
>  		*__shared = shared;
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 229d3ad..52bc57f 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -34,6 +34,7 @@
>  #include <linux/types.h>
>  #include <linux/vmalloc.h>
>  #include <asm/xen/hypervisor.h>
> +#include <asm/xen/page.h>
>  #include <xen/interface/xen.h>
>  #include <xen/interface/event_channel.h>
>  #include <xen/events.h>
> @@ -435,19 +436,20 @@ EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
>  int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
>  {
>  	struct gnttab_map_grant_ref op = {
> -		.flags = GNTMAP_host_map,
> +		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
>  		.ref   = gnt_ref,
>  		.dom   = dev->otherend_id,
>  	};
>  	struct vm_struct *area;
> +	pte_t *pte;
>  
>  	*vaddr = NULL;
>  
> -	area = alloc_vm_area(PAGE_SIZE);
> +	area = alloc_vm_area(PAGE_SIZE, &pte);
>  	if (!area)
>  		return -ENOMEM;
>  
> -	op.host_addr = (unsigned long)area->addr;
> +	op.host_addr = arbitrary_virt_to_machine(pte).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
>  		BUG();
> @@ -526,6 +528,7 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	struct gnttab_unmap_grant_ref op = {
>  		.host_addr = (unsigned long)vaddr,
>  	};
> +	unsigned int level;
>  
>  	/* It'd be nice if linux/vmalloc.h provided a find_vm_area(void *addr)
>  	 * method so that we don't have to muck with vmalloc internals here.
> @@ -547,6 +550,8 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  	}
>  
>  	op.handle = (grant_handle_t)area->phys_addr;
> +	op.host_addr = arbitrary_virt_to_machine(
> +		lookup_address((unsigned long)vaddr, &level)).maddr;
>  
>  	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
>  		BUG();
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 9332e52..1a77252 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -118,7 +118,7 @@ unmap_kernel_range(unsigned long addr, unsigned long size)
>  #endif
>  
>  /* Allocate/destroy a 'vmalloc' VM area. */
> -extern struct vm_struct *alloc_vm_area(size_t size);
> +extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
>  extern void free_vm_area(struct vm_struct *area);
>  
>  /* for /dev/kmem */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 5016f19..b5deec6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2105,23 +2105,30 @@ void  __attribute__((weak)) vmalloc_sync_all(void)
>  
>  static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
>  {
> -	/* apply_to_page_range() does all the hard work. */
> +	pte_t ***p = data;
> +
> +	if (p) {
> +		*(*p) = pte;
> +		(*p)++;
> +	}
>  	return 0;
>  }
>  
>  /**
>   *	alloc_vm_area - allocate a range of kernel address space
>   *	@size:		size of the area
> + *	@ptes:		returns the PTEs for the address space
>   *
>   *	Returns:	NULL on failure, vm_struct on success
>   *
>   *	This function reserves a range of kernel address space, and
>   *	allocates pagetables to map that range.  No actual mappings
> - *	are created.  If the kernel address space is not shared
> - *	between processes, it syncs the pagetable across all
> - *	processes.
> + *	are created.
> + *
> + *	If @ptes is non-NULL, pointers to the PTEs (in init_mm)
> + *	allocated for the VM area are returned.
>   */
> -struct vm_struct *alloc_vm_area(size_t size)
> +struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
>  {
>  	struct vm_struct *area;
>  
> @@ -2135,19 +2142,11 @@ struct vm_struct *alloc_vm_area(size_t size)
>  	 * of kernel virtual address space and mapped into init_mm.
>  	 */
>  	if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
> -				area->size, f, NULL)) {
> +				size, f, ptes ? &ptes : NULL)) {
>  		free_vm_area(area);
>  		return NULL;
>  	}
>  
> -	/*
> -	 * If the allocated address space is passed to a hypercall
> -	 * before being used then we cannot rely on a page fault to
> -	 * trigger an update of the page tables.  So sync all the page
> -	 * tables here.
> -	 */
> -	vmalloc_sync_all();
> -
>  	return area;
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
> -- 
> 1.7.7.1
> 

[-- Attachment #2: 0001-xen-map-foreign-pages-for-shared-rings-by-updating-t.patch --]
[-- Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-07 20:36                 ` Konrad Rzeszutek Wilk
@ 2011-11-08 23:01                   ` Andrew Morton
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-11-08 23:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Mon, 7 Nov 2011 15:36:13 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > > 
> > > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > > *think* that's the outcome of this discussion, for the short-term?
> > > 
> > > <nods> Yup. Thanks!
> > 
> > Hey Andrew,
> > 
> > The long term outcome is the patchset that David worked on. I've sent
> > a GIT PULL to Linus to pick up the Xen related patches that switch over
> > the users of the right API:
> > 
> >  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
> > 
> > (https://lkml.org/lkml/2011/10/29/82)
> 
> And Linus picked it up.

I've no idea what's going on here.

> .. snip..
> > 
> > Also, not sure what you thought of this patch below?
> 
> Patch included as attachment for easier review..

The patch "xen: map foreign pages for shared rings by updating the PTEs
directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-08 23:01                   ` Andrew Morton
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-11-08 23:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, Jeremy Fitzhardinge, xen-devel, namhyung,
	linux-kernel, linux-mm, rientjes, paulmck

On Mon, 7 Nov 2011 15:36:13 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > > 
> > > > oookay, I queued this for 3.1 and tagged it for a 3.0.x backport.  I
> > > > *think* that's the outcome of this discussion, for the short-term?
> > > 
> > > <nods> Yup. Thanks!
> > 
> > Hey Andrew,
> > 
> > The long term outcome is the patchset that David worked on. I've sent
> > a GIT PULL to Linus to pick up the Xen related patches that switch over
> > the users of the right API:
> > 
> >  (xen) stable/vmalloc-3.2 for Linux 3.2-rc0
> > 
> > (https://lkml.org/lkml/2011/10/29/82)
> 
> And Linus picked it up.

I've no idea what's going on here.

> .. snip..
> > 
> > Also, not sure what you thought of this patch below?
> 
> Patch included as attachment for easier review..

The patch "xen: map foreign pages for shared rings by updating the PTEs
directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-08 23:01                   ` Andrew Morton
@ 2011-11-08 23:31                     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-08 23:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeremy Fitzhardinge, xen-devel, namhyung, linux-kernel, linux-mm,
	David Vrabel, rientjes, paulmck

> > And Linus picked it up.
> 
> I've no idea what's going on here.

Heh. Sorry for being so confusing. Merge windows are a bit stressful and
I sometimes end up writing run-on sentences.
> 
> > .. snip..
> > > 
> > > Also, not sure what you thought of this patch below?
> > 
> > Patch included as attachment for easier review..
> 
> The patch "xen: map foreign pages for shared rings by updating the PTEs
> directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.

<nods>. That is b/c it does not have your Ack. The patch applies cleanly to
3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).

I am humbly asking for you to:
 a) review the patch (which you did) and get an idea whether you are OK (sounds like you are)
 b) pick it up in your -mm tree.

or alternately:
 b) give an Ack on the patch so I can put it in my linux-next and push it
    for 3.2-rc1.

Thank you for you consideration!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-08 23:31                     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-08 23:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeremy Fitzhardinge, xen-devel, namhyung, linux-kernel, linux-mm,
	David Vrabel, rientjes, paulmck

> > And Linus picked it up.
> 
> I've no idea what's going on here.

Heh. Sorry for being so confusing. Merge windows are a bit stressful and
I sometimes end up writing run-on sentences.
> 
> > .. snip..
> > > 
> > > Also, not sure what you thought of this patch below?
> > 
> > Patch included as attachment for easier review..
> 
> The patch "xen: map foreign pages for shared rings by updating the PTEs
> directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.

<nods>. That is b/c it does not have your Ack. The patch applies cleanly to
3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).

I am humbly asking for you to:
 a) review the patch (which you did) and get an idea whether you are OK (sounds like you are)
 b) pick it up in your -mm tree.

or alternately:
 b) give an Ack on the patch so I can put it in my linux-next and push it
    for 3.2-rc1.

Thank you for you consideration!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-08 23:31                     ` Konrad Rzeszutek Wilk
@ 2011-11-08 23:36                       ` Andrew Morton
  -1 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-11-08 23:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, namhyung, linux-kernel, linux-mm,
	David Vrabel, rientjes, paulmck

On Tue, 8 Nov 2011 18:31:32 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > And Linus picked it up.
> > 
> > I've no idea what's going on here.
> 
> Heh. Sorry for being so confusing. Merge windows are a bit stressful and
> I sometimes end up writing run-on sentences.
> > 
> > > .. snip..
> > > > 
> > > > Also, not sure what you thought of this patch below?
> > > 
> > > Patch included as attachment for easier review..
> > 
> > The patch "xen: map foreign pages for shared rings by updating the PTEs
> > directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.
> 
> <nods>. That is b/c it does not have your Ack. The patch applies cleanly to
> 3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).
> 
> I am humbly asking for you to:
>  a) review the patch (which you did) and get an idea whether you are OK (sounds like you are)

Yup.

>  b) pick it up in your -mm tree.

No added benefit there.

> or alternately:
>  b) give an Ack on the patch so I can put it in my linux-next and push it
>     for 3.2-rc1.

That's OK by me.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
@ 2011-11-08 23:36                       ` Andrew Morton
  0 siblings, 0 replies; 54+ messages in thread
From: Andrew Morton @ 2011-11-08 23:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, xen-devel, namhyung, linux-kernel, linux-mm,
	David Vrabel, rientjes, paulmck

On Tue, 8 Nov 2011 18:31:32 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> > > And Linus picked it up.
> > 
> > I've no idea what's going on here.
> 
> Heh. Sorry for being so confusing. Merge windows are a bit stressful and
> I sometimes end up writing run-on sentences.
> > 
> > > .. snip..
> > > > 
> > > > Also, not sure what you thought of this patch below?
> > > 
> > > Patch included as attachment for easier review..
> > 
> > The patch "xen: map foreign pages for shared rings by updating the PTEs
> > directly" (whcih looks harnless enough) is not present in 3.2-rc1 or linux-next.
> 
> <nods>. That is b/c it does not have your Ack. The patch applies cleanly to
> 3.2-rc1 (as all the other patches that it depends on are now in 3.2-rc1).
> 
> I am humbly asking for you to:
>  a) review the patch (which you did) and get an idea whether you are OK (sounds like you are)

Yup.

>  b) pick it up in your -mm tree.

No added benefit there.

> or alternately:
>  b) give an Ack on the patch so I can put it in my linux-next and push it
>     for 3.2-rc1.

That's OK by me.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-07 20:36                 ` Konrad Rzeszutek Wilk
  (?)
  (?)
@ 2011-11-28  9:36                 ` Jan Beulich
  2011-11-28 10:19                   ` Ian Campbell
  -1 siblings, 1 reply; 54+ messages in thread
From: Jan Beulich @ 2011-11-28  9:36 UTC (permalink / raw)
  To: David Vrabel, Konrad Rzeszutek Wilk; +Cc: xen-devel

>>> On 07.11.11 at 21:36, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Patch included as attachment for easier review..

I just noticed that this patch made it upstream meanwhile, and that I
should have paid more attention earlier: Once again this adds x86
specific bits to (supposedly) architecture independent Xen code
(lookup_address(), use of GNTMAP_contains_pte). Unless everyone
agrees that x86 is going to remain the only architecture Xen will support
going forward (no ia64, no ARM, nothing else), patches doing so (at
least outside proper #ifdef-s or alike) should really be rejected.

Besides that, the patch also appears to close the road to running
backends in HVM - use of GNTMAP_contains_pte isn't permitted for
paging_mode_external() guests, so it's even a step backwards for
x86.

Jan

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-28  9:36                 ` Jan Beulich
@ 2011-11-28 10:19                   ` Ian Campbell
  2011-11-28 11:36                     ` David Vrabel
  0 siblings, 1 reply; 54+ messages in thread
From: Ian Campbell @ 2011-11-28 10:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, David Vrabel, Konrad Rzeszutek Wilk

On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
> >>> On 07.11.11 at 21:36, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > Patch included as attachment for easier review..
> 
> I just noticed that this patch made it upstream meanwhile, and that I
> should have paid more attention earlier: Once again this adds x86
> specific bits to (supposedly) architecture independent Xen code
> (lookup_address(), use of GNTMAP_contains_pte). Unless everyone
> agrees that x86 is going to remain the only architecture Xen will support
> going forward (no ia64, no ARM, nothing else), patches doing so (at
> least outside proper #ifdef-s or alike) should really be rejected.

I think it is up to those interested in such architectures to ensure
that a working baseline exists in the first place.

The ARM stuff hasn't even been submitted yet. When the arm stuff is
submitted it will naturally include fixes for these sorts issues as
necessary and at that point we can talk about regressions and reviewing
patches in order to avoid them (until then we can't really know what a
"regression" is). There is nothing unusual about that and nothing about
patches we take right now commit us to never supporting another arch in
the future so lets not get carried away here.

The IA64 support in mainline Linux does not appear to have anyone
interested in working on it. AFAICT it hasn't really been touched (other
than odd fixes and tree-wide cleanups) since it was first committed
(circa 2.6.25 IIRC). It doesn't even build right now and looks like it
hasn't built since at least 2.6.36, based on the one failure I happened
to look at.

I know you've been working on fixing up the hypervisor side of ia64
support things recently but it is not clear that there is an existing or
viable user or developer base for that port right now.

> Besides that, the patch also appears to close the road to running
> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> paging_mode_external() guests, so it's even a step backwards for
> x86.

That is a bigger concern.

Ian.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-28 10:19                   ` Ian Campbell
@ 2011-11-28 11:36                     ` David Vrabel
  2011-11-28 11:48                       ` Ian Campbell
  2011-11-28 15:20                       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 54+ messages in thread
From: David Vrabel @ 2011-11-28 11:36 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 28/11/11 10:19, Ian Campbell wrote:
> On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
>> Besides that, the patch also appears to close the road to running
>> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
>> paging_mode_external() guests, so it's even a step backwards for
>> x86.
> 
> That is a bigger concern.

Daniel de Graaf has a patch series for this.

http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html

Specifically: [PATCH 1/6] xenbus: Support HVM backends

http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html

David

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-28 11:36                     ` David Vrabel
@ 2011-11-28 11:48                       ` Ian Campbell
  2011-11-28 15:20                       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 54+ messages in thread
From: Ian Campbell @ 2011-11-28 11:48 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, 2011-11-28 at 11:36 +0000, David Vrabel wrote:
> On 28/11/11 10:19, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
> >> Besides that, the patch also appears to close the road to running
> >> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> >> paging_mode_external() guests, so it's even a step backwards for
> >> x86.
> > 
> > That is a bigger concern.
> 
> Daniel de Graaf has a patch series for this.
> 
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html
> 
> Specifically: [PATCH 1/6] xenbus: Support HVM backends
> 
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html

I'd forgotten about that, thanks!

Ian.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area()
  2011-11-28 11:36                     ` David Vrabel
  2011-11-28 11:48                       ` Ian Campbell
@ 2011-11-28 15:20                       ` Konrad Rzeszutek Wilk
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
  1 sibling, 1 reply; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-11-28 15:20 UTC (permalink / raw)
  To: David Vrabel, dgdegra
  Cc: xen-devel, Ian Campbell, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, Nov 28, 2011 at 11:36:49AM +0000, David Vrabel wrote:
> On 28/11/11 10:19, Ian Campbell wrote:
> > On Mon, 2011-11-28 at 09:36 +0000, Jan Beulich wrote:
> >> Besides that, the patch also appears to close the road to running
> >> backends in HVM - use of GNTMAP_contains_pte isn't permitted for
> >> paging_mode_external() guests, so it's even a step backwards for
> >> x86.
> > 
> > That is a bigger concern.
> 
> Daniel de Graaf has a patch series for this.
> 
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01484.html
> 
> Specifically: [PATCH 1/6] xenbus: Support HVM backends

Hmm, and I somehow loss track of those.

Daniel,

Could you repost that patch series on top of v3.2-rc3 please?
Or perhaps all of the patches you have so far in one big shot?

> 
> http://lists.xen.org/archives/html/xen-devel/2011-10/msg01486.html
> 
> David
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 00/12] HVM backends and gnt/event fixups
  2011-11-28 15:20                       ` Konrad Rzeszutek Wilk
@ 2011-11-28 16:48                         ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 01/12] xenbus: Support HVM backends Daniel De Graaf
                                             ` (11 more replies)
  0 siblings, 12 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:48 UTC (permalink / raw)
  To: konrad.wilk; +Cc: xen-devel, david.vrabel, JBeulich, Ian.Campbell

Reposting my current list of un-applied patches on top of v3.2-rc3

xen/{net,blk}back support for running in HVM:
[PATCH 01/12] xenbus: Support HVM backends
[PATCH 02/12] xenbus: Use grant-table wrapper functions
[PATCH 03/12] xen/grant-table: Support mappings required by blkback
[PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers
[PATCH 05/12] xen/netback: Enable netback on HVM guests
[PATCH 06/12] xen/blkback: Enable blkback on HVM guests

Event channel reference counts (these are on konrad/stable/for-linus-3.3
already, just reposted for reference):
[PATCH 07/12] xen/event: Add reference counting to event channels
[PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex
[PATCH 09/12] xen/gnt{dev,alloc}: reserve event channels for notify

Fix on top of #7-9 (this could also be merged into #7):
[PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels

Fixes for gntalloc reference/page leaks:
[PATCH 11/12] xen/gntalloc: release grant references on page free
[PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 01/12] xenbus: Support HVM backends
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-12-14 19:03                             ` Konrad Rzeszutek Wilk
  2011-11-28 16:49                           ` [PATCH 02/12] xenbus: Use grant-table wrapper functions Daniel De Graaf
                                             ` (10 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
that ring mappings can be done without using GNTMAP_contains_pte which
is not supported on HVM.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/xenbus/xenbus_client.c |  155 +++++++++++++++++++++++++++++-------
 1 files changed, 125 insertions(+), 30 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 1906125..688c4b4 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -32,16 +32,27 @@
 
 #include <linux/slab.h>
 #include <linux/types.h>
+#include <linux/spinlock.h>
 #include <linux/vmalloc.h>
 #include <linux/export.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/page.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
+#include <xen/balloon.h>
 #include <xen/events.h>
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 
+struct xenbus_map_node {
+	struct list_head next;
+	struct page *page;
+	grant_handle_t handle;
+};
+
+static DEFINE_SPINLOCK(xenbus_valloc_lock);
+static LIST_HEAD(xenbus_valloc_pages);
+
 const char *xenbus_strstate(enum xenbus_state state)
 {
 	static const char *const name[] = {
@@ -420,21 +431,8 @@ int xenbus_free_evtchn(struct xenbus_device *dev, int port)
 EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
 
 
-/**
- * xenbus_map_ring_valloc
- * @dev: xenbus device
- * @gnt_ref: grant reference
- * @vaddr: pointer to address to be filled out by mapping
- *
- * Based on Rusty Russell's skeleton driver's map_page.
- * Map a page of memory into this domain from another domain's grant table.
- * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
- * page to that address, and sets *vaddr to that address.
- * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
- * or -ENOMEM on error. If an error is returned, device will switch to
- * XenbusStateClosing and the error message will be saved in XenStore.
- */
-int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
+static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
+                                     int gnt_ref, void **vaddr)
 {
 	struct gnttab_map_grant_ref op = {
 		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
@@ -469,6 +467,64 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
 	*vaddr = area->addr;
 	return 0;
 }
+
+static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
+                                     int gnt_ref, void **vaddr)
+{
+	struct xenbus_map_node *node;
+	int err;
+	void *addr;
+
+	*vaddr = NULL;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	err = alloc_xenballooned_pages(1, &node->page, false);
+	if (err)
+		goto out_err;
+
+	addr = pfn_to_kaddr(page_to_pfn(node->page));
+
+	err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
+	if (err)
+		goto out_err;
+
+	spin_lock(&xenbus_valloc_lock);
+	list_add(&node->next, &xenbus_valloc_pages);
+	spin_unlock(&xenbus_valloc_lock);
+
+	*vaddr = addr;
+	return 0;
+
+ out_err:
+	free_xenballooned_pages(1, &node->page);
+	kfree(node);
+	return err;
+}
+
+/**
+ * xenbus_map_ring_valloc
+ * @dev: xenbus device
+ * @gnt_ref: grant reference
+ * @vaddr: pointer to address to be filled out by mapping
+ *
+ * Based on Rusty Russell's skeleton driver's map_page.
+ * Map a page of memory into this domain from another domain's grant table.
+ * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
+ * page to that address, and sets *vaddr to that address.
+ * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
+ * or -ENOMEM on error. If an error is returned, device will switch to
+ * XenbusStateClosing and the error message will be saved in XenStore.
+ */
+int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
+{
+	if (xen_pv_domain())
+		return xenbus_map_ring_valloc_pv(dev, gnt_ref, vaddr);
+	else
+		return xenbus_map_ring_valloc_hvm(dev, gnt_ref, vaddr);
+}
 EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
 
 
@@ -510,20 +566,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
 }
 EXPORT_SYMBOL_GPL(xenbus_map_ring);
 
-
-/**
- * xenbus_unmap_ring_vfree
- * @dev: xenbus device
- * @vaddr: addr to unmap
- *
- * Based on Rusty Russell's skeleton driver's unmap_page.
- * Unmap a page of memory in this domain that was imported from another domain.
- * Use xenbus_unmap_ring_vfree if you mapped in your memory with
- * xenbus_map_ring_valloc (it will free the virtual address space).
- * Returns 0 on success and returns GNTST_* on error
- * (see xen/include/interface/grant_table.h).
- */
-int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
+static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
 {
 	struct vm_struct *area;
 	struct gnttab_unmap_grant_ref op = {
@@ -566,8 +609,60 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
 
 	return op.status;
 }
-EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
 
+static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
+{
+	int rv;
+	struct xenbus_map_node *node;
+	void *addr;
+
+	spin_lock(&xenbus_valloc_lock);
+	list_for_each_entry(node, &xenbus_valloc_pages, next) {
+		addr = pfn_to_kaddr(page_to_pfn(node->page));
+		if (addr == vaddr) {
+			list_del(&node->next);
+			goto found;
+		}
+	}
+	node = NULL;
+ found:
+	spin_unlock(&xenbus_valloc_lock);
+
+	if (!node) {
+		xenbus_dev_error(dev, -ENOENT,
+				 "can't find mapped virtual address %p", vaddr);
+		return -ENOENT;
+	}
+
+	rv = xenbus_unmap_ring(dev, node->handle, addr);
+
+	if (!rv)
+		free_xenballooned_pages(1, &node->page);
+
+	kfree(node);
+	return rv;
+}
+
+/**
+ * xenbus_unmap_ring_vfree
+ * @dev: xenbus device
+ * @vaddr: addr to unmap
+ *
+ * Based on Rusty Russell's skeleton driver's unmap_page.
+ * Unmap a page of memory in this domain that was imported from another domain.
+ * Use xenbus_unmap_ring_vfree if you mapped in your memory with
+ * xenbus_map_ring_valloc (it will free the virtual address space).
+ * Returns 0 on success and returns GNTST_* on error
+ * (see xen/include/interface/grant_table.h).
+ */
+int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
+{
+	if (xen_pv_domain())
+		return xenbus_unmap_ring_vfree_pv(dev, vaddr);
+	else
+		return xenbus_unmap_ring_vfree_hvm(dev, vaddr);
+}
+EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
 
 /**
  * xenbus_unmap_ring
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 02/12] xenbus: Use grant-table wrapper functions
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 01/12] xenbus: Support HVM backends Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 03/12] xen/grant-table: Support mappings required by blkback Daniel De Graaf
                                             ` (9 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

The gnttab_set_{map,unmap}_op functions should be used instead of
directly populating the fields of gnttab_map_grant_ref.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/xenbus/xenbus_client.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 688c4b4..a90f959 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -545,12 +545,10 @@ EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
 int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
 		    grant_handle_t *handle, void *vaddr)
 {
-	struct gnttab_map_grant_ref op = {
-		.host_addr = (unsigned long)vaddr,
-		.flags     = GNTMAP_host_map,
-		.ref       = gnt_ref,
-		.dom       = dev->otherend_id,
-	};
+	struct gnttab_map_grant_ref op;
+
+	gnttab_set_map_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, gnt_ref,
+	                  dev->otherend_id);
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1))
 		BUG();
@@ -677,10 +675,9 @@ EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
 int xenbus_unmap_ring(struct xenbus_device *dev,
 		      grant_handle_t handle, void *vaddr)
 {
-	struct gnttab_unmap_grant_ref op = {
-		.host_addr = (unsigned long)vaddr,
-		.handle    = handle,
-	};
+	struct gnttab_unmap_grant_ref op;
+
+	gnttab_set_unmap_op(&op, (phys_addr_t)vaddr, GNTMAP_host_map, handle);
 
 	if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, &op, 1))
 		BUG();
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 03/12] xen/grant-table: Support mappings required by blkback
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 01/12] xenbus: Support HVM backends Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 02/12] xenbus: Use grant-table wrapper functions Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers Daniel De Graaf
                                             ` (8 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Allow mappings without GNTMAP_contains_pte and allow unmapping to
specify if the PTEs should be cleared.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntdev.c      |    3 ++-
 drivers/xen/grant-table.c |   23 ++++-------------------
 include/xen/grant_table.h |    2 +-
 3 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index afca14d..65bff07 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -312,7 +312,8 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
-	err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset, pages);
+	err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages + offset,
+	                        pages, true);
 	if (err)
 		return err;
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index bf1c094..a02d139 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -472,24 +472,9 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 				(map_ops[i].host_addr & ~PAGE_MASK));
 			mfn = pte_mfn(*pte);
 		} else {
-			/* If you really wanted to do this:
-			 * mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
-			 *
-			 * The reason we do not implement it is b/c on the
-			 * unmap path (gnttab_unmap_refs) we have no means of
-			 * checking whether the page is !GNTMAP_contains_pte.
-			 *
-			 * That is without some extra data-structure to carry
-			 * the struct page, bool clear_pte, and list_head next
-			 * tuples and deal with allocation/delallocation, etc.
-			 *
-			 * The users of this API set the GNTMAP_contains_pte
-			 * flag so lets just return not supported until it
-			 * becomes neccessary to implement.
-			 */
-			return -EOPNOTSUPP;
+			mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
 		}
-		ret = m2p_add_override(mfn, pages[i], &kmap_ops[i]);
+		ret = m2p_add_override(mfn, pages[i], kmap_ops ? &kmap_ops[i] : NULL);
 		if (ret)
 			return ret;
 	}
@@ -499,7 +484,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 EXPORT_SYMBOL_GPL(gnttab_map_refs);
 
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		struct page **pages, unsigned int count)
+		struct page **pages, unsigned int count, bool clear_pte)
 {
 	int i, ret;
 
@@ -511,7 +496,7 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		return ret;
 
 	for (i = 0; i < count; i++) {
-		ret = m2p_remove_override(pages[i], true /* clear the PTE */);
+		ret = m2p_remove_override(pages[i], clear_pte);
 		if (ret)
 			return ret;
 	}
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 11e2dfc..37da54d 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -158,6 +158,6 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 			struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		      struct page **pages, unsigned int count);
+		      struct page **pages, unsigned int count, bool clear_pte);
 
 #endif /* __ASM_GNTTAB_H__ */
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (2 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 03/12] xen/grant-table: Support mappings required by blkback Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 05/12] xen/netback: Enable netback on HVM guests Daniel De Graaf
                                             ` (7 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/block/xen-blkback/blkback.c |   29 ++++-------------------------
 1 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 15ec4db..1e256dc 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -324,6 +324,7 @@ struct seg_buf {
 static void xen_blkbk_unmap(struct pending_req *req)
 {
 	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	unsigned int i, invcount = 0;
 	grant_handle_t handle;
 	int ret;
@@ -335,25 +336,12 @@ static void xen_blkbk_unmap(struct pending_req *req)
 		gnttab_set_unmap_op(&unmap[invcount], vaddr(req, i),
 				    GNTMAP_host_map, handle);
 		pending_handle(req, i) = BLKBACK_INVALID_HANDLE;
+		pages[invcount] = virt_to_page(vaddr(req, i));
 		invcount++;
 	}
 
-	ret = HYPERVISOR_grant_table_op(
-		GNTTABOP_unmap_grant_ref, unmap, invcount);
+	ret = gnttab_unmap_refs(unmap, pages, invcount, false);
 	BUG_ON(ret);
-	/*
-	 * Note, we use invcount, so nr->pages, so we can't index
-	 * using vaddr(req, i).
-	 */
-	for (i = 0; i < invcount; i++) {
-		ret = m2p_remove_override(
-			virt_to_page(unmap[i].host_addr), false);
-		if (ret) {
-			pr_alert(DRV_PFX "Failed to remove M2P override for %lx\n",
-				 (unsigned long)unmap[i].host_addr);
-			continue;
-		}
-	}
 }
 
 static int xen_blkbk_map(struct blkif_request *req,
@@ -381,7 +369,7 @@ static int xen_blkbk_map(struct blkif_request *req,
 				  pending_req->blkif->domid);
 	}
 
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map, nseg);
+	ret = gnttab_map_refs(map, NULL, &blkbk->pending_page(pending_req, 0), nseg);
 	BUG_ON(ret);
 
 	/*
@@ -401,15 +389,6 @@ static int xen_blkbk_map(struct blkif_request *req,
 		if (ret)
 			continue;
 
-		ret = m2p_add_override(PFN_DOWN(map[i].dev_bus_addr),
-			blkbk->pending_page(pending_req, i), NULL);
-		if (ret) {
-			pr_alert(DRV_PFX "Failed to install M2P override for %lx (ret: %d)\n",
-				 (unsigned long)map[i].dev_bus_addr, ret);
-			/* We could switch over to GNTTABOP_copy */
-			continue;
-		}
-
 		seg[i].buf  = map[i].dev_bus_addr |
 			(req->u.rw.seg[i].first_sect << 9);
 	}
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 05/12] xen/netback: Enable netback on HVM guests
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (3 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 06/12] xen/blkback: Enable blkback " Daniel De Graaf
                                             ` (6 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/net/xen-netback/netback.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 0cb594c..951c713 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1638,7 +1638,7 @@ static int __init netback_init(void)
 	int rc = 0;
 	int group;
 
-	if (!xen_pv_domain())
+	if (!xen_domain())
 		return -ENODEV;
 
 	xen_netbk_group_nr = num_online_cpus();
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 06/12] xen/blkback: Enable blkback on HVM guests
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (4 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 05/12] xen/netback: Enable netback on HVM guests Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 07/12] xen/event: Add reference counting to event channels Daniel De Graaf
                                             ` (5 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/block/xen-blkback/blkback.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 1e256dc..fbffdf0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -823,7 +823,7 @@ static int __init xen_blkif_init(void)
 	int i, mmap_pages;
 	int rc = 0;
 
-	if (!xen_pv_domain())
+	if (!xen_domain())
 		return -ENODEV;
 
 	blkbk = kzalloc(sizeof(struct xen_blkbk), GFP_KERNEL);
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 07/12] xen/event: Add reference counting to event channels
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (5 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 06/12] xen/blkback: Enable blkback " Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-12-14 18:56                             ` Konrad Rzeszutek Wilk
  2011-11-28 16:49                           ` [PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex Daniel De Graaf
                                             ` (4 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

Event channels exposed to userspace by the evtchn module may be used by
other modules in an asynchronous manner, which requires that reference
counting be used to prevent the event channel from being closed before
the signals are delivered.

The reference count on new event channels defaults to -1 which indicates
the event channel is not referenced outside the kernel; evtchn_get fails
if called on such an event channel. The event channels made visible to
userspace by evtchn have a normal reference count.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/xen/evtchn.c |    2 +-
 include/xen/events.h |    7 +++++
 3 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 6e075cd..a3bcd61 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -87,6 +87,7 @@ enum xen_irq_type {
  */
 struct irq_info {
 	struct list_head list;
+	int refcnt;
 	enum xen_irq_type type;	/* type */
 	unsigned irq;
 	unsigned short evtchn;	/* event channel */
@@ -406,6 +407,7 @@ static void xen_irq_init(unsigned irq)
 		panic("Unable to allocate metadata for IRQ%d\n", irq);
 
 	info->type = IRQT_UNBOUND;
+	info->refcnt = -1;
 
 	irq_set_handler_data(irq, info);
 
@@ -469,6 +471,8 @@ static void xen_free_irq(unsigned irq)
 
 	irq_set_handler_data(irq, NULL);
 
+	WARN_ON(info->refcnt > 0);
+
 	kfree(info);
 
 	/* Legacy IRQ descriptors are managed by the arch. */
@@ -637,7 +641,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi,
 	if (irq != -1) {
 		printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n",
 		       irq, gsi);
-		goto out;	/* XXX need refcount? */
+		goto out;
 	}
 
 	irq = xen_allocate_irq_gsi(gsi);
@@ -939,9 +943,16 @@ static void unbind_from_irq(unsigned int irq)
 {
 	struct evtchn_close close;
 	int evtchn = evtchn_from_irq(irq);
+	struct irq_info *info = irq_get_handler_data(irq);
 
 	mutex_lock(&irq_mapping_update_lock);
 
+	if (info->refcnt > 0) {
+		info->refcnt--;
+		if (info->refcnt != 0)
+			goto done;
+	}
+
 	if (VALID_EVTCHN(evtchn)) {
 		close.port = evtchn;
 		if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
@@ -970,6 +981,7 @@ static void unbind_from_irq(unsigned int irq)
 
 	xen_free_irq(irq);
 
+ done:
 	mutex_unlock(&irq_mapping_update_lock);
 }
 
@@ -1065,6 +1077,66 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id)
 }
 EXPORT_SYMBOL_GPL(unbind_from_irqhandler);
 
+int evtchn_make_refcounted(unsigned int evtchn)
+{
+	int irq = evtchn_to_irq[evtchn];
+	struct irq_info *info;
+
+	if (irq == -1)
+		return -ENOENT;
+
+	info = irq_get_handler_data(irq);
+
+	if (!info)
+		return -ENOENT;
+
+	WARN_ON(info->refcnt != -1);
+
+	info->refcnt = 1;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(evtchn_make_refcounted);
+
+int evtchn_get(unsigned int evtchn)
+{
+	int irq;
+	struct irq_info *info;
+	int err = -ENOENT;
+
+	mutex_lock(&irq_mapping_update_lock);
+
+	irq = evtchn_to_irq[evtchn];
+	if (irq == -1)
+		goto done;
+
+	info = irq_get_handler_data(irq);
+
+	if (!info)
+		goto done;
+
+	err = -EINVAL;
+	if (info->refcnt <= 0)
+		goto done;
+
+	info->refcnt++;
+	err = 0;
+ done:
+	mutex_unlock(&irq_mapping_update_lock);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(evtchn_get);
+
+void evtchn_put(unsigned int evtchn)
+{
+	int irq = evtchn_to_irq[evtchn];
+	if (WARN_ON(irq == -1))
+		return;
+	unbind_from_irq(irq);
+}
+EXPORT_SYMBOL_GPL(evtchn_put);
+
 void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
 {
 	int irq = per_cpu(ipi_to_irq, cpu)[vector];
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index dbc13e9..b1f60a0 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -268,7 +268,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port)
 	rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
 				       u->name, (void *)(unsigned long)port);
 	if (rc >= 0)
-		rc = 0;
+		rc = evtchn_make_refcounted(port);
 
 	return rc;
 }
diff --git a/include/xen/events.h b/include/xen/events.h
index d287997..0f77370 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -37,6 +37,13 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
  */
 void unbind_from_irqhandler(unsigned int irq, void *dev_id);
 
+/*
+ * Allow extra references to event channels exposed to userspace by evtchn
+ */
+int evtchn_make_refcounted(unsigned int evtchn);
+int evtchn_get(unsigned int evtchn);
+void evtchn_put(unsigned int evtchn);
+
 void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector);
 int resend_irq_on_evtchn(unsigned int irq);
 void rebind_evtchn_irq(int evtchn, int irq);
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (6 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 07/12] xen/event: Add reference counting to event channels Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 09/12] xen/gnt{dev, alloc}: reserve event channels for notify Daniel De Graaf
                                             ` (3 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

The event channel release function cannot be called under a spinlock
because it can attempt to acquire a mutex due to the event channel
reference acquired when setting up unmap notifications.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/gntalloc.c |   41 +++++++++++++++++++++--------------------
 1 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index e1c4c6e..2a7e9f8 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -74,7 +74,7 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be allocated by "
 		"the gntalloc device");
 
 static LIST_HEAD(gref_list);
-static DEFINE_SPINLOCK(gref_lock);
+static DEFINE_MUTEX(gref_mutex);
 static int gref_size;
 
 struct notify_info {
@@ -143,15 +143,15 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op,
 	}
 
 	/* Add to gref lists. */
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	list_splice_tail(&queue_gref, &gref_list);
 	list_splice_tail(&queue_file, &priv->list);
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 
 	return 0;
 
 undo:
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref_size -= (op->count - i);
 
 	list_for_each_entry(gref, &queue_file, next_file) {
@@ -167,7 +167,7 @@ undo:
 	 */
 	if (unlikely(!list_empty(&queue_gref)))
 		list_splice_tail(&queue_gref, &gref_list);
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 	return rc;
 }
 
@@ -251,7 +251,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp)
 
 	pr_debug("%s: priv %p\n", __func__, priv);
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	while (!list_empty(&priv->list)) {
 		gref = list_entry(priv->list.next,
 			struct gntalloc_gref, next_file);
@@ -261,7 +261,7 @@ static int gntalloc_release(struct inode *inode, struct file *filp)
 			__del_gref(gref);
 	}
 	kfree(priv);
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 
 	return 0;
 }
@@ -286,21 +286,21 @@ static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
 		goto out;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	/* Clean up pages that were at zero (local) users but were still mapped
 	 * by remote domains. Since those pages count towards the limit that we
 	 * are about to enforce, removing them here is a good idea.
 	 */
 	do_cleanup();
 	if (gref_size + op.count > limit) {
-		spin_unlock(&gref_lock);
+		mutex_unlock(&gref_mutex);
 		rc = -ENOSPC;
 		goto out_free;
 	}
 	gref_size += op.count;
 	op.index = priv->index;
 	priv->index += op.count * PAGE_SIZE;
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 
 	rc = add_grefs(&op, gref_ids, priv);
 	if (rc < 0)
@@ -343,7 +343,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
 		goto dealloc_grant_out;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref = find_grefs(priv, op.index, op.count);
 	if (gref) {
 		/* Remove from the file list only, and decrease reference count.
@@ -363,7 +363,7 @@ static long gntalloc_ioctl_dealloc(struct gntalloc_file_private_data *priv,
 
 	do_cleanup();
 
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 dealloc_grant_out:
 	return rc;
 }
@@ -383,7 +383,7 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 	index = op.index & ~(PAGE_SIZE - 1);
 	pgoff = op.index & (PAGE_SIZE - 1);
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 
 	gref = find_grefs(priv, index, 1);
 	if (!gref) {
@@ -400,8 +400,9 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 	gref->notify.pgoff = pgoff;
 	gref->notify.event = op.event_channel_port;
 	rc = 0;
+
  unlock_out:
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 	return rc;
 }
 
@@ -433,9 +434,9 @@ static void gntalloc_vma_open(struct vm_area_struct *vma)
 	if (!gref)
 		return;
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref->users++;
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 }
 
 static void gntalloc_vma_close(struct vm_area_struct *vma)
@@ -444,11 +445,11 @@ static void gntalloc_vma_close(struct vm_area_struct *vma)
 	if (!gref)
 		return;
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref->users--;
 	if (gref->users == 0)
 		__del_gref(gref);
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 }
 
 static struct vm_operations_struct gntalloc_vmops = {
@@ -471,7 +472,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -EINVAL;
 	}
 
-	spin_lock(&gref_lock);
+	mutex_lock(&gref_mutex);
 	gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count);
 	if (gref == NULL) {
 		rv = -ENOENT;
@@ -499,7 +500,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 	rv = 0;
 
 out_unlock:
-	spin_unlock(&gref_lock);
+	mutex_unlock(&gref_mutex);
 	return rv;
 }
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 09/12] xen/gnt{dev, alloc}: reserve event channels for notify
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (7 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels Daniel De Graaf
                                             ` (2 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

When using the unmap notify ioctl, the event channel used for
notification needs to be reserved to avoid it being deallocated prior to
sending the notification.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/gntalloc.c |   21 ++++++++++++++++++++-
 drivers/xen/gntdev.c   |   31 ++++++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 2a7e9f8..60eee4e 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -178,8 +178,10 @@ static void __del_gref(struct gntalloc_gref *gref)
 		tmp[gref->notify.pgoff] = 0;
 		kunmap(gref->page);
 	}
-	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
 		notify_remote_via_evtchn(gref->notify.event);
+		evtchn_put(gref->notify.event);
+	}
 
 	gref->notify.flags = 0;
 
@@ -396,6 +398,23 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
 		goto unlock_out;
 	}
 
+	/* We need to grab a reference to the event channel we are going to use
+	 * to send the notify before releasing the reference we may already have
+	 * (if someone has called this ioctl twice). This is required so that
+	 * it is possible to change the clear_byte part of the notification
+	 * without disturbing the event channel part, which may now be the last
+	 * reference to that event channel.
+	 */
+	if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
+		if (evtchn_get(op.event_channel_port)) {
+			rc = -EINVAL;
+			goto unlock_out;
+		}
+	}
+
+	if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+		evtchn_put(gref->notify.event);
+
 	gref->notify.flags = op.action;
 	gref->notify.pgoff = pgoff;
 	gref->notify.event = op.event_channel_port;
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 65bff07..4e189f5 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -193,8 +193,10 @@ static void gntdev_put_map(struct grant_map *map)
 
 	atomic_sub(map->count, &pages_mapped);
 
-	if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
+	if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
 		notify_remote_via_evtchn(map->notify.event);
+		evtchn_put(map->notify.event);
+	}
 
 	if (map->pages) {
 		if (!use_ptemod)
@@ -600,6 +602,8 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	struct ioctl_gntdev_unmap_notify op;
 	struct grant_map *map;
 	int rc;
+	int out_flags;
+	unsigned int out_event;
 
 	if (copy_from_user(&op, u, sizeof(op)))
 		return -EFAULT;
@@ -607,6 +611,21 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	if (op.action & ~(UNMAP_NOTIFY_CLEAR_BYTE|UNMAP_NOTIFY_SEND_EVENT))
 		return -EINVAL;
 
+	/* We need to grab a reference to the event channel we are going to use
+	 * to send the notify before releasing the reference we may already have
+	 * (if someone has called this ioctl twice). This is required so that
+	 * it is possible to change the clear_byte part of the notification
+	 * without disturbing the event channel part, which may now be the last
+	 * reference to that event channel.
+	 */
+	if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
+		if (evtchn_get(op.event_channel_port))
+			return -EINVAL;
+	}
+
+	out_flags = op.action;
+	out_event = op.event_channel_port;
+
 	spin_lock(&priv->lock);
 
 	list_for_each_entry(map, &priv->maps, next) {
@@ -625,12 +644,22 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 		goto unlock_out;
 	}
 
+	out_flags = map->notify.flags;
+	out_event = map->notify.event;
+
 	map->notify.flags = op.action;
 	map->notify.addr = op.index - (map->index << PAGE_SHIFT);
 	map->notify.event = op.event_channel_port;
+
 	rc = 0;
+
  unlock_out:
 	spin_unlock(&priv->lock);
+
+	/* Drop the reference to the event channel we did not save in the map */
+	if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
+		evtchn_put(out_event);
+
 	return rc;
 }
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (8 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 09/12] xen/gnt{dev, alloc}: reserve event channels for notify Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 11/12] xen/gntalloc: release grant references on page free Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings Daniel De Graaf
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

The event channel number provided to evtchn_get can be provided by
userspace, so needs to be checked against the maximum number of event
channels prior to using it to index into evtchn_to_irq.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/events.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index a3bcd61..e5e5812 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1104,6 +1104,9 @@ int evtchn_get(unsigned int evtchn)
 	struct irq_info *info;
 	int err = -ENOENT;
 
+	if (evtchn >= NR_EVENT_CHANNELS)
+		return -EINVAL;
+
 	mutex_lock(&irq_mapping_update_lock);
 
 	irq = evtchn_to_irq[evtchn];
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 11/12] xen/gntalloc: release grant references on page free
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (9 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  2011-11-28 16:49                           ` [PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings Daniel De Graaf
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

gnttab_end_foreign_access_ref does not return the grant reference it is
passed to the free list; gnttab_free_grant_reference needs to be
explicitly called. While gnttab_end_foreign_access provides a wrapper
for this, it is unsuitable because it does not return errors.

Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntalloc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 60eee4e..7b936cc 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -191,6 +191,8 @@ static void __del_gref(struct gntalloc_gref *gref)
 
 		if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
 			return;
+
+		gnttab_free_grant_reference(gref->gref_id);
 	}
 
 	gref_size--;
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings
  2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
                                             ` (10 preceding siblings ...)
  2011-11-28 16:49                           ` [PATCH 11/12] xen/gntalloc: release grant references on page free Daniel De Graaf
@ 2011-11-28 16:49                           ` Daniel De Graaf
  11 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-11-28 16:49 UTC (permalink / raw)
  To: konrad.wilk
  Cc: Daniel De Graaf, xen-devel, david.vrabel, JBeulich, Ian.Campbell

When a multi-page mapping of gntalloc is created, the reference counts
of all pages in the vma are incremented. However, the vma open/close
operations only adjusted the reference count of the first page in the
mapping, leaking the other pages. Store a struct in the vm_private_data
to track the original page count to properly free the pages when the
last reference to the vma is closed.

Reported-by: Anil Madhavapeddy <anil@recoil.org>
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 drivers/xen/gntalloc.c |   56 ++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 7b936cc..e2400c8 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -99,6 +99,12 @@ struct gntalloc_file_private_data {
 	uint64_t index;
 };
 
+struct gntalloc_vma_private_data {
+	struct gntalloc_gref *gref;
+	int users;
+	int count;
+};
+
 static void __del_gref(struct gntalloc_gref *gref);
 
 static void do_cleanup(void)
@@ -451,25 +457,39 @@ static long gntalloc_ioctl(struct file *filp, unsigned int cmd,
 
 static void gntalloc_vma_open(struct vm_area_struct *vma)
 {
-	struct gntalloc_gref *gref = vma->vm_private_data;
-	if (!gref)
+	struct gntalloc_vma_private_data *priv = vma->vm_private_data;
+
+	if (!priv)
 		return;
 
 	mutex_lock(&gref_mutex);
-	gref->users++;
+	priv->users++;
 	mutex_unlock(&gref_mutex);
 }
 
 static void gntalloc_vma_close(struct vm_area_struct *vma)
 {
-	struct gntalloc_gref *gref = vma->vm_private_data;
-	if (!gref)
+	struct gntalloc_vma_private_data *priv = vma->vm_private_data;
+	struct gntalloc_gref *gref, *next;
+	int i;
+
+	if (!priv)
 		return;
 
 	mutex_lock(&gref_mutex);
-	gref->users--;
-	if (gref->users == 0)
-		__del_gref(gref);
+	priv->users--;
+	if (priv->users == 0) {
+		gref = priv->gref;
+		for (i = 0; i < priv->count; i++) {
+			gref->users--;
+			next = list_entry(gref->next_gref.next,
+					  struct gntalloc_gref, next_gref);
+			if (gref->users == 0)
+				__del_gref(gref);
+			gref = next;
+		}
+		kfree(priv);
+	}
 	mutex_unlock(&gref_mutex);
 }
 
@@ -481,19 +501,25 @@ static struct vm_operations_struct gntalloc_vmops = {
 static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct gntalloc_file_private_data *priv = filp->private_data;
+	struct gntalloc_vma_private_data *vm_priv;
 	struct gntalloc_gref *gref;
 	int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 	int rv, i;
 
-	pr_debug("%s: priv %p, page %lu+%d\n", __func__,
-		       priv, vma->vm_pgoff, count);
-
 	if (!(vma->vm_flags & VM_SHARED)) {
 		printk(KERN_ERR "%s: Mapping must be shared.\n", __func__);
 		return -EINVAL;
 	}
 
+	vm_priv = kmalloc(sizeof(*vm_priv), GFP_KERNEL);
+	if (!vm_priv)
+		return -ENOMEM;
+
 	mutex_lock(&gref_mutex);
+
+	pr_debug("%s: priv %p,%p, page %lu+%d\n", __func__,
+		       priv, vm_priv, vma->vm_pgoff, count);
+
 	gref = find_grefs(priv, vma->vm_pgoff << PAGE_SHIFT, count);
 	if (gref == NULL) {
 		rv = -ENOENT;
@@ -502,9 +528,13 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)
 		goto out_unlock;
 	}
 
-	vma->vm_private_data = gref;
+	vm_priv->gref = gref;
+	vm_priv->users = 1;
+	vm_priv->count = count;
+
+	vma->vm_private_data = vm_priv;
 
-	vma->vm_flags |= VM_RESERVED;
+	vma->vm_flags |= VM_RESERVED | VM_DONTEXPAND;
 
 	vma->vm_ops = &gntalloc_vmops;
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/12] xen/event: Add reference counting to event channels
  2011-11-28 16:49                           ` [PATCH 07/12] xen/event: Add reference counting to event channels Daniel De Graaf
@ 2011-12-14 18:56                             ` Konrad Rzeszutek Wilk
  2011-12-14 19:52                               ` Daniel De Graaf
  0 siblings, 1 reply; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-14 18:56 UTC (permalink / raw)
  To: Daniel De Graaf; +Cc: xen-devel, david.vrabel, JBeulich, Ian.Campbell

On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
> Event channels exposed to userspace by the evtchn module may be used by
> other modules in an asynchronous manner, which requires that reference
> counting be used to prevent the event channel from being closed before
> the signals are delivered.
> 
> The reference count on new event channels defaults to -1 which indicates
> the event channel is not referenced outside the kernel; evtchn_get fails
> if called on such an event channel. The event channels made visible to
> userspace by evtchn have a normal reference count.

So I've 7 through 12 in my branch now. (7,8 and 9 are in the #linux-next
and the 10-12 are in the #testing).

The other 1-6 you have might need to be parceled out (especially the netback
one which usually goes through David Miller). Also I have to double-check -
but did somebody review those 1-6 patches?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 01/12] xenbus: Support HVM backends
  2011-11-28 16:49                           ` [PATCH 01/12] xenbus: Support HVM backends Daniel De Graaf
@ 2011-12-14 19:03                             ` Konrad Rzeszutek Wilk
  2011-12-14 19:53                               ` Daniel De Graaf
  0 siblings, 1 reply; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-14 19:03 UTC (permalink / raw)
  To: Daniel De Graaf; +Cc: xen-devel, david.vrabel, JBeulich, Ian.Campbell

On Mon, Nov 28, 2011 at 11:49:00AM -0500, Daniel De Graaf wrote:
> Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
> that ring mappings can be done without using GNTMAP_contains_pte which
> is not supported on HVM.


So what else besides these patches should I do to load the blkback/netback
drivers in a HVM domain? There are some xen toolstack patches patches I presume?
Can you tell me what the c/s are (if any?)

Thanks!
> 
> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> ---
>  drivers/xen/xenbus/xenbus_client.c |  155 +++++++++++++++++++++++++++++-------
>  1 files changed, 125 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index 1906125..688c4b4 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -32,16 +32,27 @@
>  
>  #include <linux/slab.h>
>  #include <linux/types.h>
> +#include <linux/spinlock.h>
>  #include <linux/vmalloc.h>
>  #include <linux/export.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/page.h>
>  #include <xen/interface/xen.h>
>  #include <xen/interface/event_channel.h>
> +#include <xen/balloon.h>
>  #include <xen/events.h>
>  #include <xen/grant_table.h>
>  #include <xen/xenbus.h>
>  
> +struct xenbus_map_node {
> +	struct list_head next;
> +	struct page *page;
> +	grant_handle_t handle;
> +};
> +
> +static DEFINE_SPINLOCK(xenbus_valloc_lock);
> +static LIST_HEAD(xenbus_valloc_pages);
> +
>  const char *xenbus_strstate(enum xenbus_state state)
>  {
>  	static const char *const name[] = {
> @@ -420,21 +431,8 @@ int xenbus_free_evtchn(struct xenbus_device *dev, int port)
>  EXPORT_SYMBOL_GPL(xenbus_free_evtchn);
>  
>  
> -/**
> - * xenbus_map_ring_valloc
> - * @dev: xenbus device
> - * @gnt_ref: grant reference
> - * @vaddr: pointer to address to be filled out by mapping
> - *
> - * Based on Rusty Russell's skeleton driver's map_page.
> - * Map a page of memory into this domain from another domain's grant table.
> - * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
> - * page to that address, and sets *vaddr to that address.
> - * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
> - * or -ENOMEM on error. If an error is returned, device will switch to
> - * XenbusStateClosing and the error message will be saved in XenStore.
> - */
> -int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
> +static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
> +                                     int gnt_ref, void **vaddr)
>  {
>  	struct gnttab_map_grant_ref op = {
>  		.flags = GNTMAP_host_map | GNTMAP_contains_pte,
> @@ -469,6 +467,64 @@ int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
>  	*vaddr = area->addr;
>  	return 0;
>  }
> +
> +static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
> +                                     int gnt_ref, void **vaddr)
> +{
> +	struct xenbus_map_node *node;
> +	int err;
> +	void *addr;
> +
> +	*vaddr = NULL;
> +
> +	node = kzalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	err = alloc_xenballooned_pages(1, &node->page, false);
> +	if (err)
> +		goto out_err;
> +
> +	addr = pfn_to_kaddr(page_to_pfn(node->page));
> +
> +	err = xenbus_map_ring(dev, gnt_ref, &node->handle, addr);
> +	if (err)
> +		goto out_err;
> +
> +	spin_lock(&xenbus_valloc_lock);
> +	list_add(&node->next, &xenbus_valloc_pages);
> +	spin_unlock(&xenbus_valloc_lock);
> +
> +	*vaddr = addr;
> +	return 0;
> +
> + out_err:
> +	free_xenballooned_pages(1, &node->page);
> +	kfree(node);
> +	return err;
> +}
> +
> +/**
> + * xenbus_map_ring_valloc
> + * @dev: xenbus device
> + * @gnt_ref: grant reference
> + * @vaddr: pointer to address to be filled out by mapping
> + *
> + * Based on Rusty Russell's skeleton driver's map_page.
> + * Map a page of memory into this domain from another domain's grant table.
> + * xenbus_map_ring_valloc allocates a page of virtual address space, maps the
> + * page to that address, and sets *vaddr to that address.
> + * Returns 0 on success, and GNTST_* (see xen/include/interface/grant_table.h)
> + * or -ENOMEM on error. If an error is returned, device will switch to
> + * XenbusStateClosing and the error message will be saved in XenStore.
> + */
> +int xenbus_map_ring_valloc(struct xenbus_device *dev, int gnt_ref, void **vaddr)
> +{
> +	if (xen_pv_domain())
> +		return xenbus_map_ring_valloc_pv(dev, gnt_ref, vaddr);
> +	else
> +		return xenbus_map_ring_valloc_hvm(dev, gnt_ref, vaddr);
> +}
>  EXPORT_SYMBOL_GPL(xenbus_map_ring_valloc);
>  
>  
> @@ -510,20 +566,7 @@ int xenbus_map_ring(struct xenbus_device *dev, int gnt_ref,
>  }
>  EXPORT_SYMBOL_GPL(xenbus_map_ring);
>  
> -
> -/**
> - * xenbus_unmap_ring_vfree
> - * @dev: xenbus device
> - * @vaddr: addr to unmap
> - *
> - * Based on Rusty Russell's skeleton driver's unmap_page.
> - * Unmap a page of memory in this domain that was imported from another domain.
> - * Use xenbus_unmap_ring_vfree if you mapped in your memory with
> - * xenbus_map_ring_valloc (it will free the virtual address space).
> - * Returns 0 on success and returns GNTST_* on error
> - * (see xen/include/interface/grant_table.h).
> - */
> -int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
> +static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
>  {
>  	struct vm_struct *area;
>  	struct gnttab_unmap_grant_ref op = {
> @@ -566,8 +609,60 @@ int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
>  
>  	return op.status;
>  }
> -EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
>  
> +static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
> +{
> +	int rv;
> +	struct xenbus_map_node *node;
> +	void *addr;
> +
> +	spin_lock(&xenbus_valloc_lock);
> +	list_for_each_entry(node, &xenbus_valloc_pages, next) {
> +		addr = pfn_to_kaddr(page_to_pfn(node->page));
> +		if (addr == vaddr) {
> +			list_del(&node->next);
> +			goto found;
> +		}
> +	}
> +	node = NULL;
> + found:
> +	spin_unlock(&xenbus_valloc_lock);
> +
> +	if (!node) {
> +		xenbus_dev_error(dev, -ENOENT,
> +				 "can't find mapped virtual address %p", vaddr);
> +		return -ENOENT;
> +	}
> +
> +	rv = xenbus_unmap_ring(dev, node->handle, addr);
> +
> +	if (!rv)
> +		free_xenballooned_pages(1, &node->page);
> +
> +	kfree(node);
> +	return rv;
> +}
> +
> +/**
> + * xenbus_unmap_ring_vfree
> + * @dev: xenbus device
> + * @vaddr: addr to unmap
> + *
> + * Based on Rusty Russell's skeleton driver's unmap_page.
> + * Unmap a page of memory in this domain that was imported from another domain.
> + * Use xenbus_unmap_ring_vfree if you mapped in your memory with
> + * xenbus_map_ring_valloc (it will free the virtual address space).
> + * Returns 0 on success and returns GNTST_* on error
> + * (see xen/include/interface/grant_table.h).
> + */
> +int xenbus_unmap_ring_vfree(struct xenbus_device *dev, void *vaddr)
> +{
> +	if (xen_pv_domain())
> +		return xenbus_unmap_ring_vfree_pv(dev, vaddr);
> +	else
> +		return xenbus_unmap_ring_vfree_hvm(dev, vaddr);
> +}
> +EXPORT_SYMBOL_GPL(xenbus_unmap_ring_vfree);
>  
>  /**
>   * xenbus_unmap_ring
> -- 
> 1.7.7.3

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/12] xen/event: Add reference counting to event channels
  2011-12-14 18:56                             ` Konrad Rzeszutek Wilk
@ 2011-12-14 19:52                               ` Daniel De Graaf
  2011-12-16 19:35                                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel De Graaf @ 2011-12-14 19:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, david.vrabel, JBeulich, Ian.Campbell

On 12/14/2011 01:56 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
>> Event channels exposed to userspace by the evtchn module may be used by
>> other modules in an asynchronous manner, which requires that reference
>> counting be used to prevent the event channel from being closed before
>> the signals are delivered.
>>
>> The reference count on new event channels defaults to -1 which indicates
>> the event channel is not referenced outside the kernel; evtchn_get fails
>> if called on such an event channel. The event channels made visible to
>> userspace by evtchn have a normal reference count.
> 
> So I've 7 through 12 in my branch now. (7,8 and 9 are in the #linux-next
> and the 10-12 are in the #testing).
> 
> The other 1-6 you have might need to be parceled out (especially the netback
> one which usually goes through David Miller). Also I have to double-check -
> but did somebody review those 1-6 patches?
> 

The last review I've seen of these patches is this thread:

http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#01484

I'll post a v3 for those in a bit incorporating the comments in that
thread; the netback one was (unofficially?) acked by David Miller. I don't
see which of the other patches need to be parceled out as the files they
touch look like they are still being committed with only Xen developer
sign-offs.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 01/12] xenbus: Support HVM backends
  2011-12-14 19:03                             ` Konrad Rzeszutek Wilk
@ 2011-12-14 19:53                               ` Daniel De Graaf
  0 siblings, 0 replies; 54+ messages in thread
From: Daniel De Graaf @ 2011-12-14 19:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, david.vrabel, JBeulich, Ian.Campbell

On 12/14/2011 02:03 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 28, 2011 at 11:49:00AM -0500, Daniel De Graaf wrote:
>> Add HVM implementations of xenbus_(map,unmap)_ring_v(alloc,free) so
>> that ring mappings can be done without using GNTMAP_contains_pte which
>> is not supported on HVM.
> 
> 
> So what else besides these patches should I do to load the blkback/netback
> drivers in a HVM domain? There are some xen toolstack patches patches I presume?
> Can you tell me what the c/s are (if any?)
> 
> Thanks!

No toolstack changes are required for netback; the interface simply shows
up in the guest the same as it does in a PV domain. The same is true for
blkback, but the toolstack does not currently support specifying backend
domains (PV or HVM) for block devices. I posted a patch earlier [1] adding
this support, but the conversion from block device name to major/minor
number is done in the toolstack's domain, not the domain exporting the
block device.

[1] http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#00818

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/12] xen/event: Add reference counting to event channels
  2011-12-14 19:52                               ` Daniel De Graaf
@ 2011-12-16 19:35                                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 54+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-16 19:35 UTC (permalink / raw)
  To: Daniel De Graaf
  Cc: Ian.Campbell, xen-devel, david.vrabel, JBeulich, Konrad Rzeszutek Wilk

On Wed, Dec 14, 2011 at 02:52:00PM -0500, Daniel De Graaf wrote:
> On 12/14/2011 01:56 PM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 28, 2011 at 11:49:06AM -0500, Daniel De Graaf wrote:
> >> Event channels exposed to userspace by the evtchn module may be used by
> >> other modules in an asynchronous manner, which requires that reference
> >> counting be used to prevent the event channel from being closed before
> >> the signals are delivered.
> >>
> >> The reference count on new event channels defaults to -1 which indicates
> >> the event channel is not referenced outside the kernel; evtchn_get fails
> >> if called on such an event channel. The event channels made visible to
> >> userspace by evtchn have a normal reference count.
> > 
> > So I've 7 through 12 in my branch now. (7,8 and 9 are in the #linux-next
> > and the 10-12 are in the #testing).
> > 
> > The other 1-6 you have might need to be parceled out (especially the netback
> > one which usually goes through David Miller). Also I have to double-check -
> > but did somebody review those 1-6 patches?
> > 
> 
> The last review I've seen of these patches is this thread:
> 
> http://lists.xen.org/archives/html/xen-devel/2011-10/threads.html#01484
> 
> I'll post a v3 for those in a bit incorporating the comments in that
> thread; the netback one was (unofficially?) acked by David Miller. I don't

Yup. I see it now. Thx
> see which of the other patches need to be parceled out as the files they
> touch look like they are still being committed with only Xen developer
> sign-offs.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2011-12-16 19:35 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-01 11:51 [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area() David Vrabel
2011-09-01 16:11 ` [Revert] " Konrad Rzeszutek Wilk
2011-09-01 16:11   ` Konrad Rzeszutek Wilk
2011-09-01 20:37   ` Jeremy Fitzhardinge
2011-09-01 20:37     ` Jeremy Fitzhardinge
2011-09-01 21:17     ` Andrew Morton
2011-09-01 21:17       ` Andrew Morton
2011-09-02 11:39       ` David Vrabel
2011-09-02 11:39         ` David Vrabel
2011-09-02 11:39         ` David Vrabel
2011-09-02 22:32         ` Andrew Morton
2011-09-02 22:32           ` Andrew Morton
2011-09-02 23:04           ` Jeremy Fitzhardinge
2011-09-02 23:04             ` Jeremy Fitzhardinge
2011-09-06 16:35           ` [Xen-devel] " Konrad Rzeszutek Wilk
2011-09-06 16:35             ` Konrad Rzeszutek Wilk
2011-11-05 13:38             ` Konrad Rzeszutek Wilk
2011-11-05 13:38               ` Konrad Rzeszutek Wilk
2011-11-05 13:38               ` Konrad Rzeszutek Wilk
2011-11-07 20:36               ` Konrad Rzeszutek Wilk
2011-11-07 20:36                 ` Konrad Rzeszutek Wilk
2011-11-08 23:01                 ` Andrew Morton
2011-11-08 23:01                   ` Andrew Morton
2011-11-08 23:31                   ` Konrad Rzeszutek Wilk
2011-11-08 23:31                     ` Konrad Rzeszutek Wilk
2011-11-08 23:36                     ` Andrew Morton
2011-11-08 23:36                       ` Andrew Morton
2011-11-28  9:36                 ` Jan Beulich
2011-11-28 10:19                   ` Ian Campbell
2011-11-28 11:36                     ` David Vrabel
2011-11-28 11:48                       ` Ian Campbell
2011-11-28 15:20                       ` Konrad Rzeszutek Wilk
2011-11-28 16:48                         ` [PATCH 00/12] HVM backends and gnt/event fixups Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 01/12] xenbus: Support HVM backends Daniel De Graaf
2011-12-14 19:03                             ` Konrad Rzeszutek Wilk
2011-12-14 19:53                               ` Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 02/12] xenbus: Use grant-table wrapper functions Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 03/12] xen/grant-table: Support mappings required by blkback Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 04/12] xen/blkback: use grant-table.c hypercall wrappers Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 05/12] xen/netback: Enable netback on HVM guests Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 06/12] xen/blkback: Enable blkback " Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 07/12] xen/event: Add reference counting to event channels Daniel De Graaf
2011-12-14 18:56                             ` Konrad Rzeszutek Wilk
2011-12-14 19:52                               ` Daniel De Graaf
2011-12-16 19:35                                 ` Konrad Rzeszutek Wilk
2011-11-28 16:49                           ` [PATCH 08/12] xen/gntalloc: Change gref_lock to a mutex Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 09/12] xen/gnt{dev, alloc}: reserve event channels for notify Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 10/12] xen/events: prevent calling evtchn_get on invalid channels Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 11/12] xen/gntalloc: release grant references on page free Daniel De Graaf
2011-11-28 16:49                           ` [PATCH 12/12] xen/gntalloc: fix reference counts on multi-page mappings Daniel De Graaf
2011-09-02  7:22     ` [Xen-devel] Re: [Revert] Re: [PATCH] mm: sync vmalloc address space page tables in alloc_vm_area() Ian Campbell
2011-09-02  7:22       ` Ian Campbell
2011-09-02  7:31       ` Keir Fraser
2011-09-02  7:31         ` Keir Fraser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.