* [PATCHv1 0/2] mm: infrastructure for correctly handling foreign pages on Xen
From: David Vrabel @ 2015-01-08 15:28 UTC
  To: Andrew Morton, linux-kernel
  Cc: David Vrabel, linux-mm, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

These two patches are the common parts of a larger Xen series[1]
fixing several long-standing bugs in the handling of foreign[2] pages in
Xen guests.

The first patch is required to fix get_user_pages[_fast]() with
userspace mappings of such foreign pages.  Basically, pte_page()
doesn't work, so an alternate mechanism is needed to get the page from
a VMA and address.  Because only mappings with 'special' PTEs use this
method, the common use cases should not be affected.

The second patch isn't essential but improves the readability of the
code that uses the page flag.

For further background reading see:

  http://xenbits.xen.org/people/dvrabel/grant-improvements-C.pdf

David

[1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg00326.html

[2] Another guest's page temporarily granted to this guest.


* [PATCH 1/2] mm: allow for an alternate set of pages for userspace mappings
From: David Vrabel @ 2015-01-08 15:28 UTC
  To: Andrew Morton, linux-kernel
  Cc: David Vrabel, linux-mm, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

Add an optional array of pages to struct vm_area_struct that can be
used to find the page backing a VMA.  This is useful in cases where the
normal mechanisms for finding the page don't work.  This array is only
inspected if the PTE is special.
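(Editorial sketch, not part of the patch.) The lookup rule above can be modelled in plain userspace C. This is a minimal sketch under stated assumptions: `struct model_vma`, `model_normal_page()` and the `special`/`normal` parameters are hypothetical stand-ins for `struct vm_area_struct`, `vm_normal_page()`, `pte_special()` and the ordinary PFN-based lookup, and `MODEL_PAGE_SHIFT` assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct model_page {
	int id;
};

/* Hypothetical stand-in for the relevant fields of struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	struct model_page **pages; /* optional override array, may be NULL */
};

/*
 * Hypothetical stand-in for vm_normal_page().  'special' plays the role
 * of pte_special(pte) and 'normal' is whatever the ordinary PFN-based
 * lookup would have returned.
 */
static struct model_page *model_normal_page(const struct model_vma *vma,
					    unsigned long addr, int special,
					    struct model_page *normal)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);

	if (!special)
		return normal;	/* common case: the array is never consulted */
	if (vma->pages)		/* one entry per page, indexed from vm_start */
		return vma->pages[(addr - vma->vm_start) >> MODEL_PAGE_SHIFT];
	return NULL;		/* special PTE without an override: no struct page */
}
```

Only the special-PTE branch looks at the array, which is why ordinary mappings see no behaviour change.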

Splitting a VMA with such an array of pages is trivially done by
adjusting vma->pages.  The original creator of the VMA must only free
the page array once all sub-VMAs are closed (e.g., by ref-counting in
vm_ops->open and vm_ops->close).
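(Editorial sketch, not part of the patch.) The split and ref-counting rules can be modelled the same way. This is a hypothetical userspace model: `model_split()` mirrors only the `vm_start`/`vm_end`/`pages` handling of `__split_vma()` (with `new` starting as a copy of `vma`, and `new_below` meaning `new` takes the lower part), and `struct model_pages` with its `refs` counter is an assumed stand-in for whatever ref-counting the VMA's creator does in `vm_ops->open` and `vm_ops->close`.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct model_page;

/* Hypothetical shared, ref-counted page array owned by the VMA's creator. */
struct model_pages {
	int refs;
	struct model_page **base; /* start of the allocation */
};

struct model_vma {
	unsigned long vm_start, vm_end;
	struct model_page **pages;  /* may point into the middle of base */
	struct model_pages *owner;
};

/* Models vm_ops->open: every (sub-)VMA takes a reference. */
static void model_open(struct model_vma *vma)
{
	vma->owner->refs++;
}

/* Models vm_ops->close: the array is freed only when the last user goes. */
static void model_close(struct model_vma *vma)
{
	if (--vma->owner->refs == 0) {
		free(vma->owner->base);
		free(vma->owner);
	}
}

/* Split at addr, offsetting 'pages' of whichever half starts at addr. */
static void model_split(struct model_vma *vma, struct model_vma *new,
			unsigned long addr, int new_below)
{
	unsigned long delta;

	assert(addr > vma->vm_start && addr < vma->vm_end);
	delta = (addr - vma->vm_start) >> MODEL_PAGE_SHIFT;

	*new = *vma;			/* new starts as a copy of vma */
	if (new_below) {		/* new = [start, addr), vma = [addr, end) */
		new->vm_end = addr;
		vma->vm_start = addr;
		vma->pages += delta;
	} else {			/* vma = [start, addr), new = [addr, end) */
		new->vm_start = addr;
		vma->vm_end = addr;
		new->pages += delta;
	}
	model_open(new);
}
```

No page array is copied on a split; each half simply points at its own slice of the shared allocation, which is why only the last close may free it.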

One use case is a Xen PV guest mapping foreign pages into userspace.

In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
example) must do an MFN to PFN (M2P) lookup before it can get the
page.  For foreign pages (those owned by another guest) the M2P lookup
returns the PFN as seen by the foreign guest (which would be
completely the wrong page for the local guest).

This cannot be fixed by improving the M2P lookup, since one MFN may be
mapped onto two or more pages, so getting the right page is impossible
given just the MFN.
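(Editorial sketch, not part of the patch.) A toy model may make the ambiguity concrete. All names here are hypothetical, and the real P2M and M2P tables are maintained by Xen and the guest kernel, not by plain arrays. The local P2M can map several local PFNs to the same foreign MFN, while the M2P holds a single PFN per MFN, as recorded by the guest that owns the frame.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LOCAL_PFNS 4
#define NR_MFNS 8

/* Hypothetical local P2M table: local PFN -> machine frame number. */
static unsigned long p2m[NR_LOCAL_PFNS];

/*
 * Hypothetical M2P table: one PFN per MFN, as recorded by the guest
 * that owns the frame.
 */
static unsigned long m2p[NR_MFNS];

static unsigned long m2p_lookup(unsigned long mfn)
{
	assert(mfn < NR_MFNS);
	return m2p[mfn];
}

/* Collect every local PFN whose P2M entry is 'mfn'; returns the count. */
static size_t local_pfns_for_mfn(unsigned long mfn, unsigned long *out,
				 size_t max)
{
	size_t n = 0;

	for (unsigned long pfn = 0; pfn < NR_LOCAL_PFNS; pfn++)
		if (p2m[pfn] == mfn && n < max)
			out[n++] = pfn;
	return n;
}
```

For example, if local PFNs 1 and 3 both map MFN 5 while `m2p[5]` holds the foreign guest's PFN 2, `local_pfns_for_mfn(5, ...)` finds two candidates and `m2p_lookup(5)` names a third, unrelated local page. No reverse table keyed on the MFN alone can resolve that.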

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 include/linux/mm_types.h |    8 ++++++++
 mm/memory.c              |    2 ++
 mm/mmap.c                |   12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6d34aa2..4f34609 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -309,6 +309,14 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
+	/*
+	 * Array of pages to override the default vm_normal_page()
+	 * result iff the PTE is special.
+	 *
+	 * The memory for this should be refcounted in vm_ops->open
+	 * and vm_ops->close.
+	 */
+	struct page **pages;
 };
 
 struct core_thread {
diff --git a/mm/memory.c b/mm/memory.c
index ca920d1..98520f6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -754,6 +754,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	if (HAVE_PTE_SPECIAL) {
 		if (likely(!pte_special(pte)))
 			goto check_pfn;
+		if (vma->pages)
+			return vma->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
 		if (!is_zero_pfn(pfn))
diff --git a/mm/mmap.c b/mm/mmap.c
index 7b36aa7..504dc5c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2448,6 +2448,7 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 	      unsigned long addr, int new_below)
 {
 	struct vm_area_struct *new;
+	unsigned long delta;
 	int err = -ENOMEM;
 
 	if (is_vm_hugetlb_page(vma) && (addr &
@@ -2463,11 +2464,20 @@ static int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 
+	delta = (addr - vma->vm_start) >> PAGE_SHIFT;
+
 	if (new_below)
 		new->vm_end = addr;
 	else {
 		new->vm_start = addr;
-		new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+		new->vm_pgoff += delta;
+	}
+
+	if (vma->pages) {
+		if (new_below)
+			vma->pages += delta;
+		else
+			new->pages += delta;
 	}
 
 	err = vma_dup_policy(vma, new);
-- 
1.7.10.4


* [PATCH 2/2] mm: add 'foreign' alias for the 'pinned' page flag
From: David Vrabel @ 2015-01-08 15:28 UTC
  To: Andrew Morton, linux-kernel
  Cc: David Vrabel, linux-mm, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Jenny Herbert

From: Jenny Herbert <jennifer.herbert@citrix.com>

The foreign page flag will be used by Xen guests to mark pages that
have grant mappings of frames from other (foreign) guests.

The foreign flag is an alias for the existing (Xen-specific) pinned
flag.  This is safe because pinned is only used on pages used for page
tables and these cannot also be foreign.
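(Editorial sketch, not part of the patch.) Because the two names are enum aliases, the accessors generated for them operate on the same bit. The userspace model below imitates that. `MODEL_PAGEFLAG` is a simplified, non-atomic, hypothetical stand-in for the kernel's `PAGEFLAG()` macro, and the numeric flag values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

enum model_pageflags {
	PG_dirty = 4,		/* values are arbitrary in this model */
	PG_owner_priv_1 = 9,
	/* Xen aliases, as in the patch */
	PG_pinned = PG_owner_priv_1,
	PG_savepinned = PG_dirty,
	PG_foreign = PG_owner_priv_1,
};

struct model_page {
	unsigned long flags;
};

/* Simplified, non-atomic imitation of PAGEFLAG(uname, lname). */
#define MODEL_PAGEFLAG(uname, lname)					\
static bool Page##uname(const struct model_page *p)			\
{ return p->flags & (1UL << PG_##lname); }				\
static void SetPage##uname(struct model_page *p)			\
{ p->flags |= 1UL << PG_##lname; }					\
static void ClearPage##uname(struct model_page *p)			\
{ p->flags &= ~(1UL << PG_##lname); }

MODEL_PAGEFLAG(Pinned, pinned)
MODEL_PAGEFLAG(Foreign, foreign)

/* The alias really is the same bit. */
static_assert(PG_foreign == PG_pinned, "foreign aliases pinned");
```

Setting the flag through one name is visible through the other, which is why the safety argument above matters: the aliasing is only sound if a page-table page (pinned) can never also be a foreign page.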

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 include/linux/page-flags.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e1f5fcd..7734cc8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -123,6 +123,7 @@ enum pageflags {
 	/* XEN */
 	PG_pinned = PG_owner_priv_1,
 	PG_savepinned = PG_dirty,
+	PG_foreign = PG_owner_priv_1,
 
 	/* SLOB */
 	PG_slob_free = PG_private,
@@ -215,6 +216,7 @@ __PAGEFLAG(Slab, slab)
 PAGEFLAG(Checked, checked)		/* Used by some filesystems */
 PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned)	/* Xen */
 PAGEFLAG(SavePinned, savepinned);			/* Xen */
+PAGEFLAG(Foreign, foreign);				/* Xen */
 PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
 PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
 	__SETPAGEFLAG(SwapBacked, swapbacked)
-- 
1.7.10.4


* Re: [PATCH 1/2] mm: allow for an alternate set of pages for userspace mappings
From: Johannes Weiner @ 2015-01-08 17:20 UTC
  To: David Vrabel
  Cc: Andrew Morton, linux-kernel, linux-mm, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky

On Thu, Jan 08, 2015 at 03:28:43PM +0000, David Vrabel wrote:
> Add an optional array of pages to struct vm_area_struct that can be
> used to find the page backing a VMA.  This is useful in cases where the
> normal mechanisms for finding the page don't work.  This array is only
> inspected if the PTE is special.
> 
> Splitting a VMA with such an array of pages is trivially done by
> adjusting vma->pages.  The original creator of the VMA must only free
> the page array once all sub-VMAs are closed (e.g., by ref-counting in
> vm_ops->open and vm_ops->close).
> 
> One use case is a Xen PV guest mapping foreign pages into userspace.
> 
> In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
> example) must do an MFN to PFN (M2P) lookup before it can get the
> page.  For foreign pages (those owned by another guest) the M2P lookup
> returns the PFN as seen by the foreign guest (which would be
> completely the wrong page for the local guest).
> 
> This cannot be fixed by improving the M2P lookup, since one MFN may be
> mapped onto two or more pages, so getting the right page is impossible
> given just the MFN.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  include/linux/mm_types.h |    8 ++++++++
>  mm/memory.c              |    2 ++
>  mm/mmap.c                |   12 +++++++++++-
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6d34aa2..4f34609 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -309,6 +309,14 @@ struct vm_area_struct {
>  #ifdef CONFIG_NUMA
>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
>  #endif
> +	/*
> +	 * Array of pages to override the default vm_normal_page()
> +	 * result iff the PTE is special.
> +	 *
> +	 * The memory for this should be refcounted in vm_ops->open
> +	 * and vm_ops->close.
> +	 */
> +	struct page **pages;

Please make this configuration-dependent, not every Linux user should
have to pay for a Xen optimization.
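(Editorial sketch, not from the thread.) One common way to make a field configuration-dependent is to guard it with a config option and hide it behind an accessor, so callers compile to a constant NULL when the option is off. The sketch below is plain userspace C under stated assumptions: `CONFIG_MODEL_VMA_PAGES` and `vma_pages_override()` are hypothetical names, not an existing kernel option or API. In the kernel the option would come from Kconfig rather than a `#define`.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical option; left undefined here, i.e. "off". */
/* #define CONFIG_MODEL_VMA_PAGES 1 */

struct page;

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
#ifdef CONFIG_MODEL_VMA_PAGES
	struct page **pages;	/* only present when the option is enabled */
#endif
};

#ifndef CONFIG_MODEL_VMA_PAGES
/* With the option off, the structure carries no extra pointer. */
static_assert(sizeof(struct model_vma) == 2 * sizeof(unsigned long),
	      "no per-VMA cost when the option is disabled");
#endif

/* Override page for addr, or NULL if there is none or the option is off. */
static inline struct page *vma_pages_override(const struct model_vma *vma,
					      unsigned long addr)
{
#ifdef CONFIG_MODEL_VMA_PAGES
	if (vma->pages)
		return vma->pages[(addr - vma->vm_start) >> MODEL_PAGE_SHIFT];
#else
	(void)vma;
	(void)addr;
#endif
	return NULL;
}
```

A caller such as `vm_normal_page()` would then use the accessor instead of reading the field directly, so no `#ifdef` is needed at the call site.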

* Re: [PATCH 1/2] mm: allow for an alternate set of pages for userspace mappings
From: David Vrabel @ 2015-01-08 17:50 UTC
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky

On 08/01/15 17:20, Johannes Weiner wrote:
> On Thu, Jan 08, 2015 at 03:28:43PM +0000, David Vrabel wrote:
>> Add an optional array of pages to struct vm_area_struct that can be
>> used to find the page backing a VMA.  This is useful in cases where the
>> normal mechanisms for finding the page don't work.  This array is only
>> inspected if the PTE is special.
>>
>> Splitting a VMA with such an array of pages is trivially done by
>> adjusting vma->pages.  The original creator of the VMA must only free
>> the page array once all sub-VMAs are closed (e.g., by ref-counting in
>> vm_ops->open and vm_ops->close).
>>
>> One use case is a Xen PV guest mapping foreign pages into userspace.
>>
>> In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
>> example) must do an MFN to PFN (M2P) lookup before it can get the
>> page.  For foreign pages (those owned by another guest) the M2P lookup
>> returns the PFN as seen by the foreign guest (which would be
>> completely the wrong page for the local guest).
>>
>> This cannot be fixed by improving the M2P lookup, since one MFN may be
>> mapped onto two or more pages, so getting the right page is impossible
>> given just the MFN.
[...]
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -309,6 +309,14 @@ struct vm_area_struct {
>>  #ifdef CONFIG_NUMA
>>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
>>  #endif
>> +	/*
>> +	 * Array of pages to override the default vm_normal_page()
>> +	 * result iff the PTE is special.
>> +	 *
>> +	 * The memory for this should be refcounted in vm_ops->open
>> +	 * and vm_ops->close.
>> +	 */
>> +	struct page **pages;
> 
> Please make this configuration-dependent, not every Linux user should
> have to pay for a Xen optimization.

If the additional field in struct vm_area_struct is a concern, I would
prefer to use a vm_flag bit and union pages with an existing field.

Perhaps using VM_PFNMAP and reusing vm_file?

David
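(Editorial sketch, not from the thread.) The alternative suggested above, a vm_flag bit selecting which member of a union is valid, might look roughly like this. It is a hypothetical userspace model: `MODEL_VM_PAGES` is an invented flag standing in for whichever bit (for example `VM_PFNMAP`) would be chosen, and the accessors are illustrative, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_VM_PAGES	0x1UL	/* hypothetical flag: 'pages' is valid */

struct file;			/* opaque in this sketch */
struct page {
	int id;
};

struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	union {			/* which member is valid depends on vm_flags */
		struct file *vm_file;
		struct page **pages;
	};
};

/* Every vm_file user would have to check the flag first. */
static struct file *model_vma_file(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_PAGES) ? NULL : vma->vm_file;
}

static struct page *model_vma_page(const struct model_vma *vma,
				   unsigned long addr)
{
	if (!(vma->vm_flags & MODEL_VM_PAGES))
		return NULL;
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return vma->pages[(addr - vma->vm_start) >> MODEL_PAGE_SHIFT];
}
```

This saves the extra pointer, at the price that any code reading `vm_file` on such a VMA would need to respect the flag.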
