* [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs
@ 2015-01-19 15:51 David Vrabel
  2015-01-19 15:51 ` [PATCH 01/14] mm: provide a find_special_page vma operation David Vrabel
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

This series fixes a number of long-standing bugs in the handling of
grant maps.  Refer to the following for all the details.

  http://xenbits.xen.org/people/dvrabel/grant-improvements-C.pdf

In summary, the important uses that this enables are:

1. Block backends can use networked storage safely.

2. Block backends can use network storage provided by other guests on
   the same host.

3. User space block backends can use direct I/O or asynchronous I/O.

The first two patches are the core MM changes necessary.  I shall be
sending these to the MM maintainers separately.

Patches #3 and #4 remove existing (broken) mechanisms.  This does
temporarily break some previously working use cases, but it does make
the subsequent additions much easier to review.

As a happy side effect, performance is also likely to be improved in
some areas (but I've not got any measurements yet).  User space
backends using grant mapping should see some good improvements from
reduced overheads and better unmap batching.  VIF to VIF network
traffic may also see a small improvement.

Finally, thanks to Jenny who did much of the implementation.

Changes in v3:
- find_page renamed to find_special_page.
- Added documentation for mm changes.
- Fixed mangled forward port of blkback's safe unmap patch.
- Export gnttab_alloc_pages() and gnttab_free_pages().
- Fix 32-bit build.

Changes in v2:
- Add find_page() VMA op instead of pages field (so struct
  vm_area_struct doesn't increase in size).
- Make foreign page tracking arch-independent and improve the API.
- Alloc extra memory (for 32-bit archs) for the (domain, gref) when
  allocating the page (instead of during the map).
- Convert gntdev's lock to a mutex.

David


* [PATCH 01/14] mm: provide a find_special_page vma operation
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 02/14] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

The optional find_special_page VMA operation is used to look up the
pages backing a VMA.  This is useful in cases where the normal
mechanisms for finding the page don't work.  This is only called if
the PTE is special.

One use case is a Xen PV guest mapping foreign pages into userspace.

In a Xen PV guest, the PTEs contain MFNs so get_user_pages() (for
example) must do an MFN to PFN (M2P) lookup before it can get the
page.  For foreign pages (those owned by another guest) the M2P lookup
returns the PFN as seen by the foreign guest (which would be
completely the wrong page for the local guest).

This cannot be fixed by improving the M2P lookup since one MFN may be
mapped onto two or more pages, so getting the right page is impossible
given just the MFN.
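
To illustrate, a minimal, hypothetical sketch of a driver providing
the new op (all "foo" names are illustrative only; they are not part
of this series):

  #include <linux/mm.h>

  /* Hypothetical per-VMA private data recording the mapped pages. */
  struct foo_map {
          struct page **pages;
  };

  static struct page *foo_find_special_page(struct vm_area_struct *vma,
                                            unsigned long addr)
  {
          struct foo_map *map = vma->vm_private_data;

          /* vm_normal_page() calls this instead of pte_page() when
             the PTE is special, so return the page installed at
             this address. */
          return map->pages[(addr - vma->vm_start) >> PAGE_SHIFT];
  }

  static const struct vm_operations_struct foo_vm_ops = {
          .find_special_page = foo_find_special_page,
  };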

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 include/linux/mm.h |    8 ++++++++
 mm/memory.c        |    2 ++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80fc92a..9269af7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -290,6 +290,14 @@ struct vm_operations_struct {
 	/* called by sys_remap_file_pages() to populate non-linear mapping */
 	int (*remap_pages)(struct vm_area_struct *vma, unsigned long addr,
 			   unsigned long size, pgoff_t pgoff);
+
+	/*
+	 * Called by vm_normal_page() for special PTEs to find the
+	 * page for @addr.  This is useful if the default behavior
+	 * (using pte_page()) would not find the correct page.
+	 */
+	struct page *(*find_special_page)(struct vm_area_struct *vma,
+					  unsigned long addr);
 };
 
 struct mmu_gather;
diff --git a/mm/memory.c b/mm/memory.c
index 54f3a9b..dc2e01a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -754,6 +754,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	if (HAVE_PTE_SPECIAL) {
 		if (likely(!pte_special(pte)))
 			goto check_pfn;
+		if (vma->vm_ops && vma->vm_ops->find_special_page)
+			return vma->vm_ops->find_special_page(vma, addr);
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
 		if (!is_zero_pfn(pfn))
-- 
1.7.10.4


* [PATCH 02/14] mm: add 'foreign' alias for the 'pinned' page flag
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
  2015-01-19 15:51 ` [PATCH 01/14] mm: provide a find_special_page vma operation David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 03/14] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

The foreign page flag will be used by Xen guests to mark pages that
have grant mappings of frames from other (foreign) guests.

The foreign flag is an alias for the existing (Xen-specific) pinned
flag.  This is safe because pinned is only used on page table pages,
and these cannot also be foreign.
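
The alias provides the usual page flag accessors.  A trivial,
hypothetical illustration (the real users are added by later patches
in this series):

  #include <linux/page-flags.h>

  static void example_track_foreign(struct page *page)
  {
          SetPageForeign(page);           /* same bit as PG_pinned */
          WARN_ON(!PageForeign(page));
          ClearPageForeign(page);
  }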

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 include/linux/page-flags.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e1f5fcd..5ed7bda 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -121,8 +121,12 @@ enum pageflags {
 	PG_fscache = PG_private_2,	/* page backed by cache */
 
 	/* XEN */
+	/* Pinned in Xen as a read-only pagetable page. */
 	PG_pinned = PG_owner_priv_1,
+	/* Pinned as part of domain save (see xen_mm_pin_all()). */
 	PG_savepinned = PG_dirty,
+	/* Has a grant mapping of another (foreign) domain's page. */
+	PG_foreign = PG_owner_priv_1,
 
 	/* SLOB */
 	PG_slob_free = PG_private,
@@ -215,6 +219,7 @@ __PAGEFLAG(Slab, slab)
 PAGEFLAG(Checked, checked)		/* Used by some filesystems */
 PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned)	/* Xen */
 PAGEFLAG(SavePinned, savepinned);			/* Xen */
+PAGEFLAG(Foreign, foreign);				/* Xen */
 PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved)
 PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
 	__SETPAGEFLAG(SwapBacked, swapbacked)
-- 
1.7.10.4


* [PATCH 03/14] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
  2015-01-19 15:51 ` [PATCH 01/14] mm: provide a find_special_page vma operation David Vrabel
  2015-01-19 15:51 ` [PATCH 02/14] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 04/14] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

When unmapping grants, instead of converting the kernel map ops to
unmap ops on the fly, pre-populate the set of unmap ops.

This allows the grant unmap for the kernel mappings to be trivially
batched in the future.
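
In essence (a condensed, hypothetical sketch of the idea; variable
names are illustrative, and the gntdev change below is the real one):

  /* At map time, fill the unmap op from the successful map op... */
  if (map_ops[i].status == GNTST_okay)
          gnttab_set_unmap_op(&unmap_ops[i], map_ops[i].host_addr,
                              flags, map_ops[i].handle);

  /* ...so the unmap path only has to batch the pre-filled ops. */
  gnttab_unmap_refs(unmap_ops, kunmap_ops, pages, count);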

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/arm/include/asm/xen/page.h |    2 +-
 arch/arm/xen/p2m.c              |    2 +-
 arch/x86/include/asm/xen/page.h |    2 +-
 arch/x86/xen/p2m.c              |   21 ++++++++++-----------
 drivers/xen/gntdev.c            |   26 ++++++++++++++++++--------
 drivers/xen/grant-table.c       |    4 ++--
 include/xen/grant_table.h       |    2 +-
 7 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 68c739b..2f7e6ff 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -92,7 +92,7 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 				   struct page **pages, unsigned int count);
 
 extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
-				     struct gnttab_map_grant_ref *kmap_ops,
+				     struct gnttab_unmap_grant_ref *kunmap_ops,
 				     struct page **pages, unsigned int count);
 
 bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 0548577..cb7a14c 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -102,7 +102,7 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
 
 int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
-			      struct gnttab_map_grant_ref *kmap_ops,
+			      struct gnttab_unmap_grant_ref *kunmap_ops,
 			      struct page **pages, unsigned int count)
 {
 	int i;
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 5eea099..e9f52fe 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -55,7 +55,7 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 				   struct gnttab_map_grant_ref *kmap_ops,
 				   struct page **pages, unsigned int count);
 extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
-				     struct gnttab_map_grant_ref *kmap_ops,
+				     struct gnttab_unmap_grant_ref *kunmap_ops,
 				     struct page **pages, unsigned int count);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
 
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 70fb507..df40b28 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -816,7 +816,7 @@ static struct page *m2p_find_override(unsigned long mfn)
 }
 
 static int m2p_remove_override(struct page *page,
-			       struct gnttab_map_grant_ref *kmap_op,
+			       struct gnttab_unmap_grant_ref *kunmap_op,
 			       unsigned long mfn)
 {
 	unsigned long flags;
@@ -840,7 +840,7 @@ static int m2p_remove_override(struct page *page,
 	list_del(&page->lru);
 	spin_unlock_irqrestore(&m2p_override_lock, flags);
 
-	if (kmap_op != NULL) {
+	if (kunmap_op != NULL) {
 		if (!PageHighMem(page)) {
 			struct multicall_space mcs;
 			struct gnttab_unmap_and_replace *unmap_op;
@@ -855,13 +855,13 @@ static int m2p_remove_override(struct page *page,
 			 * issued. In this case handle is going to -1 because
 			 * it hasn't been modified yet.
 			 */
-			if (kmap_op->handle == -1)
+			if (kunmap_op->handle == -1)
 				xen_mc_flush();
 			/*
 			 * Now if kmap_op->handle is negative it means that the
 			 * hypercall actually returned an error.
 			 */
-			if (kmap_op->handle == GNTST_general_error) {
+			if (kunmap_op->handle == GNTST_general_error) {
 				pr_warn("m2p_remove_override: pfn %lx mfn %lx, failed to modify kernel mappings",
 					pfn, mfn);
 				put_balloon_scratch_page();
@@ -873,9 +873,9 @@ static int m2p_remove_override(struct page *page,
 			mcs = __xen_mc_entry(
 				sizeof(struct gnttab_unmap_and_replace));
 			unmap_op = mcs.args;
-			unmap_op->host_addr = kmap_op->host_addr;
+			unmap_op->host_addr = kunmap_op->host_addr;
 			unmap_op->new_addr = scratch_page_address;
-			unmap_op->handle = kmap_op->handle;
+			unmap_op->handle = kunmap_op->handle;
 
 			MULTI_grant_table_op(mcs.mc,
 				GNTTABOP_unmap_and_replace, unmap_op, 1);
@@ -887,7 +887,6 @@ static int m2p_remove_override(struct page *page,
 
 			xen_mc_issue(PARAVIRT_LAZY_MMU);
 
-			kmap_op->host_addr = 0;
 			put_balloon_scratch_page();
 		}
 	}
@@ -912,7 +911,7 @@ static int m2p_remove_override(struct page *page,
 }
 
 int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
-			      struct gnttab_map_grant_ref *kmap_ops,
+			      struct gnttab_unmap_grant_ref *kunmap_ops,
 			      struct page **pages, unsigned int count)
 {
 	int i, ret = 0;
@@ -921,7 +920,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-	if (kmap_ops &&
+	if (kunmap_ops &&
 	    !in_interrupt() &&
 	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
 		arch_enter_lazy_mmu_mode();
@@ -942,8 +941,8 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 		ClearPagePrivate(pages[i]);
 		set_phys_to_machine(pfn, pages[i]->index);
 
-		if (kmap_ops)
-			ret = m2p_remove_override(pages[i], &kmap_ops[i], mfn);
+		if (kunmap_ops)
+			ret = m2p_remove_override(pages[i], &kunmap_ops[i], mfn);
 		if (ret)
 			goto out;
 	}
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 073b4a1..32f6bfe 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -91,6 +91,7 @@ struct grant_map {
 	struct gnttab_map_grant_ref   *map_ops;
 	struct gnttab_unmap_grant_ref *unmap_ops;
 	struct gnttab_map_grant_ref   *kmap_ops;
+	struct gnttab_unmap_grant_ref *kunmap_ops;
 	struct page **pages;
 };
 
@@ -124,6 +125,7 @@ static void gntdev_free_map(struct grant_map *map)
 	kfree(map->map_ops);
 	kfree(map->unmap_ops);
 	kfree(map->kmap_ops);
+	kfree(map->kunmap_ops);
 	kfree(map);
 }
 
@@ -140,11 +142,13 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 	add->map_ops   = kcalloc(count, sizeof(add->map_ops[0]), GFP_KERNEL);
 	add->unmap_ops = kcalloc(count, sizeof(add->unmap_ops[0]), GFP_KERNEL);
 	add->kmap_ops  = kcalloc(count, sizeof(add->kmap_ops[0]), GFP_KERNEL);
+	add->kunmap_ops = kcalloc(count, sizeof(add->kunmap_ops[0]), GFP_KERNEL);
 	add->pages     = kcalloc(count, sizeof(add->pages[0]), GFP_KERNEL);
 	if (NULL == add->grants    ||
 	    NULL == add->map_ops   ||
 	    NULL == add->unmap_ops ||
 	    NULL == add->kmap_ops  ||
+	    NULL == add->kunmap_ops ||
 	    NULL == add->pages)
 		goto err;
 
@@ -155,6 +159,7 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 		add->map_ops[i].handle = -1;
 		add->unmap_ops[i].handle = -1;
 		add->kmap_ops[i].handle = -1;
+		add->kunmap_ops[i].handle = -1;
 	}
 
 	add->index = 0;
@@ -261,8 +266,6 @@ static int map_grant_pages(struct grant_map *map)
 			gnttab_set_map_op(&map->map_ops[i], addr, map->flags,
 				map->grants[i].ref,
 				map->grants[i].domid);
-			gnttab_set_unmap_op(&map->unmap_ops[i], addr,
-				map->flags, -1 /* handle */);
 		}
 	} else {
 		/*
@@ -290,13 +293,20 @@ static int map_grant_pages(struct grant_map *map)
 		return err;
 
 	for (i = 0; i < map->count; i++) {
-		if (map->map_ops[i].status)
+		if (map->map_ops[i].status) {
 			err = -EINVAL;
-		else {
-			BUG_ON(map->map_ops[i].handle == -1);
-			map->unmap_ops[i].handle = map->map_ops[i].handle;
-			pr_debug("map handle=%d\n", map->map_ops[i].handle);
+			continue;
 		}
+
+		gnttab_set_unmap_op(&map->unmap_ops[i],
+				    map->map_ops[i].host_addr,
+				    map->flags,
+				    map->map_ops[i].handle);
+		if (use_ptemod)
+			gnttab_set_unmap_op(&map->kunmap_ops[i],
+					    map->kmap_ops[i].host_addr,
+					    map->flags | GNTMAP_host_map,
+					    map->kmap_ops[i].handle);
 	}
 	return err;
 }
@@ -316,7 +326,7 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 	}
 
 	err = gnttab_unmap_refs(map->unmap_ops + offset,
-			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
+			use_ptemod ? map->kunmap_ops + offset : NULL, map->pages + offset,
 			pages);
 	if (err)
 		return err;
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7786291..999d7ab 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -738,7 +738,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 EXPORT_SYMBOL_GPL(gnttab_map_refs);
 
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		      struct gnttab_map_grant_ref *kmap_ops,
+		      struct gnttab_unmap_grant_ref *kunmap_ops,
 		      struct page **pages, unsigned int count)
 {
 	int ret;
@@ -747,7 +747,7 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 	if (ret)
 		return ret;
 
-	return clear_foreign_p2m_mapping(unmap_ops, kmap_ops, pages, count);
+	return clear_foreign_p2m_mapping(unmap_ops, kunmap_ops, pages, count);
 }
 EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
 
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 3387465..7235d8f 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -167,7 +167,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		      struct gnttab_map_grant_ref *kunmap_ops,
+		      struct gnttab_unmap_grant_ref *kunmap_ops,
 		      struct page **pages, unsigned int count);
 
 /* Perform a batch of grant map/copy operations. Retry every batch slot
-- 
1.7.10.4


* [PATCH 04/14] xen: remove scratch frames for ballooned pages and m2p override
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (2 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 03/14] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 05/14] x86/xen: require ballooned pages for grant maps David Vrabel
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

The scratch frame mappings for ballooned pages and the m2p override
are broken.  Remove them in preparation for replacing them with
simpler mechanisms that work.

The scratch pages did not ensure that the page was not in use.  In
particular, the foreign page could still be in use by hardware.  If
the guest reused the frame, the hardware could read or write that
frame.

The m2p override did not handle the same frame being granted by two
different grant references.  An M2P override lookup in this case
cannot return the correct page.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/xen/page.h |   18 +--
 arch/x86/xen/p2m.c              |  254 ++-------------------------------------
 drivers/xen/balloon.c           |   86 +------------
 3 files changed, 14 insertions(+), 344 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index e9f52fe..358dcd3 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -57,7 +57,6 @@ extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 				     struct gnttab_unmap_grant_ref *kunmap_ops,
 				     struct page **pages, unsigned int count);
-extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
 
 /*
  * Helper functions to write or read unsigned long values to/from
@@ -154,21 +153,12 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 		return mfn;
 
 	pfn = mfn_to_pfn_no_overrides(mfn);
-	if (__pfn_to_mfn(pfn) != mfn) {
-		/*
-		 * If this appears to be a foreign mfn (because the pfn
-		 * doesn't map back to the mfn), then check the local override
-		 * table to see if there's a better pfn to use.
-		 *
-		 * m2p_find_override_pfn returns ~0 if it doesn't find anything.
-		 */
-		pfn = m2p_find_override_pfn(mfn, ~0);
-	}
+	if (__pfn_to_mfn(pfn) != mfn)
+		pfn = ~0;
 
 	/*
-	 * pfn is ~0 if there are no entries in the m2p for mfn or if the
-	 * entry doesn't map back to the mfn and m2p_override doesn't have a
-	 * valid entry for it.
+	 * pfn is ~0 if there are no entries in the m2p for mfn or the
+	 * entry doesn't map back to the mfn.
 	 */
 	if (pfn == ~0 && __pfn_to_mfn(mfn) == IDENTITY_FRAME(mfn))
 		pfn = mfn;
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index df40b28..c9bc53f 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -84,8 +84,6 @@
 
 #define PMDS_PER_MID_PAGE	(P2M_MID_PER_PAGE / PTRS_PER_PTE)
 
-static void __init m2p_override_init(void);
-
 unsigned long *xen_p2m_addr __read_mostly;
 EXPORT_SYMBOL_GPL(xen_p2m_addr);
 unsigned long xen_p2m_size __read_mostly;
@@ -402,8 +400,6 @@ void __init xen_vmalloc_p2m_tree(void)
 	xen_p2m_size = xen_max_p2m_pfn;
 
 	xen_inv_extra_mem();
-
-	m2p_override_init();
 }
 
 unsigned long get_phys_to_machine(unsigned long pfn)
@@ -652,100 +648,21 @@ bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 	return true;
 }
 
-#define M2P_OVERRIDE_HASH_SHIFT	10
-#define M2P_OVERRIDE_HASH	(1 << M2P_OVERRIDE_HASH_SHIFT)
-
-static struct list_head *m2p_overrides;
-static DEFINE_SPINLOCK(m2p_override_lock);
-
-static void __init m2p_override_init(void)
-{
-	unsigned i;
-
-	m2p_overrides = alloc_bootmem_align(
-				sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
-				sizeof(unsigned long));
-
-	for (i = 0; i < M2P_OVERRIDE_HASH; i++)
-		INIT_LIST_HEAD(&m2p_overrides[i]);
-}
-
-static unsigned long mfn_hash(unsigned long mfn)
-{
-	return hash_long(mfn, M2P_OVERRIDE_HASH_SHIFT);
-}
-
-/* Add an MFN override for a particular page */
-static int m2p_add_override(unsigned long mfn, struct page *page,
-			    struct gnttab_map_grant_ref *kmap_op)
-{
-	unsigned long flags;
-	unsigned long pfn;
-	unsigned long uninitialized_var(address);
-	unsigned level;
-	pte_t *ptep = NULL;
-
-	pfn = page_to_pfn(page);
-	if (!PageHighMem(page)) {
-		address = (unsigned long)__va(pfn << PAGE_SHIFT);
-		ptep = lookup_address(address, &level);
-		if (WARN(ptep == NULL || level != PG_LEVEL_4K,
-			 "m2p_add_override: pfn %lx not mapped", pfn))
-			return -EINVAL;
-	}
-
-	if (kmap_op != NULL) {
-		if (!PageHighMem(page)) {
-			struct multicall_space mcs =
-				xen_mc_entry(sizeof(*kmap_op));
-
-			MULTI_grant_table_op(mcs.mc,
-					GNTTABOP_map_grant_ref, kmap_op, 1);
-
-			xen_mc_issue(PARAVIRT_LAZY_MMU);
-		}
-	}
-	spin_lock_irqsave(&m2p_override_lock, flags);
-	list_add(&page->lru,  &m2p_overrides[mfn_hash(mfn)]);
-	spin_unlock_irqrestore(&m2p_override_lock, flags);
-
-	/* p2m(m2p(mfn)) == mfn: the mfn is already present somewhere in
-	 * this domain. Set the FOREIGN_FRAME_BIT in the p2m for the other
-	 * pfn so that the following mfn_to_pfn(mfn) calls will return the
-	 * pfn from the m2p_override (the backend pfn) instead.
-	 * We need to do this because the pages shared by the frontend
-	 * (xen-blkfront) can be already locked (lock_page, called by
-	 * do_read_cache_page); when the userspace backend tries to use them
-	 * with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
-	 * do_blockdev_direct_IO is going to try to lock the same pages
-	 * again resulting in a deadlock.
-	 * As a side effect get_user_pages_fast might not be safe on the
-	 * frontend pages while they are being shared with the backend,
-	 * because mfn_to_pfn (that ends up being called by GUPF) will
-	 * return the backend pfn rather than the frontend pfn. */
-	pfn = mfn_to_pfn_no_overrides(mfn);
-	if (__pfn_to_mfn(pfn) == mfn)
-		set_phys_to_machine(pfn, FOREIGN_FRAME(mfn));
-
-	return 0;
-}
-
 int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 			    struct gnttab_map_grant_ref *kmap_ops,
 			    struct page **pages, unsigned int count)
 {
 	int i, ret = 0;
-	bool lazy = false;
 	pte_t *pte;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-	if (kmap_ops &&
-	    !in_interrupt() &&
-	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
-		arch_enter_lazy_mmu_mode();
-		lazy = true;
+	if (kmap_ops) {
+		ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
+						kmap_ops, count);
+		if (ret)
+			goto out;
 	}
 
 	for (i = 0; i < count; i++) {
@@ -773,160 +690,22 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 			ret = -ENOMEM;
 			goto out;
 		}
-
-		if (kmap_ops) {
-			ret = m2p_add_override(mfn, pages[i], &kmap_ops[i]);
-			if (ret)
-				goto out;
-		}
 	}
 
 out:
-	if (lazy)
-		arch_leave_lazy_mmu_mode();
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(set_foreign_p2m_mapping);
 
-static struct page *m2p_find_override(unsigned long mfn)
-{
-	unsigned long flags;
-	struct list_head *bucket;
-	struct page *p, *ret;
-
-	if (unlikely(!m2p_overrides))
-		return NULL;
-
-	ret = NULL;
-	bucket = &m2p_overrides[mfn_hash(mfn)];
-
-	spin_lock_irqsave(&m2p_override_lock, flags);
-
-	list_for_each_entry(p, bucket, lru) {
-		if (page_private(p) == mfn) {
-			ret = p;
-			break;
-		}
-	}
-
-	spin_unlock_irqrestore(&m2p_override_lock, flags);
-
-	return ret;
-}
-
-static int m2p_remove_override(struct page *page,
-			       struct gnttab_unmap_grant_ref *kunmap_op,
-			       unsigned long mfn)
-{
-	unsigned long flags;
-	unsigned long pfn;
-	unsigned long uninitialized_var(address);
-	unsigned level;
-	pte_t *ptep = NULL;
-
-	pfn = page_to_pfn(page);
-
-	if (!PageHighMem(page)) {
-		address = (unsigned long)__va(pfn << PAGE_SHIFT);
-		ptep = lookup_address(address, &level);
-
-		if (WARN(ptep == NULL || level != PG_LEVEL_4K,
-			 "m2p_remove_override: pfn %lx not mapped", pfn))
-			return -EINVAL;
-	}
-
-	spin_lock_irqsave(&m2p_override_lock, flags);
-	list_del(&page->lru);
-	spin_unlock_irqrestore(&m2p_override_lock, flags);
-
-	if (kunmap_op != NULL) {
-		if (!PageHighMem(page)) {
-			struct multicall_space mcs;
-			struct gnttab_unmap_and_replace *unmap_op;
-			struct page *scratch_page = get_balloon_scratch_page();
-			unsigned long scratch_page_address = (unsigned long)
-				__va(page_to_pfn(scratch_page) << PAGE_SHIFT);
-
-			/*
-			 * It might be that we queued all the m2p grant table
-			 * hypercalls in a multicall, then m2p_remove_override
-			 * get called before the multicall has actually been
-			 * issued. In this case handle is going to -1 because
-			 * it hasn't been modified yet.
-			 */
-			if (kunmap_op->handle == -1)
-				xen_mc_flush();
-			/*
-			 * Now if kmap_op->handle is negative it means that the
-			 * hypercall actually returned an error.
-			 */
-			if (kunmap_op->handle == GNTST_general_error) {
-				pr_warn("m2p_remove_override: pfn %lx mfn %lx, failed to modify kernel mappings",
-					pfn, mfn);
-				put_balloon_scratch_page();
-				return -1;
-			}
-
-			xen_mc_batch();
-
-			mcs = __xen_mc_entry(
-				sizeof(struct gnttab_unmap_and_replace));
-			unmap_op = mcs.args;
-			unmap_op->host_addr = kunmap_op->host_addr;
-			unmap_op->new_addr = scratch_page_address;
-			unmap_op->handle = kunmap_op->handle;
-
-			MULTI_grant_table_op(mcs.mc,
-				GNTTABOP_unmap_and_replace, unmap_op, 1);
-
-			mcs = __xen_mc_entry(0);
-			MULTI_update_va_mapping(mcs.mc, scratch_page_address,
-					pfn_pte(page_to_pfn(scratch_page),
-					PAGE_KERNEL_RO), 0);
-
-			xen_mc_issue(PARAVIRT_LAZY_MMU);
-
-			put_balloon_scratch_page();
-		}
-	}
-
-	/* p2m(m2p(mfn)) == FOREIGN_FRAME(mfn): the mfn is already present
-	 * somewhere in this domain, even before being added to the
-	 * m2p_override (see comment above in m2p_add_override).
-	 * If there are no other entries in the m2p_override corresponding
-	 * to this mfn, then remove the FOREIGN_FRAME_BIT from the p2m for
-	 * the original pfn (the one shared by the frontend): the backend
-	 * cannot do any IO on this page anymore because it has been
-	 * unshared. Removing the FOREIGN_FRAME_BIT from the p2m entry of
-	 * the original pfn causes mfn_to_pfn(mfn) to return the frontend
-	 * pfn again. */
-	mfn &= ~FOREIGN_FRAME_BIT;
-	pfn = mfn_to_pfn_no_overrides(mfn);
-	if (__pfn_to_mfn(pfn) == FOREIGN_FRAME(mfn) &&
-			m2p_find_override(mfn) == NULL)
-		set_phys_to_machine(pfn, mfn);
-
-	return 0;
-}
-
 int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 			      struct gnttab_unmap_grant_ref *kunmap_ops,
 			      struct page **pages, unsigned int count)
 {
 	int i, ret = 0;
-	bool lazy = false;
 
 	if (xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-	if (kunmap_ops &&
-	    !in_interrupt() &&
-	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
-		arch_enter_lazy_mmu_mode();
-		lazy = true;
-	}
-
 	for (i = 0; i < count; i++) {
 		unsigned long mfn = __pfn_to_mfn(page_to_pfn(pages[i]));
 		unsigned long pfn = page_to_pfn(pages[i]);
@@ -940,32 +719,15 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 		WARN_ON(!PagePrivate(pages[i]));
 		ClearPagePrivate(pages[i]);
 		set_phys_to_machine(pfn, pages[i]->index);
-
-		if (kunmap_ops)
-			ret = m2p_remove_override(pages[i], &kunmap_ops[i], mfn);
-		if (ret)
-			goto out;
 	}
-
+	if (kunmap_ops)
+		ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+						kunmap_ops, count);
 out:
-	if (lazy)
-		arch_leave_lazy_mmu_mode();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(clear_foreign_p2m_mapping);
 
-unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn)
-{
-	struct page *p = m2p_find_override(mfn);
-	unsigned long ret = pfn;
-
-	if (p)
-		ret = page_to_pfn(p);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(m2p_find_override_pfn);
-
 #ifdef CONFIG_XEN_DEBUG_FS
 #include <linux/debugfs.h>
 #include "debugfs.h"
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3860d02..0b52d92 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -92,7 +92,6 @@ EXPORT_SYMBOL_GPL(balloon_stats);
 
 /* We increase/decrease in batches which fit in a page */
 static xen_pfn_t frame_list[PAGE_SIZE / sizeof(unsigned long)];
-static DEFINE_PER_CPU(struct page *, balloon_scratch_page);
 
 
 /* List of ballooned pages, threaded through the mem_map array. */
@@ -423,22 +422,12 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 		page = pfn_to_page(pfn);
 
 #ifdef CONFIG_XEN_HAVE_PVMMU
-		/*
-		 * Ballooned out frames are effectively replaced with
-		 * a scratch frame.  Ensure direct mappings and the
-		 * p2m are consistent.
-		 */
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 			if (!PageHighMem(page)) {
-				struct page *scratch_page = get_balloon_scratch_page();
-
 				ret = HYPERVISOR_update_va_mapping(
 						(unsigned long)__va(pfn << PAGE_SHIFT),
-						pfn_pte(page_to_pfn(scratch_page),
-							PAGE_KERNEL_RO), 0);
+						__pte_ma(0), 0);
 				BUG_ON(ret);
-
-				put_balloon_scratch_page();
 			}
 			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 		}
@@ -500,18 +489,6 @@ static void balloon_process(struct work_struct *work)
 	mutex_unlock(&balloon_mutex);
 }
 
-struct page *get_balloon_scratch_page(void)
-{
-	struct page *ret = get_cpu_var(balloon_scratch_page);
-	BUG_ON(ret == NULL);
-	return ret;
-}
-
-void put_balloon_scratch_page(void)
-{
-	put_cpu_var(balloon_scratch_page);
-}
-
 /* Resets the Xen limit, sets new target, and kicks off processing. */
 void balloon_set_new_target(unsigned long target)
 {
@@ -605,61 +582,13 @@ static void __init balloon_add_region(unsigned long start_pfn,
 	}
 }
 
-static int alloc_balloon_scratch_page(int cpu)
-{
-	if (per_cpu(balloon_scratch_page, cpu) != NULL)
-		return 0;
-
-	per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL);
-	if (per_cpu(balloon_scratch_page, cpu) == NULL) {
-		pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu);
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-
-static int balloon_cpu_notify(struct notifier_block *self,
-				    unsigned long action, void *hcpu)
-{
-	int cpu = (long)hcpu;
-	switch (action) {
-	case CPU_UP_PREPARE:
-		if (alloc_balloon_scratch_page(cpu))
-			return NOTIFY_BAD;
-		break;
-	default:
-		break;
-	}
-	return NOTIFY_OK;
-}
-
-static struct notifier_block balloon_cpu_notifier = {
-	.notifier_call	= balloon_cpu_notify,
-};
-
 static int __init balloon_init(void)
 {
-	int i, cpu;
+	int i;
 
 	if (!xen_domain())
 		return -ENODEV;
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		register_cpu_notifier(&balloon_cpu_notifier);
-
-		get_online_cpus();
-		for_each_online_cpu(cpu) {
-			if (alloc_balloon_scratch_page(cpu)) {
-				put_online_cpus();
-				unregister_cpu_notifier(&balloon_cpu_notifier);
-				return -ENOMEM;
-			}
-		}
-		put_online_cpus();
-	}
-
 	pr_info("Initialising balloon driver\n");
 
 	balloon_stats.current_pages = xen_pv_domain()
@@ -696,15 +625,4 @@ static int __init balloon_init(void)
 
 subsys_initcall(balloon_init);
 
-static int __init balloon_clear(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		per_cpu(balloon_scratch_page, cpu) = NULL;
-
-	return 0;
-}
-early_initcall(balloon_clear);
-
 MODULE_LICENSE("GPL");
-- 
1.7.10.4


* [PATCH 05/14] x86/xen: require ballooned pages for grant maps
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (3 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 04/14] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 06/14] xen/grant-table: add helpers for allocating pages David Vrabel
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

Ballooned pages are always used for grant maps, which means the
original frame does not need to be saved in page->index nor restored
after the grant unmap.

This allows the removal of the netback workaround for the conflicting
use of the (unionized) page->index and page->pfmemalloc fields.

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c                |    5 +++--
 drivers/net/xen-netback/netback.c |    6 ------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index c9bc53f..a8691cb 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -682,9 +682,10 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 		pfn = page_to_pfn(pages[i]);
 
 		WARN_ON(PagePrivate(pages[i]));
+		WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
+
 		SetPagePrivate(pages[i]);
 		set_page_private(pages[i], mfn);
-		pages[i]->index = pfn_to_mfn(pfn);
 
 		if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
 			ret = -ENOMEM;
@@ -718,7 +719,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 		set_page_private(pages[i], INVALID_P2M_ENTRY);
 		WARN_ON(!PagePrivate(pages[i]));
 		ClearPagePrivate(pages[i]);
-		set_phys_to_machine(pfn, pages[i]->index);
+		set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 	}
 	if (kunmap_ops)
 		ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..6441318 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1241,12 +1241,6 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 		/* Take an extra reference to offset network stack's put_page */
 		get_page(queue->mmap_pages[pending_idx]);
 	}
-	/* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
-	 * overlaps with "index", and "mapping" is not set. I think mapping
-	 * should be set. If delivered to local stack, it would drop this
-	 * skb in sk_filter unless the socket has the right to use it.
-	 */
-	skb->pfmemalloc	= false;
 }
 
 static int xenvif_get_extras(struct xenvif_queue *queue,
-- 
1.7.10.4


* [PATCH 06/14] xen/grant-table: add helpers for allocating pages
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (4 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 05/14] x86/xen: require ballooned pages for grant maps David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 07/14] xen: mark grant mapped pages as foreign David Vrabel
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
suitable for grant mapping.
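
A hypothetical caller (names and sizes are illustrative; the
conversions below show the real users):

  #include <xen/grant_table.h>

  struct page *pages[8];
  int err;

  err = gnttab_alloc_pages(ARRAY_SIZE(pages), pages);
  if (err)
          return err;

  /* ... map grants into these pages, use them, unmap ... */

  gnttab_free_pages(ARRAY_SIZE(pages), pages);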

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/block/xen-blkback/blkback.c |    8 ++++----
 drivers/net/xen-netback/interface.c |    7 +++----
 drivers/xen/gntdev.c                |    4 ++--
 drivers/xen/grant-table.c           |   29 +++++++++++++++++++++++++++++
 drivers/xen/xen-scsiback.c          |    6 +++---
 include/xen/grant_table.h           |    3 +++
 6 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 63fc7f0..908e630 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -100,7 +100,7 @@ module_param(log_stats, int, 0644);
 
 #define BLKBACK_INVALID_HANDLE (~0)
 
-/* Number of free pages to remove on each call to free_xenballooned_pages */
+/* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
 static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
@@ -111,7 +111,7 @@ static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
 	if (list_empty(&blkif->free_pages)) {
 		BUG_ON(blkif->free_pages_num != 0);
 		spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
-		return alloc_xenballooned_pages(1, page, false);
+		return gnttab_alloc_pages(1, page);
 	}
 	BUG_ON(blkif->free_pages_num == 0);
 	page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
@@ -151,14 +151,14 @@ static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
 		blkif->free_pages_num--;
 		if (++num_pages == NUM_BATCH_FREE_PAGES) {
 			spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
-			free_xenballooned_pages(num_pages, page);
+			gnttab_free_pages(num_pages, page);
 			spin_lock_irqsave(&blkif->free_pages_lock, flags);
 			num_pages = 0;
 		}
 	}
 	spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
 	if (num_pages != 0)
-		free_xenballooned_pages(num_pages, page);
+		gnttab_free_pages(num_pages, page);
 }
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 9259a73..2e07f84 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -483,9 +483,8 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 	 * better enable it. The long term solution would be to use just a
 	 * bunch of valid page descriptors, without dependency on ballooning
 	 */
-	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
-				       queue->mmap_pages,
-				       false);
+	err = gnttab_alloc_pages(MAX_PENDING_REQS,
+				 queue->mmap_pages);
 	if (err) {
 		netdev_err(queue->vif->dev, "Could not reserve mmap_pages\n");
 		return -ENOMEM;
@@ -662,7 +661,7 @@ void xenvif_disconnect(struct xenvif *vif)
  */
 void xenvif_deinit_queue(struct xenvif_queue *queue)
 {
-	free_xenballooned_pages(MAX_PENDING_REQS, queue->mmap_pages);
+	gnttab_free_pages(MAX_PENDING_REQS, queue->mmap_pages);
 }
 
 void xenvif_free(struct xenvif *vif)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 32f6bfe..a28807a 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -119,7 +119,7 @@ static void gntdev_free_map(struct grant_map *map)
 		return;
 
 	if (map->pages)
-		free_xenballooned_pages(map->count, map->pages);
+		gnttab_free_pages(map->count, map->pages);
 	kfree(map->pages);
 	kfree(map->grants);
 	kfree(map->map_ops);
@@ -152,7 +152,7 @@ static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
 	    NULL == add->pages)
 		goto err;
 
-	if (alloc_xenballooned_pages(count, add->pages, false /* lowmem */))
+	if (gnttab_alloc_pages(count, add->pages))
 		goto err;
 
 	for (i = 0; i < count; i++) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 999d7ab..b4f93c4 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -50,6 +50,7 @@
 #include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
 #include <xen/swiotlb-xen.h>
+#include <xen/balloon.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/interface.h>
 
@@ -671,6 +672,34 @@ void gnttab_free_auto_xlat_frames(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
 
+/**
+ * gnttab_alloc_pages - alloc pages suitable for grant mapping into
+ * @nr_pages: number of pages to alloc
+ * @pages: returns the pages
+ */
+int gnttab_alloc_pages(int nr_pages, struct page **pages)
+{
+	int ret;
+
+	ret = alloc_xenballooned_pages(nr_pages, pages, false);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+EXPORT_SYMBOL(gnttab_alloc_pages);
+
+/**
+ * gnttab_free_pages - free pages allocated by gnttab_alloc_pages()
+ * @nr_pages: number of pages to free
+ * @pages: the pages
+ */
+void gnttab_free_pages(int nr_pages, struct page **pages)
+{
+	free_xenballooned_pages(nr_pages, pages);
+}
+EXPORT_SYMBOL(gnttab_free_pages);
+
 /* Handling of paged out grant targets (GNTST_eagain) */
 #define MAX_DELAY 256
 static inline void
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index e999496e..ecd540a 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -227,7 +227,7 @@ static void put_free_pages(struct page **page, int num)
 		return;
 	if (i > scsiback_max_buffer_pages) {
 		n = min(num, i - scsiback_max_buffer_pages);
-		free_xenballooned_pages(n, page + num - n);
+		gnttab_free_pages(n, page + num - n);
 		n = num - n;
 	}
 	spin_lock_irqsave(&free_pages_lock, flags);
@@ -244,7 +244,7 @@ static int get_free_page(struct page **page)
 	spin_lock_irqsave(&free_pages_lock, flags);
 	if (list_empty(&scsiback_free_pages)) {
 		spin_unlock_irqrestore(&free_pages_lock, flags);
-		return alloc_xenballooned_pages(1, page, false);
+		return gnttab_alloc_pages(1, page);
 	}
 	page[0] = list_first_entry(&scsiback_free_pages, struct page, lru);
 	list_del(&page[0]->lru);
@@ -2106,7 +2106,7 @@ static void __exit scsiback_exit(void)
 	while (free_pages_num) {
 		if (get_free_page(&page))
 			BUG();
-		free_xenballooned_pages(1, &page);
+		gnttab_free_pages(1, &page);
 	}
 	scsiback_deregister_configfs();
 	xenbus_unregister_driver(&scsiback_driver);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 7235d8f..949803e 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -163,6 +163,9 @@ void gnttab_free_auto_xlat_frames(void);
 
 #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
 
+int gnttab_alloc_pages(int nr_pages, struct page **pages);
+void gnttab_free_pages(int nr_pages, struct page **pages);
+
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
-- 
1.7.10.4


* [PATCH 07/14] xen: mark grant mapped pages as foreign
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (5 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 06/14] xen/grant-table: add helpers for allocating pages David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 08/14] xen-netback: use foreign page information from the pages themselves David Vrabel
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel, Jenny Herbert

From: Jenny Herbert <jennifer.herbert@citrix.com>

Use the "foreign" page flag to mark pages that have a grant map.  Use
page->private to store information about the grant (the granting domain
and the grant reference).
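
A backend that is handed such a page can then recover the grant
information via the xen_page_foreign() helper added here (this is
roughly how netback uses it in the next patch):

  struct xen_page_foreign *foreign = xen_page_foreign(page);

  if (foreign) {
          /* Grant mapped from another domain: use the stored
             (domid, gref), e.g. as the source of a grant copy. */
          copy_gop->source.domid = foreign->domid;
          copy_gop->source.u.ref = foreign->gref;
          copy_gop->flags |= GNTCOPY_source_gref;
  }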

Signed-off-by: Jenny Herbert <jenny.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/p2m.c        |    7 -------
 drivers/xen/grant-table.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 include/xen/grant_table.h |   20 ++++++++++++++++++++
 3 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index a8691cb..f18fd1d 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -681,12 +681,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 		}
 		pfn = page_to_pfn(pages[i]);
 
-		WARN_ON(PagePrivate(pages[i]));
 		WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be ballooned");
 
-		SetPagePrivate(pages[i]);
-		set_page_private(pages[i], mfn);
-
 		if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
 			ret = -ENOMEM;
 			goto out;
@@ -716,9 +712,6 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 			goto out;
 		}
 
-		set_page_private(pages[i], INVALID_P2M_ENTRY);
-		WARN_ON(!PagePrivate(pages[i]));
-		ClearPagePrivate(pages[i]);
 		set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 	}
 	if (kunmap_ops)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index b4f93c4..89dcca4 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -679,12 +679,27 @@ EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
  */
 int gnttab_alloc_pages(int nr_pages, struct page **pages)
 {
+	int i;
 	int ret;
 
 	ret = alloc_xenballooned_pages(nr_pages, pages, false);
 	if (ret < 0)
 		return ret;
 
+	for (i = 0; i < nr_pages; i++) {
+#if BITS_PER_LONG < 64
+		struct xen_page_foreign *foreign;
+
+		foreign = kzalloc(sizeof(*foreign), GFP_KERNEL);
+		if (!foreign) {
+			gnttab_free_pages(nr_pages, pages);
+			return -ENOMEM;
+		}
+		set_page_private(pages[i], (unsigned long)foreign);
+#endif
+		SetPagePrivate(pages[i]);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(gnttab_alloc_pages);
@@ -696,6 +711,16 @@ EXPORT_SYMBOL(gnttab_alloc_pages);
  */
 void gnttab_free_pages(int nr_pages, struct page **pages)
 {
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		if (PagePrivate(pages[i])) {
+#if BITS_PER_LONG < 64
+			kfree((void *)page_private(pages[i]));
+#endif
+			ClearPagePrivate(pages[i]);
+		}
+	}
 	free_xenballooned_pages(nr_pages, pages);
 }
 EXPORT_SYMBOL(gnttab_free_pages);
@@ -756,12 +781,22 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 	if (ret)
 		return ret;
 
-	/* Retry eagain maps */
-	for (i = 0; i < count; i++)
+	for (i = 0; i < count; i++) {
+		/* Retry eagain maps */
 		if (map_ops[i].status == GNTST_eagain)
 			gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, map_ops + i,
 						&map_ops[i].status, __func__);
 
+		if (map_ops[i].status == GNTST_okay) {
+			struct xen_page_foreign *foreign;
+
+			SetPageForeign(pages[i]);
+			foreign = xen_page_foreign(pages[i]);
+			foreign->domid = map_ops[i].dom;
+			foreign->gref = map_ops[i].ref;
+		}
+	}
+
 	return set_foreign_p2m_mapping(map_ops, kmap_ops, pages, count);
 }
 EXPORT_SYMBOL_GPL(gnttab_map_refs);
@@ -770,12 +805,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		      struct gnttab_unmap_grant_ref *kunmap_ops,
 		      struct page **pages, unsigned int count)
 {
+	unsigned int i;
 	int ret;
 
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
 	if (ret)
 		return ret;
 
+	for (i = 0; i < count; i++)
+		ClearPageForeign(pages[i]);
+
 	return clear_foreign_p2m_mapping(unmap_ops, kunmap_ops, pages, count);
 }
 EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 949803e..d3bef56 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,6 +45,8 @@
 #include <asm/xen/hypervisor.h>
 
 #include <xen/features.h>
+#include <linux/mm_types.h>
+#include <linux/page-flags.h>
 
 #define GNTTAB_RESERVED_XENSTORE 1
 
@@ -185,4 +187,22 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
 void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
 
+
+struct xen_page_foreign {
+	domid_t domid;
+	grant_ref_t gref;
+};
+
+static inline struct xen_page_foreign *xen_page_foreign(struct page *page)
+{
+	if (!PageForeign(page))
+		return NULL;
+#if BITS_PER_LONG < 64
+	return (struct xen_page_foreign *)page->private;
+#else
+	BUILD_BUG_ON(sizeof(struct xen_page_foreign) > BITS_PER_LONG);
+	return (struct xen_page_foreign *)&page->private;
+#endif
+}
+
 #endif /* __ASM_GNTTAB_H__ */
-- 
1.7.10.4


* [PATCH 08/14] xen-netback: use foreign page information from the pages themselves
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (6 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 07/14] xen: mark grant mapped pages as foreign David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 09/14] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

Use the foreign page flag in netback to get the domid and grant ref
needed for the grant copy.  This significantly simplifies the netback
code and makes netback work with foreign pages from other backends
(e.g., blkback).

This allows blkback to use iSCSI disks provided by domUs running on
the same host.

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/netback.c |  100 ++++---------------------------------
 1 file changed, 9 insertions(+), 91 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 6441318..ae3ab37 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -314,9 +314,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb,
 				 struct netrx_pending_operations *npo,
 				 struct page *page, unsigned long size,
-				 unsigned long offset, int *head,
-				 struct xenvif_queue *foreign_queue,
-				 grant_ref_t foreign_gref)
+				 unsigned long offset, int *head)
 {
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
@@ -333,6 +331,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 	offset &= ~PAGE_MASK;
 
 	while (size > 0) {
+		struct xen_page_foreign *foreign;
+
 		BUG_ON(offset >= PAGE_SIZE);
 		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 
@@ -361,9 +361,10 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 		copy_gop->flags = GNTCOPY_dest_gref;
 		copy_gop->len = bytes;
 
-		if (foreign_queue) {
-			copy_gop->source.domid = foreign_queue->vif->domid;
-			copy_gop->source.u.ref = foreign_gref;
+		foreign = xen_page_foreign(page);
+		if (foreign) {
+			copy_gop->source.domid = foreign->domid;
+			copy_gop->source.u.ref = foreign->gref;
 			copy_gop->flags |= GNTCOPY_source_gref;
 		} else {
 			copy_gop->source.domid = DOMID_SELF;
@@ -406,35 +407,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 }
 
 /*
- * Find the grant ref for a given frag in a chain of struct ubuf_info's
- * skb: the skb itself
- * i: the frag's number
- * ubuf: a pointer to an element in the chain. It should not be NULL
- *
- * Returns a pointer to the element in the chain where the page were found. If
- * not found, returns NULL.
- * See the definition of callback_struct in common.h for more details about
- * the chain.
- */
-static const struct ubuf_info *xenvif_find_gref(const struct sk_buff *const skb,
-						const int i,
-						const struct ubuf_info *ubuf)
-{
-	struct xenvif_queue *foreign_queue = ubuf_to_queue(ubuf);
-
-	do {
-		u16 pending_idx = ubuf->desc;
-
-		if (skb_shinfo(skb)->frags[i].page.p ==
-		    foreign_queue->mmap_pages[pending_idx])
-			break;
-		ubuf = (struct ubuf_info *) ubuf->ctx;
-	} while (ubuf);
-
-	return ubuf;
-}
-
-/*
  * Prepare an SKB to be transmitted to the frontend.
  *
  * This function is responsible for allocating grant operations, meta
@@ -459,8 +431,6 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	int head = 1;
 	int old_meta_prod;
 	int gso_type;
-	const struct ubuf_info *ubuf = skb_shinfo(skb)->destructor_arg;
-	const struct ubuf_info *const head_ubuf = ubuf;
 
 	old_meta_prod = npo->meta_prod;
 
@@ -507,68 +477,16 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 			len = skb_tail_pointer(skb) - data;
 
 		xenvif_gop_frag_copy(queue, skb, npo,
-				     virt_to_page(data), len, offset, &head,
-				     NULL,
-				     0);
+				     virt_to_page(data), len, offset, &head);
 		data += len;
 	}
 
 	for (i = 0; i < nr_frags; i++) {
-		/* This variable also signals whether foreign_gref has a real
-		 * value or not.
-		 */
-		struct xenvif_queue *foreign_queue = NULL;
-		grant_ref_t foreign_gref;
-
-		if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
-			(ubuf->callback == &xenvif_zerocopy_callback)) {
-			const struct ubuf_info *const startpoint = ubuf;
-
-			/* Ideally ubuf points to the chain element which
-			 * belongs to this frag. Or if frags were removed from
-			 * the beginning, then shortly before it.
-			 */
-			ubuf = xenvif_find_gref(skb, i, ubuf);
-
-			/* Try again from the beginning of the list, if we
-			 * haven't tried from there. This only makes sense in
-			 * the unlikely event of reordering the original frags.
-			 * For injected local pages it's an unnecessary second
-			 * run.
-			 */
-			if (unlikely(!ubuf) && startpoint != head_ubuf)
-				ubuf = xenvif_find_gref(skb, i, head_ubuf);
-
-			if (likely(ubuf)) {
-				u16 pending_idx = ubuf->desc;
-
-				foreign_queue = ubuf_to_queue(ubuf);
-				foreign_gref =
-					foreign_queue->pending_tx_info[pending_idx].req.gref;
-				/* Just a safety measure. If this was the last
-				 * element on the list, the for loop will
-				 * iterate again if a local page were added to
-				 * the end. Using head_ubuf here prevents the
-				 * second search on the chain. Or the original
-				 * frags changed order, but that's less likely.
-				 * In any way, ubuf shouldn't be NULL.
-				 */
-				ubuf = ubuf->ctx ?
-					(struct ubuf_info *) ubuf->ctx :
-					head_ubuf;
-			} else
-				/* This frag was a local page, added to the
-				 * array after the skb left netback.
-				 */
-				ubuf = head_ubuf;
-		}
 		xenvif_gop_frag_copy(queue, skb, npo,
 				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
 				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
 				     skb_shinfo(skb)->frags[i].page_offset,
-				     &head,
-				     foreign_queue,
-				     foreign_queue ? foreign_gref : UINT_MAX);
+				     &head);
 	}
 
 	return npo->meta_prod - old_meta_prod;
-- 
1.7.10.4


* [PATCH 09/14] xen/grant-table: add a mechanism to safely unmap pages that are in use
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (7 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 08/14] xen-netback: use foreign page information from the pages themselves David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

Introduce gnttab_unmap_refs_async() that can be used to safely unmap
pages that may be in use (ref count > 1).  If the pages are in use the
unmap is deferred and retried later.  This polling is not very clever
but it should be good enough if the cases where the delay is necessary
are rare.

The initial delay is 5 ms and is increased linearly on each subsequent
retry (5 ms, 10 ms, 15 ms, ...), to reduce load if the page is in use
for a long time.

This is needed to allow block backends using grant mapping to safely
use network storage (block or filesystem based such as iSCSI or NFS).

The network storage driver may complete a block request whilst there
is a queued network packet retry (because the ack from the remote end
races with deciding to queue the retry).  The pages for the retried
packet would be grant unmapped and the network driver (or hardware)
would access the unmapped page.
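
For illustration only (none of this is in the patch itself): a caller
that needs to block until the unmap has actually happened can layer a
completion on top of the done callback.  The helper below is a minimal
sketch with a made-up name, but the gntdev and blkback patches later in
this series do essentially the same thing:

struct unmap_wait {
	struct completion completion;
	int result;
};

static void unmap_wait_done(int result, struct gntab_unmap_queue_data *data)
{
	struct unmap_wait *w = data->data;

	w->result = result;
	complete(&w->completion);
}

/* Needs <linux/completion.h> and <xen/grant_table.h>. */
static int unmap_refs_sync(struct gnttab_unmap_grant_ref *unmap_ops,
			   struct gnttab_unmap_grant_ref *kunmap_ops,
			   struct page **pages, unsigned int count)
{
	struct gntab_unmap_queue_data item;
	struct unmap_wait w;

	init_completion(&w.completion);

	item.data = &w;
	item.done = unmap_wait_done;
	item.unmap_ops = unmap_ops;
	item.kunmap_ops = kunmap_ops;
	item.pages = pages;
	item.count = count;

	gnttab_unmap_refs_async(&item);

	/* The async API retries internally while any page still has an
	 * elevated reference count; done() is only called once
	 * gnttab_unmap_refs() has been issued. */
	wait_for_completion(&w.completion);

	return w.result;
}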

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/grant-table.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/xen/grant_table.h |   18 ++++++++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 89dcca4..17972fb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -42,6 +42,7 @@
 #include <linux/io.h>
 #include <linux/delay.h>
 #include <linux/hardirq.h>
+#include <linux/workqueue.h>
 
 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -819,6 +820,49 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 }
 EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
 
+#define GNTTAB_UNMAP_REFS_DELAY 5
+
+static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
+
+static void gnttab_unmap_work(struct work_struct *work)
+{
+	struct gntab_unmap_queue_data
+		*unmap_data = container_of(work, 
+					   struct gntab_unmap_queue_data,
+					   gnttab_work.work);
+	if (unmap_data->age != UINT_MAX)
+		unmap_data->age++;
+	__gnttab_unmap_refs_async(unmap_data);
+}
+
+static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
+{
+	int ret;
+	int pc;
+
+	for (pc = 0; pc < item->count; pc++) {
+		if (page_count(item->pages[pc]) > 1) {
+			unsigned long delay = GNTTAB_UNMAP_REFS_DELAY * (item->age + 1);
+			schedule_delayed_work(&item->gnttab_work,
+					      msecs_to_jiffies(delay));
+			return;
+		}
+	}
+
+	ret = gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
+				item->pages, item->count);
+	item->done(ret, item);
+}
+
+void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
+{
+	INIT_DELAYED_WORK(&item->gnttab_work, gnttab_unmap_work);
+	item->age = 0;
+
+	__gnttab_unmap_refs_async(item);
+}
+EXPORT_SYMBOL_GPL(gnttab_unmap_refs_async);
+
 static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
 {
 	int rc;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index d3bef56..143ca5f 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -60,6 +60,22 @@ struct gnttab_free_callback {
 	u16 count;
 };
 
+struct gntab_unmap_queue_data;
+
+typedef void (*gnttab_unmap_refs_done)(int result, struct gntab_unmap_queue_data *data);
+
+struct gntab_unmap_queue_data
+{
+	struct delayed_work	gnttab_work;
+	void *data;
+	gnttab_unmap_refs_done	done;
+	struct gnttab_unmap_grant_ref *unmap_ops;
+	struct gnttab_unmap_grant_ref *kunmap_ops;
+	struct page **pages;
+	unsigned int count;
+	unsigned int age;
+};
+
 int gnttab_init(void);
 int gnttab_suspend(void);
 int gnttab_resume(void);
@@ -174,6 +190,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		      struct gnttab_unmap_grant_ref *kunmap_ops,
 		      struct page **pages, unsigned int count);
+void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
+
 
 /* Perform a batch of grant map/copy operations. Retry every batch slot
  * for which the hypervisor returns GNTST_eagain. This is typically due
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (8 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 09/14] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 17:49   ` Stefano Stabellini
  2015-01-19 15:51 ` [PATCH 11/14] xen/gntdev: safely unmap grants in case they are still in use David Vrabel
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

Unmapping may require sleeping and we unmap while holding priv->lock, so
convert it to a mutex.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntdev.c |   40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a28807a..c1a03fa 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -67,7 +67,7 @@ struct gntdev_priv {
 	 * Only populated if populate_freeable_maps == 1 */
 	struct list_head freeable_maps;
 	/* lock protects maps and freeable_maps */
-	spinlock_t lock;
+	struct mutex lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
 };
@@ -221,9 +221,9 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
 	}
 
 	if (populate_freeable_maps && priv) {
-		spin_lock(&priv->lock);
+		mutex_lock(&priv->lock);
 		list_del(&map->next);
-		spin_unlock(&priv->lock);
+		mutex_unlock(&priv->lock);
 	}
 
 	if (map->pages && !use_ptemod)
@@ -397,9 +397,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 		 * not do any unmapping, since that has been done prior to
 		 * closing the vma, but it may still iterate the unmap_ops list.
 		 */
-		spin_lock(&priv->lock);
+		mutex_lock(&priv->lock);
 		map->vma = NULL;
-		spin_unlock(&priv->lock);
+		mutex_unlock(&priv->lock);
 	}
 	vma->vm_private_data = NULL;
 	gntdev_put_map(priv, map);
@@ -443,14 +443,14 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
 	struct grant_map *map;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		unmap_if_in_range(map, start, end);
 	}
 	list_for_each_entry(map, &priv->freeable_maps, next) {
 		unmap_if_in_range(map, start, end);
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
@@ -467,7 +467,7 @@ static void mn_release(struct mmu_notifier *mn,
 	struct grant_map *map;
 	int err;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -486,7 +486,7 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, /* offset */ 0, map->count);
 		WARN_ON(err);
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 }
 
 static struct mmu_notifier_ops gntdev_mmu_ops = {
@@ -508,7 +508,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 
 	INIT_LIST_HEAD(&priv->maps);
 	INIT_LIST_HEAD(&priv->freeable_maps);
-	spin_lock_init(&priv->lock);
+	mutex_init(&priv->lock);
 
 	if (use_ptemod) {
 		priv->mm = get_task_mm(current);
@@ -582,10 +582,10 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 		return -EFAULT;
 	}
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	gntdev_add_map(priv, map);
 	op.index = map->index << PAGE_SHIFT;
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
@@ -604,7 +604,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 		return -EFAULT;
 	pr_debug("priv %p, del %d+%d\n", priv, (int)op.index, (int)op.count);
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map) {
 		list_del(&map->next);
@@ -612,7 +612,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 			list_add_tail(&map->next, &priv->freeable_maps);
 		err = 0;
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 	if (map)
 		gntdev_put_map(priv, map);
 	return err;
@@ -680,7 +680,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	out_flags = op.action;
 	out_event = op.event_channel_port;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 
 	list_for_each_entry(map, &priv->maps, next) {
 		uint64_t begin = map->index << PAGE_SHIFT;
@@ -708,7 +708,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	rc = 0;
 
  unlock_out:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	/* Drop the reference to the event channel we did not save in the map */
 	if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
@@ -758,7 +758,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	pr_debug("map %d+%d at %lx (pgoff %lx)\n",
 			index, count, vma->vm_start, vma->vm_pgoff);
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
@@ -793,7 +793,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 			map->flags |= GNTMAP_readonly;
 	}
 
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	if (use_ptemod) {
 		err = apply_to_page_range(vma->vm_mm, vma->vm_start,
@@ -821,11 +821,11 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	return 0;
 
 unlock_out:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 	return err;
 
 out_unlock_put:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 out_put_map:
 	if (use_ptemod)
 		map->vma = NULL;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/14] xen/gntdev: safely unmap grants in case they are still in use
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (9 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.

This allows userspace programs to safely use Direct I/O and AIO to a
network filesystem which may retain refs to pages in queued skbs after
the filesystem I/O has completed.

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntdev.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index c1a03fa..494bd06 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -311,9 +311,30 @@ static int map_grant_pages(struct grant_map *map)
 	return err;
 }
 
+struct unmap_grant_pages_callback_data
+{
+	struct completion completion;
+	int result;
+};
+
+static void unmap_grant_callback(int result,
+				 struct gntab_unmap_queue_data *data)
+{
+	struct unmap_grant_pages_callback_data* d = data->data;
+
+	d->result = result;
+	complete(&d->completion);
+}
+
 static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 {
 	int i, err = 0;
+	struct gntab_unmap_queue_data unmap_data;
+	struct unmap_grant_pages_callback_data data;
+
+	init_completion(&data.completion);
+	unmap_data.data = &data;
+	unmap_data.done= &unmap_grant_callback;
 
 	if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
 		int pgno = (map->notify.addr >> PAGE_SHIFT);
@@ -325,11 +346,16 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
-	err = gnttab_unmap_refs(map->unmap_ops + offset,
-			use_ptemod ? map->kunmap_ops + offset : NULL, map->pages + offset,
-			pages);
-	if (err)
-		return err;
+	unmap_data.unmap_ops = map->unmap_ops + offset;
+	unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
+	unmap_data.pages = map->pages + offset;
+	unmap_data.count = pages;
+
+	gnttab_unmap_refs_async(&unmap_data);
+
+	wait_for_completion(&data.completion);
+	if (data.result)
+		return data.result;
 
 	for (i = 0; i < pages; i++) {
 		if (map->unmap_ops[offset+i].status)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (10 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 11/14] xen/gntdev: safely unmap grants in case they are still in use David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-23 12:02   ` David Vrabel
                     ` (2 more replies)
  2015-01-19 15:51 ` [PATCH 13/14] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
  2015-01-19 15:51 ` [PATCH 14/14] xen/gntdev: provide find_special_page VMA operation David Vrabel
  13 siblings, 3 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, David Vrabel

From: Jenny Herbert <jennifer.herbert@citrix.com>

Use gnttab_unmap_refs_async() to wait until the mapped pages are no
longer in use before unmapping them.

This allows blkback to use network storage which may retain refs to
pages in queued skbs after the block I/O has completed.

Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/block/xen-blkback/blkback.c |  175 +++++++++++++++++++++++++----------
 drivers/block/xen-blkback/common.h  |    3 +
 2 files changed, 127 insertions(+), 51 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 908e630..39a7aba 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -47,6 +47,7 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 #include <xen/balloon.h>
+#include <xen/grant_table.h>
 #include "common.h"
 
 /*
@@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
 	atomic_dec(&blkif->persistent_gnt_in_use);
 }
 
+static void free_persistent_gnts_unmap_callback(int result,
+						struct gntab_unmap_queue_data *data)
+{
+	struct completion* c = data->data;
+
+	/* BUG_ON used to reproduce existing behaviour,
+	   but is this the best way to deal with this? */
+	BUG_ON(result);
+	complete(c);
+}
+
 static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
                                  unsigned int num)
 {
@@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct persistent_gnt *persistent_gnt;
 	struct rb_node *n;
-	int ret = 0;
 	int segs_to_unmap = 0;
+	struct gntab_unmap_queue_data unmap_data;
+	struct completion unmap_completion;
+
+	init_completion(&unmap_completion);
+
+	unmap_data.data = &unmap_completion;
+	unmap_data.done = &free_persistent_gnts_unmap_callback;
+	unmap_data.pages = pages;
+	unmap_data.unmap_ops = unmap;
+	unmap_data.kunmap_ops = NULL;
 
 	foreach_grant_safe(persistent_gnt, n, root, node) {
 		BUG_ON(persistent_gnt->handle ==
@@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
 			!rb_next(&persistent_gnt->node)) {
-			ret = gnttab_unmap_refs(unmap, NULL, pages,
-				segs_to_unmap);
-			BUG_ON(ret);
+
+			unmap_data.count = segs_to_unmap;
+			gnttab_unmap_refs_async(&unmap_data);
+			wait_for_completion(&unmap_completion);
+
 			put_free_pages(blkif, pages, segs_to_unmap);
 			segs_to_unmap = 0;
 		}
@@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
 	shrink_free_pagepool(blkif, 0 /* All */);
 }
 
-/*
- * Unmap the grant references, and also remove the M2P over-rides
- * used in the 'pending_req'.
- */
-static void xen_blkbk_unmap(struct xen_blkif *blkif,
-                            struct grant_page *pages[],
-                            int num)
+static unsigned int xen_blkbk_unmap_prepare(
+	struct xen_blkif *blkif,
+	struct grant_page **pages,
+	unsigned int num,
+	struct gnttab_unmap_grant_ref *unmap_ops,
+	struct page **unmap_pages)
 {
-	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	unsigned int i, invcount = 0;
-	int ret;
 
 	for (i = 0; i < num; i++) {
 		if (pages[i]->persistent_gnt != NULL) {
@@ -674,21 +693,99 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
 			continue;
 		unmap_pages[invcount] = pages[i]->page;
-		gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
+		gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[invcount]->page),
 				    GNTMAP_host_map, pages[i]->handle);
 		pages[i]->handle = BLKBACK_INVALID_HANDLE;
-		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
-			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
-			                        invcount);
+		invcount++;
+       }
+
+       return invcount;
+}
+
+static void xen_blkbk_unmap_done(struct xen_blkif *blkif,
+				 struct page *unmap_pages[],
+				 unsigned int num)
+{
+	put_free_pages(blkif, unmap_pages, num);
+}
+
+static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
+{
+	struct pending_req* pending_req = (struct pending_req*) (data->data);
+	struct xen_blkif *blkif = pending_req->blkif;
+
+	/* BUG_ON used to reproduce existing behaviour,
+	   but is this the best way to deal with this? */
+	BUG_ON(result);
+
+	xen_blkbk_unmap_done(blkif, data->pages, data->count);
+	make_response(blkif, pending_req->id,
+		      pending_req->operation, pending_req->status);
+	free_req(blkif, pending_req);
+	/*
+	 * Make sure the request is freed before releasing blkif,
+	 * or there could be a race between free_req and the
+	 * cleanup done in xen_blkif_free during shutdown.
+	 *
+	 * NB: The fact that we might try to wake up pending_free_wq
+	 * before drain_complete (in case there's a drain going on)
+	 * it's not a problem with our current implementation
+	 * because we can assure there's no thread waiting on
+	 * pending_free_wq if there's a drain going on, but it has
+	 * to be taken into account if the current model is changed.
+	 */
+	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
+		complete(&blkif->drain_complete);
+	}
+	xen_blkif_put(blkif);
+}
+
+static void xen_blkbk_unmap_and_respond(struct pending_req *req)
+{
+	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
+	struct xen_blkif *blkif = req->blkif;
+	struct grant_page **pages = req->segments;
+	unsigned int invcount;
+
+	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
+					   req->unmap, req->unmap_pages);
+
+	work->data = req;
+	work->done = xen_blkbk_unmap_and_respond_callback;
+	work->unmap_ops = req->unmap;
+	work->kunmap_ops = NULL;
+	work->pages = req->unmap_pages;
+	work->count = invcount;
+
+	gnttab_unmap_refs_async(&req->gnttab_unmap_data);
+}
+
+
+/*
+ * Unmap the grant references, and also remove the M2P over-rides
+ * used in the 'pending_req'.
+ */
+static void xen_blkbk_unmap(struct xen_blkif *blkif,
+                            struct grant_page *pages[],
+                            int num)
+{
+	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+	unsigned int invcount = 0;
+	int ret;
+
+	while (num) {
+		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+		
+		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
+						   unmap, unmap_pages);
+		if (invcount) {
+			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
 			BUG_ON(ret);
-			put_free_pages(blkif, unmap_pages, invcount);
-			invcount = 0;
+			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
 		}
-	}
-	if (invcount) {
-		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
-		BUG_ON(ret);
-		put_free_pages(blkif, unmap_pages, invcount);
+		pages += batch;
+		num -= batch;
 	}
 }
 
@@ -982,32 +1079,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 	 * the grant references associated with 'request' and provide
 	 * the proper response on the ring.
 	 */
-	if (atomic_dec_and_test(&pending_req->pendcnt)) {
-		struct xen_blkif *blkif = pending_req->blkif;
-
-		xen_blkbk_unmap(blkif,
-		                pending_req->segments,
-		                pending_req->nr_pages);
-		make_response(blkif, pending_req->id,
-			      pending_req->operation, pending_req->status);
-		free_req(blkif, pending_req);
-		/*
-		 * Make sure the request is freed before releasing blkif,
-		 * or there could be a race between free_req and the
-		 * cleanup done in xen_blkif_free during shutdown.
-		 *
-		 * NB: The fact that we might try to wake up pending_free_wq
-		 * before drain_complete (in case there's a drain going on)
-		 * it's not a problem with our current implementation
-		 * because we can assure there's no thread waiting on
-		 * pending_free_wq if there's a drain going on, but it has
-		 * to be taken into account if the current model is changed.
-		 */
-		if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
-			complete(&blkif->drain_complete);
-		}
-		xen_blkif_put(blkif);
-	}
+	if (atomic_dec_and_test(&pending_req->pendcnt))
+		xen_blkbk_unmap_and_respond(pending_req);
 }
 
 /*
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index f65b807..428a34d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -350,6 +350,9 @@ struct pending_req {
 	struct grant_page	*indirect_pages[MAX_INDIRECT_PAGES];
 	struct seg_buf		seg[MAX_INDIRECT_SEGMENTS];
 	struct bio		*biolist[MAX_INDIRECT_SEGMENTS];
+	struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
+	struct page *unmap_pages[MAX_INDIRECT_SEGMENTS];
+	struct gntab_unmap_queue_data gnttab_unmap_data;
 };
 
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/14] xen/gntdev: mark userspace PTEs as special on x86 PV guests
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (11 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  2015-01-19 15:51 ` [PATCH 14/14] xen/gntdev: provide find_special_page VMA operation David Vrabel
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

In an x86 PV guest, get_user_pages_fast() on a userspace address range
containing foreign mappings does not work correctly because the M2P
lookup of the MFN from a userspace PTE may return the wrong page.

Force get_user_pages_fast() to fail on such addresses by marking the PTEs
as special.

If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
4.0), we can do so efficiently in the grant map hypercall.  Otherwise,
it needs to be done afterwards.  This is both inefficient and racy
(the mapping is visible to the task before we fix up the PTEs), but
will be fine for well-behaved applications that do not use the mapping
until after the mmap() system call returns.

Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH)
do not need this since get_user_pages() has always worked correctly
for them.
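
Condensed, the logic added below amounts to the following sketch (see
the diff for the real context; this is not additional code):

	/* When building the map ops (find_grant_ptes): */
	if (xen_feature(XENFEAT_gnttab_map_avail_bits))
		/* Ask Xen to set a guest-available PTE bit as part of
		 * the map, so the PTE is already special when it
		 * becomes visible. */
		flags |= 1 << _GNTMAP_guest_avail0;

	/* After the map has completed (gntdev_mmap), the fallback: */
	if (!xen_feature(XENFEAT_gnttab_map_avail_bits))
		apply_to_page_range(vma->vm_mm, vma->vm_start,
				    vma->vm_end - vma->vm_start,
				    set_grant_ptes_as_special, NULL);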

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntdev.c                |   32 ++++++++++++++++++++++++++++++--
 include/xen/interface/features.h    |    6 ++++++
 include/xen/interface/grant_table.h |    7 +++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 494bd06..8ea10eb 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -244,11 +244,24 @@ static int find_grant_ptes(pte_t *pte, pgtable_t token,
 	BUG_ON(pgnr >= map->count);
 	pte_maddr = arbitrary_virt_to_machine(pte).maddr;
 
+	/*
+	 * Set the PTE as special to force get_user_pages_fast() fall
+	 * back to the slow path.  If this is not supported as part of
+	 * the grant map, it will be done afterwards.
+	 */
+	if (xen_feature(XENFEAT_gnttab_map_avail_bits))
+		flags |= (1 << _GNTMAP_guest_avail0);
+
 	gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr, flags,
 			  map->grants[pgnr].ref,
 			  map->grants[pgnr].domid);
-	gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr, flags,
-			    -1 /* handle */);
+	return 0;
+}
+
+static int set_grant_ptes_as_special(pte_t *pte, pgtable_t token,
+				     unsigned long addr, void *data)
+{
+	set_pte_at(current->mm, addr, pte, pte_mkspecial(*pte));
 	return 0;
 }
 
@@ -842,6 +855,21 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 			if (err)
 				goto out_put_map;
 		}
+	} else {
+		/*
+		 * If the PTEs were not made special by the grant map
+		 * hypercall, do so here.
+		 *
+		 * This is racy since the mapping is already visible
+		 * to userspace but userspace should be well-behaved
+		 * enough to not touch it until the mmap() call
+		 * returns.
+		 */
+		if (!xen_feature(XENFEAT_gnttab_map_avail_bits)) {
+			apply_to_page_range(vma->vm_mm, vma->vm_start,
+					    vma->vm_end - vma->vm_start,
+					    set_grant_ptes_as_special, NULL);
+		}
 	}
 
 	return 0;
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
index 131a6cc..6ad3d11 100644
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -41,6 +41,12 @@
 /* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
 #define XENFEAT_mmu_pt_update_preserve_ad  5
 
+/*
+ * If set, GNTTABOP_map_grant_ref honors flags to be placed into guest kernel
+ * available pte bits.
+ */
+#define XENFEAT_gnttab_map_avail_bits      7
+
 /* x86: Does this Xen host support the HVM callback vector type? */
 #define XENFEAT_hvm_callback_vector        8
 
diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index bcce564..56806bc 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -526,6 +526,13 @@ DEFINE_GUEST_HANDLE_STRUCT(gnttab_cache_flush);
 #define GNTMAP_contains_pte     (1<<_GNTMAP_contains_pte)
 
 /*
+ * Bits to be placed in guest kernel available PTE bits (architecture
+ * dependent; only supported when XENFEAT_gnttab_map_avail_bits is set).
+ */
+#define _GNTMAP_guest_avail0    (16)
+#define GNTMAP_guest_avail_mask ((uint32_t)~0 << _GNTMAP_guest_avail0)
+
+/*
  * Values for error status returns. All errors are -ve.
  */
 #define GNTST_okay             (0)  /* Normal return.                        */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/14] xen/gntdev: provide find_special_page VMA operation
  2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
                   ` (12 preceding siblings ...)
  2015-01-19 15:51 ` [PATCH 13/14] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
@ 2015-01-19 15:51 ` David Vrabel
  13 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 15:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

For a PV guest, use the find_special_page op to find the right page.
To handle VMAs being split, remember the start of the original VMA so
the correct index in the pages array can be calculated.
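
For example (illustrative addresses only): if a 16-page mapping
originally starts at 0x7f0000000000 and the VMA is later split at
+0x4000, a lookup at 0x7f0000006000 goes to the second VMA, whose
vm_start is 0x7f0000004000.  Computing (addr - vma->vm_start) >>
PAGE_SHIFT would give index 2, but the address corresponds to page 6
of the original mapping; (addr - pages_vm_start) >> PAGE_SHIFT gives
the correct 6.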

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntdev.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 8ea10eb..826e731 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -93,6 +93,7 @@ struct grant_map {
 	struct gnttab_map_grant_ref   *kmap_ops;
 	struct gnttab_unmap_grant_ref *kunmap_ops;
 	struct page **pages;
+	unsigned long pages_vm_start;
 };
 
 static int unmap_grant_pages(struct grant_map *map, int offset, int pages);
@@ -444,9 +445,18 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 	gntdev_put_map(priv, map);
 }
 
+static struct page *gntdev_vma_find_special_page(struct vm_area_struct *vma,
+						 unsigned long addr)
+{
+	struct grant_map *map = vma->vm_private_data;
+
+	return map->pages[(addr - map->pages_vm_start) >> PAGE_SHIFT];
+}
+
 static struct vm_operations_struct gntdev_vmops = {
 	.open = gntdev_vma_open,
 	.close = gntdev_vma_close,
+	.find_special_page = gntdev_vma_find_special_page,
 };
 
 /* ------------------------------------------------------------------ */
@@ -870,6 +880,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 					    vma->vm_end - vma->vm_start,
 					    set_grant_ptes_as_special, NULL);
 		}
+
+		map->pages_vm_start = vma->vm_start;
 	}
 
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-19 15:51 ` [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
@ 2015-01-19 17:49   ` Stefano Stabellini
  2015-01-19 17:53     ` David Vrabel
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2015-01-19 17:49 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Boris Ostrovsky

On Mon, 19 Jan 2015, David Vrabel wrote:
> Unmapping may require sleeping and we unmap while holding priv->lock, so
> convert it to a mutex.

It would be useful to list in the commit message the operations that
might sleep and are currently called with the spinlock held.


> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/xen/gntdev.c |   40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index a28807a..c1a03fa 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -67,7 +67,7 @@ struct gntdev_priv {
>  	 * Only populated if populate_freeable_maps == 1 */
>  	struct list_head freeable_maps;
>  	/* lock protects maps and freeable_maps */
> -	spinlock_t lock;
> +	struct mutex lock;
>  	struct mm_struct *mm;
>  	struct mmu_notifier mn;
>  };
> @@ -221,9 +221,9 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
>  	}
>  
>  	if (populate_freeable_maps && priv) {
> -		spin_lock(&priv->lock);
> +		mutex_lock(&priv->lock);
>  		list_del(&map->next);
> -		spin_unlock(&priv->lock);
> +		mutex_unlock(&priv->lock);
>  	}
>  
>  	if (map->pages && !use_ptemod)
> @@ -397,9 +397,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>  		 * not do any unmapping, since that has been done prior to
>  		 * closing the vma, but it may still iterate the unmap_ops list.
>  		 */
> -		spin_lock(&priv->lock);
> +		mutex_lock(&priv->lock);
>  		map->vma = NULL;
> -		spin_unlock(&priv->lock);
> +		mutex_unlock(&priv->lock);
>  	}
>  	vma->vm_private_data = NULL;
>  	gntdev_put_map(priv, map);
> @@ -443,14 +443,14 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
>  	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
>  	struct grant_map *map;
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		unmap_if_in_range(map, start, end);
>  	}
>  	list_for_each_entry(map, &priv->freeable_maps, next) {
>  		unmap_if_in_range(map, start, end);
>  	}
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  }
>  
>  static void mn_invl_page(struct mmu_notifier *mn,
> @@ -467,7 +467,7 @@ static void mn_release(struct mmu_notifier *mn,
>  	struct grant_map *map;
>  	int err;
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  	list_for_each_entry(map, &priv->maps, next) {
>  		if (!map->vma)
>  			continue;
> @@ -486,7 +486,7 @@ static void mn_release(struct mmu_notifier *mn,
>  		err = unmap_grant_pages(map, /* offset */ 0, map->count);
>  		WARN_ON(err);
>  	}
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  }
>  
>  static struct mmu_notifier_ops gntdev_mmu_ops = {
> @@ -508,7 +508,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  
>  	INIT_LIST_HEAD(&priv->maps);
>  	INIT_LIST_HEAD(&priv->freeable_maps);
> -	spin_lock_init(&priv->lock);
> +	mutex_init(&priv->lock);
>  
>  	if (use_ptemod) {
>  		priv->mm = get_task_mm(current);
> @@ -582,10 +582,10 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>  		return -EFAULT;
>  	}
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  	gntdev_add_map(priv, map);
>  	op.index = map->index << PAGE_SHIFT;
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  
>  	if (copy_to_user(u, &op, sizeof(op)) != 0)
>  		return -EFAULT;
> @@ -604,7 +604,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  		return -EFAULT;
>  	pr_debug("priv %p, del %d+%d\n", priv, (int)op.index, (int)op.count);
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
>  	if (map) {
>  		list_del(&map->next);
> @@ -612,7 +612,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  			list_add_tail(&map->next, &priv->freeable_maps);
>  		err = 0;
>  	}
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  	if (map)
>  		gntdev_put_map(priv, map);
>  	return err;
> @@ -680,7 +680,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
>  	out_flags = op.action;
>  	out_event = op.event_channel_port;
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  
>  	list_for_each_entry(map, &priv->maps, next) {
>  		uint64_t begin = map->index << PAGE_SHIFT;
> @@ -708,7 +708,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
>  	rc = 0;
>  
>   unlock_out:
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  
>  	/* Drop the reference to the event channel we did not save in the map */
>  	if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
> @@ -758,7 +758,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	pr_debug("map %d+%d at %lx (pgoff %lx)\n",
>  			index, count, vma->vm_start, vma->vm_pgoff);
>  
> -	spin_lock(&priv->lock);
> +	mutex_lock(&priv->lock);
>  	map = gntdev_find_map_index(priv, index, count);
>  	if (!map)
>  		goto unlock_out;
> @@ -793,7 +793,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  			map->flags |= GNTMAP_readonly;
>  	}
>  
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  
>  	if (use_ptemod) {
>  		err = apply_to_page_range(vma->vm_mm, vma->vm_start,
> @@ -821,11 +821,11 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	return 0;
>  
>  unlock_out:
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  	return err;
>  
>  out_unlock_put:
> -	spin_unlock(&priv->lock);
> +	mutex_unlock(&priv->lock);
>  out_put_map:
>  	if (use_ptemod)
>  		map->vma = NULL;
> -- 
> 1.7.10.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-19 17:49   ` Stefano Stabellini
@ 2015-01-19 17:53     ` David Vrabel
  2015-01-19 18:38       ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-19 17:53 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Boris Ostrovsky

On 19/01/15 17:49, Stefano Stabellini wrote:
> On Mon, 19 Jan 2015, David Vrabel wrote:
>> Unmapping may require sleeping and we unmap while holding priv->lock, so
>> convert it to a mutex.
> 
> It would be useful to list in the commit message the operations that
> might sleep and are currently called with the spinlock held.

It's the next patch that introduces the sleeping, when we wait on the
completion for the async grant unmap to complete.

David

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-19 17:53     ` David Vrabel
@ 2015-01-19 18:38       ` Stefano Stabellini
  2015-01-19 18:43         ` David Vrabel
  0 siblings, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2015-01-19 18:38 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Boris Ostrovsky, Stefano Stabellini

On Mon, 19 Jan 2015, David Vrabel wrote:
> On 19/01/15 17:49, Stefano Stabellini wrote:
> > On Mon, 19 Jan 2015, David Vrabel wrote:
> >> Unmapping may require sleeping and we unmap while holding priv->lock, so
> >> convert it to a mutex.
> > 
> > It would be useful to list in the commit message the operations that
> > might sleep and are currently called with the spinlock held.
> 
> It's the next patch that introduces the sleeping, when we wait on the
> completion for the async grant unmap to complete.

Ah! Makes sense.
One thing to keep in mind is that mutex_lock can introduce a sleep
itself. We need to make sure that all the call sites in gntdev.c are OK
with it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-19 18:38       ` Stefano Stabellini
@ 2015-01-19 18:43         ` David Vrabel
  0 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-19 18:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Boris Ostrovsky

On 19/01/15 18:38, Stefano Stabellini wrote:
> On Mon, 19 Jan 2015, David Vrabel wrote:
>> On 19/01/15 17:49, Stefano Stabellini wrote:
>>> On Mon, 19 Jan 2015, David Vrabel wrote:
>>>> Unmapping may require sleeping and we unmap while holding priv->lock, so
>>>> convert it to a mutex.
>>>
>>> It would be useful to list in the commit message the operations that
>>> might sleep and are currently called with the spinlock held.
>>
>> It's the next patch that introduces the sleeping, when we wait on the
>> completion for the async grant unmap to complete.
> 
> Ah! Makes sense.
> One thing to keep in mind is that mutex_lock can introduce a sleep
> itself. We need to make sure that all the call sites in gntdev.c are OK
> with it.

They are.

David

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
@ 2015-01-23 12:02   ` David Vrabel
  2015-01-23 14:41     ` Roger Pau Monné
  2015-01-23 14:31   ` Roger Pau Monné
  2015-01-26 15:02   ` David Vrabel
  2 siblings, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-23 12:02 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert, Roger Pau Monne

On 19/01/15 15:51, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.

Roger, I'd appreciate a review of this blkback change.

Thanks.

David

>  drivers/block/xen-blkback/blkback.c |  175 +++++++++++++++++++++++++----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 127 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..39a7aba 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>  	atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +						struct gntab_unmap_queue_data *data)
> +{
> +	struct completion* c = data->data;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +	complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	struct persistent_gnt *persistent_gnt;
>  	struct rb_node *n;
> -	int ret = 0;
>  	int segs_to_unmap = 0;
> +	struct gntab_unmap_queue_data unmap_data;
> +	struct completion unmap_completion;
> +
> +	init_completion(&unmap_completion);
> +
> +	unmap_data.data = &unmap_completion;
> +	unmap_data.done = &free_persistent_gnts_unmap_callback;
> +	unmap_data.pages = pages;
> +	unmap_data.unmap_ops = unmap;
> +	unmap_data.kunmap_ops = NULL;
>  
>  	foreach_grant_safe(persistent_gnt, n, root, node) {
>  		BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>  		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>  			!rb_next(&persistent_gnt->node)) {
> -			ret = gnttab_unmap_refs(unmap, NULL, pages,
> -				segs_to_unmap);
> -			BUG_ON(ret);
> +
> +			unmap_data.count = segs_to_unmap;
> +			gnttab_unmap_refs_async(&unmap_data);
> +			wait_for_completion(&unmap_completion);
> +
>  			put_free_pages(blkif, pages, segs_to_unmap);
>  			segs_to_unmap = 0;
>  		}
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>  	shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +	struct xen_blkif *blkif,
> +	struct grant_page **pages,
> +	unsigned int num,
> +	struct gnttab_unmap_grant_ref *unmap_ops,
> +	struct page **unmap_pages)
>  {
> -	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	unsigned int i, invcount = 0;
> -	int ret;
>  
>  	for (i = 0; i < num; i++) {
>  		if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,99 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>  		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>  			continue;
>  		unmap_pages[invcount] = pages[i]->page;
> -		gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +		gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[invcount]->page),
>  				    GNTMAP_host_map, pages[i]->handle);
>  		pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -			                        invcount);
> +		invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_done(struct xen_blkif *blkif,
> +				 struct page *unmap_pages[],
> +				 unsigned int num)
> +{
> +	put_free_pages(blkif, unmap_pages, num);
> +}
> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +	struct pending_req* pending_req = (struct pending_req*) (data->data);
> +	struct xen_blkif *blkif = pending_req->blkif;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +
> +	xen_blkbk_unmap_done(blkif, data->pages, data->count);
> +	make_response(blkif, pending_req->id,
> +		      pending_req->operation, pending_req->status);
> +	free_req(blkif, pending_req);
> +	/*
> +	 * Make sure the request is freed before releasing blkif,
> +	 * or there could be a race between free_req and the
> +	 * cleanup done in xen_blkif_free during shutdown.
> +	 *
> +	 * NB: The fact that we might try to wake up pending_free_wq
> +	 * before drain_complete (in case there's a drain going on)
> +	 * it's not a problem with our current implementation
> +	 * because we can assure there's no thread waiting on
> +	 * pending_free_wq if there's a drain going on, but it has
> +	 * to be taken into account if the current model is changed.
> +	 */
> +	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +		complete(&blkif->drain_complete);
> +	}
> +	xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +	struct xen_blkif *blkif = req->blkif;
> +	struct grant_page **pages = req->segments;
> +	unsigned int invcount;
> +
> +	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +					   req->unmap, req->unmap_pages);
> +
> +	work->data = req;
> +	work->done = xen_blkbk_unmap_and_respond_callback;
> +	work->unmap_ops = req->unmap;
> +	work->kunmap_ops = NULL;
> +	work->pages = req->unmap_pages;
> +	work->count = invcount;
> +
> +	gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references, and also remove the M2P over-rides
> + * used in the 'pending_req'.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	unsigned int invcount = 0;
> +	int ret;
> +
> +	while (num) {
> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +		
> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +						   unmap, unmap_pages);
> +		if (invcount) {
> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>  			BUG_ON(ret);
> -			put_free_pages(blkif, unmap_pages, invcount);
> -			invcount = 0;
> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>  		}
> -	}
> -	if (invcount) {
> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -		BUG_ON(ret);
> -		put_free_pages(blkif, unmap_pages, invcount);
> +		pages += batch;
> +		num -= batch;
>  	}
>  }
>  
> @@ -982,32 +1079,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>  	 * the grant references associated with 'request' and provide
>  	 * the proper response on the ring.
>  	 */
> -	if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -		struct xen_blkif *blkif = pending_req->blkif;
> -
> -		xen_blkbk_unmap(blkif,
> -		                pending_req->segments,
> -		                pending_req->nr_pages);
> -		make_response(blkif, pending_req->id,
> -			      pending_req->operation, pending_req->status);
> -		free_req(blkif, pending_req);
> -		/*
> -		 * Make sure the request is freed before releasing blkif,
> -		 * or there could be a race between free_req and the
> -		 * cleanup done in xen_blkif_free during shutdown.
> -		 *
> -		 * NB: The fact that we might try to wake up pending_free_wq
> -		 * before drain_complete (in case there's a drain going on)
> -		 * it's not a problem with our current implementation
> -		 * because we can assure there's no thread waiting on
> -		 * pending_free_wq if there's a drain going on, but it has
> -		 * to be taken into account if the current model is changed.
> -		 */
> -		if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -			complete(&blkif->drain_complete);
> -		}
> -		xen_blkif_put(blkif);
> -	}
> +	if (atomic_dec_and_test(&pending_req->pendcnt))
> +		xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..428a34d 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>  	struct grant_page	*indirect_pages[MAX_INDIRECT_PAGES];
>  	struct seg_buf		seg[MAX_INDIRECT_SEGMENTS];
>  	struct bio		*biolist[MAX_INDIRECT_SEGMENTS];
> +	struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +	struct page *unmap_pages[MAX_INDIRECT_SEGMENTS];
> +	struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  
>  
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
  2015-01-23 12:02   ` David Vrabel
@ 2015-01-23 14:31   ` Roger Pau Monné
  2015-01-23 14:54     ` David Vrabel
  2015-01-26 15:02   ` David Vrabel
  2 siblings, 1 reply; 30+ messages in thread
From: Roger Pau Monné @ 2015-01-23 14:31 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 19/01/15 at 16:51, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.
> 
> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

AFAICT it looks good, just a couple of comment below.

> ---
>  drivers/block/xen-blkback/blkback.c |  175 +++++++++++++++++++++++++----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 127 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..39a7aba 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>  	atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +						struct gntab_unmap_queue_data *data)
> +{
> +	struct completion* c = data->data;

This should be "struct completion *c".

> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +	complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	struct persistent_gnt *persistent_gnt;
>  	struct rb_node *n;
> -	int ret = 0;
>  	int segs_to_unmap = 0;
> +	struct gntab_unmap_queue_data unmap_data;
> +	struct completion unmap_completion;
> +
> +	init_completion(&unmap_completion);
> +
> +	unmap_data.data = &unmap_completion;
> +	unmap_data.done = &free_persistent_gnts_unmap_callback;
> +	unmap_data.pages = pages;
> +	unmap_data.unmap_ops = unmap;
> +	unmap_data.kunmap_ops = NULL;
>  
>  	foreach_grant_safe(persistent_gnt, n, root, node) {
>  		BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>  		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>  			!rb_next(&persistent_gnt->node)) {
> -			ret = gnttab_unmap_refs(unmap, NULL, pages,
> -				segs_to_unmap);
> -			BUG_ON(ret);
> +
> +			unmap_data.count = segs_to_unmap;
> +			gnttab_unmap_refs_async(&unmap_data);
> +			wait_for_completion(&unmap_completion);
> +
>  			put_free_pages(blkif, pages, segs_to_unmap);
>  			segs_to_unmap = 0;
>  		}
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>  	shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +	struct xen_blkif *blkif,
> +	struct grant_page **pages,
> +	unsigned int num,
> +	struct gnttab_unmap_grant_ref *unmap_ops,
> +	struct page **unmap_pages)
>  {
> -	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	unsigned int i, invcount = 0;
> -	int ret;
>  
>  	for (i = 0; i < num; i++) {
>  		if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,99 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>  		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>  			continue;
>  		unmap_pages[invcount] = pages[i]->page;
> -		gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +		gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[invcount]->page),
>  				    GNTMAP_host_map, pages[i]->handle);
>  		pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -			                        invcount);
> +		invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_done(struct xen_blkif *blkif,
> +				 struct page *unmap_pages[],
> +				 unsigned int num)
> +{
> +	put_free_pages(blkif, unmap_pages, num);
> +}

I'm unsure this is really needed, IMHO I would just call put_free_pages
directly.

> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +	struct pending_req* pending_req = (struct pending_req*) (data->data);
> +	struct xen_blkif *blkif = pending_req->blkif;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +
> +	xen_blkbk_unmap_done(blkif, data->pages, data->count);

So I would change this to a direct call to put_free_pages.

> +	make_response(blkif, pending_req->id,
> +		      pending_req->operation, pending_req->status);
> +	free_req(blkif, pending_req);
> +	/*
> +	 * Make sure the request is freed before releasing blkif,
> +	 * or there could be a race between free_req and the
> +	 * cleanup done in xen_blkif_free during shutdown.
> +	 *
> +	 * NB: The fact that we might try to wake up pending_free_wq
> +	 * before drain_complete (in case there's a drain going on)
> +	 * it's not a problem with our current implementation
> +	 * because we can assure there's no thread waiting on
> +	 * pending_free_wq if there's a drain going on, but it has
> +	 * to be taken into account if the current model is changed.
> +	 */
> +	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +		complete(&blkif->drain_complete);
> +	}
> +	xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +	struct xen_blkif *blkif = req->blkif;
> +	struct grant_page **pages = req->segments;
> +	unsigned int invcount;
> +
> +	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +					   req->unmap, req->unmap_pages);
> +
> +	work->data = req;
> +	work->done = xen_blkbk_unmap_and_respond_callback;
> +	work->unmap_ops = req->unmap;
> +	work->kunmap_ops = NULL;
> +	work->pages = req->unmap_pages;
> +	work->count = invcount;
> +
> +	gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references, and also remove the M2P over-rides
> + * used in the 'pending_req'.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	unsigned int invcount = 0;
> +	int ret;
> +
> +	while (num) {
> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +		
> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +						   unmap, unmap_pages);

I would add:

BUG_ON(invcount != batch);

> +		if (invcount) {
> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>  			BUG_ON(ret);
> -			put_free_pages(blkif, unmap_pages, invcount);
> -			invcount = 0;
> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>  		}
> -	}
> -	if (invcount) {
> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -		BUG_ON(ret);
> -		put_free_pages(blkif, unmap_pages, invcount);
> +		pages += batch;
> +		num -= batch;

This loop is sort of buggy, it should work, but it is sub-optimal.

The pages array can contain both persistent grants and normal grants
(the ones that we should unmap). So blindly increasing the pages pointer
with batch will mean that we could be iterating over grants that have
already been freed. It is not really an issue because we set handle to
BLKBACK_INVALID_HANDLE, but it's a waste.

>  	}
>  }
>  
> @@ -982,32 +1079,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>  	 * the grant references associated with 'request' and provide
>  	 * the proper response on the ring.
>  	 */
> -	if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -		struct xen_blkif *blkif = pending_req->blkif;
> -
> -		xen_blkbk_unmap(blkif,
> -		                pending_req->segments,
> -		                pending_req->nr_pages);
> -		make_response(blkif, pending_req->id,
> -			      pending_req->operation, pending_req->status);
> -		free_req(blkif, pending_req);
> -		/*
> -		 * Make sure the request is freed before releasing blkif,
> -		 * or there could be a race between free_req and the
> -		 * cleanup done in xen_blkif_free during shutdown.
> -		 *
> -		 * NB: The fact that we might try to wake up pending_free_wq
> -		 * before drain_complete (in case there's a drain going on)
> -		 * it's not a problem with our current implementation
> -		 * because we can assure there's no thread waiting on
> -		 * pending_free_wq if there's a drain going on, but it has
> -		 * to be taken into account if the current model is changed.
> -		 */
> -		if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -			complete(&blkif->drain_complete);
> -		}
> -		xen_blkif_put(blkif);
> -	}
> +	if (atomic_dec_and_test(&pending_req->pendcnt))
> +		xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..428a34d 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>  	struct grant_page	*indirect_pages[MAX_INDIRECT_PAGES];
>  	struct seg_buf		seg[MAX_INDIRECT_SEGMENTS];
>  	struct bio		*biolist[MAX_INDIRECT_SEGMENTS];
> +	struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +	struct page *unmap_pages[MAX_INDIRECT_SEGMENTS];

Missing alignment here.

We are increasing the size of pending_req one more time, which I would
prefer to avoid, but at least it allows us to perform the unmaps with a
single hypercall.
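
I.e. something like this (whitespace only, matching the neighbouring
members; just a sketch):

	struct gnttab_unmap_grant_ref	unmap[MAX_INDIRECT_SEGMENTS];
	struct page			*unmap_pages[MAX_INDIRECT_SEGMENTS];
	struct gntab_unmap_queue_data	gnttab_unmap_data;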

> +	struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  
>  
> 


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 12:02   ` David Vrabel
@ 2015-01-23 14:41     ` Roger Pau Monné
  0 siblings, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2015-01-23 14:41 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 13:02, David Vrabel wrote:
> On 19/01/15 15:51, David Vrabel wrote:
>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>
>> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
>> longer in use before unmapping them.
>>
>> This allows blkback to use network storage which may retain refs to
>> pages in queued skbs after the block I/O has completed.
> 
> Roger, I'd appreciate a review of this blkback change.

Done. BTW, have you tested the patch after setting:

# echo 128 > /sys/module/xen_blkback/parameters/max_persistent_grants

Without this some of the newly introduced code-paths would remain
untested (replace 128 with whatever you want, or even 0).

Roger.


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 14:31   ` Roger Pau Monné
@ 2015-01-23 14:54     ` David Vrabel
  2015-01-23 15:47       ` Roger Pau Monné
  0 siblings, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-23 14:54 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 14:31, Roger Pau Monné wrote:
> On 19/01/15 16:51, David Vrabel wrote:
>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>
>> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
>> +                            struct grant_page *pages[],
>> +                            int num)
>> +{
>> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>> +	unsigned int invcount = 0;
>> +	int ret;
>> +
>> +	while (num) {
>> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
>> +		
>> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
>> +						   unmap, unmap_pages);
> 
> I would add:
> 
> BUG_ON(invcount != batch);

I'm confused.  Surely invcount < batch is valid if one or more pages
within this batch are persistently mapped?

>> +		if (invcount) {
>> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>  			BUG_ON(ret);
>> -			put_free_pages(blkif, unmap_pages, invcount);
>> -			invcount = 0;
>> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>>  		}
>> -	}
>> -	if (invcount) {
>> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>> -		BUG_ON(ret);
>> -		put_free_pages(blkif, unmap_pages, invcount);
>> +		pages += batch;
>> +		num -= batch;
> 
> This loop is sort of buggy, it should work, but it is sub-optimal.
> 
> The pages array can contain both persistent grants and normal grants
> (the ones that we should unmap). So blindly increasing the pages pointer
> with batch will mean that we could be iterating over grants that have
> already been freed. It is not really an issue because we set handle to
> BLKBACK_INVALID_HANDLE, but it's a waste.

Again, I don't follow you here.

David


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 14:54     ` David Vrabel
@ 2015-01-23 15:47       ` Roger Pau Monné
  2015-01-23 16:00         ` David Vrabel
  2015-01-24  8:57         ` Roger Pau Monné
  0 siblings, 2 replies; 30+ messages in thread
From: Roger Pau Monné @ 2015-01-23 15:47 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 15:54, David Vrabel wrote:
> On 23/01/15 14:31, Roger Pau Monné wrote:
>> On 19/01/15 16:51, David Vrabel wrote:
>>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>>
>>> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
>>> +                            struct grant_page *pages[],
>>> +                            int num)
>>> +{
>>> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>>> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>>> +	unsigned int invcount = 0;
>>> +	int ret;
>>> +
>>> +	while (num) {
>>> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
>>> +		
>>> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
>>> +						   unmap, unmap_pages);
>>
>> I would add:
>>
>> BUG_ON(invcount != batch);
> 
> I'm confused.  Surely invcount < batch is valid if one or more pages
> within this batch are persistently mapped?

Yes, my bad, see below.

> 
>>> +		if (invcount) {
>>> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>>  			BUG_ON(ret);
>>> -			put_free_pages(blkif, unmap_pages, invcount);
>>> -			invcount = 0;
>>> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>>>  		}
>>> -	}
>>> -	if (invcount) {
>>> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>> -		BUG_ON(ret);
>>> -		put_free_pages(blkif, unmap_pages, invcount);
>>> +		pages += batch;
>>> +		num -= batch;

This should be fixed to at least be (which is still not fully correct,
but it's better):

pages += invcount;
num -= invcount;

I hope an example will clarify this, suppose we have the following pages
array:

pages[0] = persistent grant
pages[1] = persistent grant
pages[2] = regular grant
pages[3] = persistent grant
pages[4] = regular grant

And batch is 1. In this case, the unmapped grant will be pages[2], but
then due to the code below pages will be updated to point to &pages[1],
which has already been scanned. If this was done correctly pages should
point to &pages[3]. As said, it's not really a bug, but the loop is
sub-optimal.

Roger.


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 15:47       ` Roger Pau Monné
@ 2015-01-23 16:00         ` David Vrabel
  2015-01-26 15:00           ` David Vrabel
  2015-01-24  8:57         ` Roger Pau Monné
  1 sibling, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-23 16:00 UTC (permalink / raw)
  To: Roger Pau Monné, David Vrabel, xen-devel
  Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 15:47, Roger Pau Monné wrote:
> On 23/01/15 15:54, David Vrabel wrote:
>> On 23/01/15 14:31, Roger Pau Monné wrote:
>>> On 19/01/15 16:51, David Vrabel wrote:
>>>> +		if (invcount) {
>>>> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>>>  			BUG_ON(ret);
>>>> -			put_free_pages(blkif, unmap_pages, invcount);
>>>> -			invcount = 0;
>>>> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>>>>  		}
>>>> -	}
>>>> -	if (invcount) {
>>>> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>>> -		BUG_ON(ret);
>>>> -		put_free_pages(blkif, unmap_pages, invcount);
>>>> +		pages += batch;
>>>> +		num -= batch;
> 
> This should be fixed to at least be (which is still not fully correct,
> but it's better):
> 
> pages += invcount;
> num -= invcount;
> 
> I hope an example will clarify this, suppose we have the following pages
> array:
> 
> pages[0] = persistent grant
> pages[1] = persistent grant
> pages[2] = regular grant
> pages[3] = persistent grant
> pages[4] = regular grant
> 
> And batch is 1. In this case, the unmapped grant will be pages[2], but
> then due to the code below pages will be updated to point to &pages[1],
> which has already been scanned. If this was done correctly pages should
> point to &pages[3]. As said, it's not really a bug, but the loop is
> sub-optimal.

Ah ha.  Thanks for the clear explanation.

xen_blkbk_unmap_prepare() stops once it has been through the whole
batch, regardless of whether it filled the array with ops, so we don't
check a page twice, but this does mean we have a sub-optimal number of ops.
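
To spell the loop out (simplified sketch of the prepare step above; the
real code also does put_persistent_gnt() and fills in the actual unmap
op):

	unsigned int i, invcount = 0;

	for (i = 0; i < batch; i++) {
		if (pages[i]->persistent_gnt != NULL ||
		    pages[i]->handle == BLKBACK_INVALID_HANDLE)
			continue;	/* nothing to unmap for this entry */
		/* fill unmap_ops[invcount] / unmap_pages[invcount] from pages[i] */
		invcount++;
	}

Every entry in the batch is visited exactly once, so advancing pages by
batch never rescans an entry; the only cost is that invcount can be
smaller than batch, so the hypercall carries fewer ops than it could.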

David


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 15:47       ` Roger Pau Monné
  2015-01-23 16:00         ` David Vrabel
@ 2015-01-24  8:57         ` Roger Pau Monné
  1 sibling, 0 replies; 30+ messages in thread
From: Roger Pau Monné @ 2015-01-24  8:57 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 16:47, Roger Pau Monné wrote:
> This should be fixed to at least be (which is still not fully correct,
> but it's better):
> 
> pages += invcount;
> num -= invcount;

Forget about this chunk above, it is plain wrong and would make the loop
endless if there are persistent grants in the set of grants to unmap.
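
(E.g. in the example above with batch = 1: pages[0] is a persistent
grant, so the first call returns invcount = 0 and neither pages nor num
would ever advance.)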

Roger.


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-23 16:00         ` David Vrabel
@ 2015-01-26 15:00           ` David Vrabel
  0 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-26 15:00 UTC (permalink / raw)
  To: David Vrabel, Roger Pau Monné, xen-devel
  Cc: Boris Ostrovsky, Jenny Herbert

On 23/01/15 16:00, David Vrabel wrote:
> On 23/01/15 15:47, Roger Pau Monné wrote:
>> On 23/01/15 15:54, David Vrabel wrote:
>>> On 23/01/15 14:31, Roger Pau Monné wrote:
>>>> On 19/01/15 16:51, David Vrabel wrote:
>>>>> +		if (invcount) {
>>>>> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>>>>  			BUG_ON(ret);
>>>>> -			put_free_pages(blkif, unmap_pages, invcount);
>>>>> -			invcount = 0;
>>>>> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>>>>>  		}
>>>>> -	}
>>>>> -	if (invcount) {
>>>>> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>>>>> -		BUG_ON(ret);
>>>>> -		put_free_pages(blkif, unmap_pages, invcount);
>>>>> +		pages += batch;
>>>>> +		num -= batch;
>>
>> This should be fixed to at least be (which is still not fully correct,
>> but it's better):
>>
>> pages += invcount;
>> num -= invcount;
>>
>> I hope an example will clarify this, suppose we have the following pages
>> array:
>>
>> pages[0] = persistent grant
>> pages[1] = persistent grant
>> pages[2] = regular grant
>> pages[3] = persistent grant
>> pages[4] = regular grant
>>
>> And batch is 1. In this case, the unmapped grant will be pages[2], but
>> then due to the code below pages will be updated to point to &pages[1],
>> which has already been scanned. If this was done correctly pages should
>> point to &pages[3]. As said, it's not really a bug, but the loop is
>> sub-optimal.
> 
> Ah ha.  Thanks for the clear explanation.
> 
> xen_blkbk_unmap_prepare() stops once it has been through the whole
> batch, regardless of whether it filled the array with ops, so we don't
> check a page twice, but this does mean we have a sub-optimal number of ops.

This is not a hot path (it's only called during error recovery).  Are
you happy to leave this as is?

David


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
  2015-01-23 12:02   ` David Vrabel
  2015-01-23 14:31   ` Roger Pau Monné
@ 2015-01-26 15:02   ` David Vrabel
  2015-01-26 17:34     ` Jens Axboe
  2 siblings, 1 reply; 30+ messages in thread
From: David Vrabel @ 2015-01-26 15:02 UTC (permalink / raw)
  To: xen-devel, Jens Axboe; +Cc: Boris Ostrovsky, Jenny Herbert

On 19/01/15 15:51, David Vrabel wrote:
> From: Jenny Herbert <jennifer.herbert@citrix.com>
> 
> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
> longer in use before unmapping them.
> 
> This allows blkback to use network storage which may retain refs to
> pages in queued skbs after the block I/O has completed.

Hi Jens,

This xen-blkback change depends on several Xen changes and it would be
easiest if this went via the Xen tree.  Are you ok with this?

David

> Signed-off-by: Jenny Herbert <jennifer.herbert@citrix.com>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/block/xen-blkback/blkback.c |  175 +++++++++++++++++++++++++----------
>  drivers/block/xen-blkback/common.h  |    3 +
>  2 files changed, 127 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 908e630..39a7aba 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -47,6 +47,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  #include <xen/balloon.h>
> +#include <xen/grant_table.h>
>  #include "common.h"
>  
>  /*
> @@ -262,6 +263,17 @@ static void put_persistent_gnt(struct xen_blkif *blkif,
>  	atomic_dec(&blkif->persistent_gnt_in_use);
>  }
>  
> +static void free_persistent_gnts_unmap_callback(int result,
> +						struct gntab_unmap_queue_data *data)
> +{
> +	struct completion* c = data->data;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +	complete(c);
> +}
> +
>  static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>                                   unsigned int num)
>  {
> @@ -269,8 +281,17 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  	struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	struct persistent_gnt *persistent_gnt;
>  	struct rb_node *n;
> -	int ret = 0;
>  	int segs_to_unmap = 0;
> +	struct gntab_unmap_queue_data unmap_data;
> +	struct completion unmap_completion;
> +
> +	init_completion(&unmap_completion);
> +
> +	unmap_data.data = &unmap_completion;
> +	unmap_data.done = &free_persistent_gnts_unmap_callback;
> +	unmap_data.pages = pages;
> +	unmap_data.unmap_ops = unmap;
> +	unmap_data.kunmap_ops = NULL;
>  
>  	foreach_grant_safe(persistent_gnt, n, root, node) {
>  		BUG_ON(persistent_gnt->handle ==
> @@ -285,9 +306,11 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>  
>  		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>  			!rb_next(&persistent_gnt->node)) {
> -			ret = gnttab_unmap_refs(unmap, NULL, pages,
> -				segs_to_unmap);
> -			BUG_ON(ret);
> +
> +			unmap_data.count = segs_to_unmap;
> +			gnttab_unmap_refs_async(&unmap_data);
> +			wait_for_completion(&unmap_completion);
> +
>  			put_free_pages(blkif, pages, segs_to_unmap);
>  			segs_to_unmap = 0;
>  		}
> @@ -653,18 +676,14 @@ void xen_blkbk_free_caches(struct xen_blkif *blkif)
>  	shrink_free_pagepool(blkif, 0 /* All */);
>  }
>  
> -/*
> - * Unmap the grant references, and also remove the M2P over-rides
> - * used in the 'pending_req'.
> - */
> -static void xen_blkbk_unmap(struct xen_blkif *blkif,
> -                            struct grant_page *pages[],
> -                            int num)
> +static unsigned int xen_blkbk_unmap_prepare(
> +	struct xen_blkif *blkif,
> +	struct grant_page **pages,
> +	unsigned int num,
> +	struct gnttab_unmap_grant_ref *unmap_ops,
> +	struct page **unmap_pages)
>  {
> -	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> -	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	unsigned int i, invcount = 0;
> -	int ret;
>  
>  	for (i = 0; i < num; i++) {
>  		if (pages[i]->persistent_gnt != NULL) {
> @@ -674,21 +693,99 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
>  		if (pages[i]->handle == BLKBACK_INVALID_HANDLE)
>  			continue;
>  		unmap_pages[invcount] = pages[i]->page;
> -		gnttab_set_unmap_op(&unmap[invcount], vaddr(pages[i]->page),
> +		gnttab_set_unmap_op(&unmap_ops[invcount], vaddr(pages[invcount]->page),
>  				    GNTMAP_host_map, pages[i]->handle);
>  		pages[i]->handle = BLKBACK_INVALID_HANDLE;
> -		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> -			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> -			                        invcount);
> +		invcount++;
> +       }
> +
> +       return invcount;
> +}
> +
> +static void xen_blkbk_unmap_done(struct xen_blkif *blkif,
> +				 struct page *unmap_pages[],
> +				 unsigned int num)
> +{
> +	put_free_pages(blkif, unmap_pages, num);
> +}
> +
> +static void xen_blkbk_unmap_and_respond_callback(int result, struct gntab_unmap_queue_data *data)
> +{
> +	struct pending_req* pending_req = (struct pending_req*) (data->data);
> +	struct xen_blkif *blkif = pending_req->blkif;
> +
> +	/* BUG_ON used to reproduce existing behaviour,
> +	   but is this the best way to deal with this? */
> +	BUG_ON(result);
> +
> +	xen_blkbk_unmap_done(blkif, data->pages, data->count);
> +	make_response(blkif, pending_req->id,
> +		      pending_req->operation, pending_req->status);
> +	free_req(blkif, pending_req);
> +	/*
> +	 * Make sure the request is freed before releasing blkif,
> +	 * or there could be a race between free_req and the
> +	 * cleanup done in xen_blkif_free during shutdown.
> +	 *
> +	 * NB: The fact that we might try to wake up pending_free_wq
> +	 * before drain_complete (in case there's a drain going on)
> +	 * it's not a problem with our current implementation
> +	 * because we can assure there's no thread waiting on
> +	 * pending_free_wq if there's a drain going on, but it has
> +	 * to be taken into account if the current model is changed.
> +	 */
> +	if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> +		complete(&blkif->drain_complete);
> +	}
> +	xen_blkif_put(blkif);
> +}
> +
> +static void xen_blkbk_unmap_and_respond(struct pending_req *req)
> +{
> +	struct gntab_unmap_queue_data* work = &req->gnttab_unmap_data;
> +	struct xen_blkif *blkif = req->blkif;
> +	struct grant_page **pages = req->segments;
> +	unsigned int invcount;
> +
> +	invcount = xen_blkbk_unmap_prepare(blkif, pages, req->nr_pages,
> +					   req->unmap, req->unmap_pages);
> +
> +	work->data = req;
> +	work->done = xen_blkbk_unmap_and_respond_callback;
> +	work->unmap_ops = req->unmap;
> +	work->kunmap_ops = NULL;
> +	work->pages = req->unmap_pages;
> +	work->count = invcount;
> +
> +	gnttab_unmap_refs_async(&req->gnttab_unmap_data);
> +}
> +
> +
> +/*
> + * Unmap the grant references, and also remove the M2P over-rides
> + * used in the 'pending_req'.
> + */
> +static void xen_blkbk_unmap(struct xen_blkif *blkif,
> +                            struct grant_page *pages[],
> +                            int num)
> +{
> +	struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> +	unsigned int invcount = 0;
> +	int ret;
> +
> +	while (num) {
> +		unsigned int batch = min(num, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +		
> +		invcount = xen_blkbk_unmap_prepare(blkif, pages, batch,
> +						   unmap, unmap_pages);
> +		if (invcount) {
> +			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>  			BUG_ON(ret);
> -			put_free_pages(blkif, unmap_pages, invcount);
> -			invcount = 0;
> +			xen_blkbk_unmap_done(blkif, unmap_pages, invcount);
>  		}
> -	}
> -	if (invcount) {
> -		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> -		BUG_ON(ret);
> -		put_free_pages(blkif, unmap_pages, invcount);
> +		pages += batch;
> +		num -= batch;
>  	}
>  }
>  
> @@ -982,32 +1079,8 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>  	 * the grant references associated with 'request' and provide
>  	 * the proper response on the ring.
>  	 */
> -	if (atomic_dec_and_test(&pending_req->pendcnt)) {
> -		struct xen_blkif *blkif = pending_req->blkif;
> -
> -		xen_blkbk_unmap(blkif,
> -		                pending_req->segments,
> -		                pending_req->nr_pages);
> -		make_response(blkif, pending_req->id,
> -			      pending_req->operation, pending_req->status);
> -		free_req(blkif, pending_req);
> -		/*
> -		 * Make sure the request is freed before releasing blkif,
> -		 * or there could be a race between free_req and the
> -		 * cleanup done in xen_blkif_free during shutdown.
> -		 *
> -		 * NB: The fact that we might try to wake up pending_free_wq
> -		 * before drain_complete (in case there's a drain going on)
> -		 * it's not a problem with our current implementation
> -		 * because we can assure there's no thread waiting on
> -		 * pending_free_wq if there's a drain going on, but it has
> -		 * to be taken into account if the current model is changed.
> -		 */
> -		if (atomic_dec_and_test(&blkif->inflight) && atomic_read(&blkif->drain)) {
> -			complete(&blkif->drain_complete);
> -		}
> -		xen_blkif_put(blkif);
> -	}
> +	if (atomic_dec_and_test(&pending_req->pendcnt))
> +		xen_blkbk_unmap_and_respond(pending_req);
>  }
>  
>  /*
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index f65b807..428a34d 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -350,6 +350,9 @@ struct pending_req {
>  	struct grant_page	*indirect_pages[MAX_INDIRECT_PAGES];
>  	struct seg_buf		seg[MAX_INDIRECT_SEGMENTS];
>  	struct bio		*biolist[MAX_INDIRECT_SEGMENTS];
> +	struct gnttab_unmap_grant_ref unmap[MAX_INDIRECT_SEGMENTS];
> +	struct page *unmap_pages[MAX_INDIRECT_SEGMENTS];
> +	struct gntab_unmap_queue_data gnttab_unmap_data;
>  };
>  
>  
> 


* Re: [PATCH 12/14] xen-blkback: safely unmap grants in case they are still in use
  2015-01-26 15:02   ` David Vrabel
@ 2015-01-26 17:34     ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2015-01-26 17:34 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Jenny Herbert

On 01/26/2015 08:02 AM, David Vrabel wrote:
> On 19/01/15 15:51, David Vrabel wrote:
>> From: Jenny Herbert <jennifer.herbert@citrix.com>
>>
>> Use gnttab_unmap_refs_async() to wait until the mapped pages are no
>> longer in use before unmapping them.
>>
>> This allows blkback to use network storage which may retain refs to
>> pages in queued skbs after the block I/O has completed.
>
> Hi Jens,
>
> This xen-blkback change depends on several Xen changes and it would be
> easiest if this went via the Xen tree.  Are you ok with this?

Yeah, that's fine. You can add my acked-by.

-- 
Jens Axboe


* [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex
  2015-01-12 15:43 [PATCHv2 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
@ 2015-01-12 15:43 ` David Vrabel
  0 siblings, 0 replies; 30+ messages in thread
From: David Vrabel @ 2015-01-12 15:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel

Unmapping may require sleeping and we unmap while holding priv->lock, so
convert it to a mutex.
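
As an illustration only (based on the mn_release() hunk below), the
pattern this change permits is:

	mutex_lock(&priv->lock);
	err = unmap_grant_pages(map, /* offset */ 0, map->count); /* may sleep */
	mutex_unlock(&priv->lock);

which would not be allowed while holding a spinlock.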

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/gntdev.c |   40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index a28807a..c1a03fa 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -67,7 +67,7 @@ struct gntdev_priv {
 	 * Only populated if populate_freeable_maps == 1 */
 	struct list_head freeable_maps;
 	/* lock protects maps and freeable_maps */
-	spinlock_t lock;
+	struct mutex lock;
 	struct mm_struct *mm;
 	struct mmu_notifier mn;
 };
@@ -221,9 +221,9 @@ static void gntdev_put_map(struct gntdev_priv *priv, struct grant_map *map)
 	}
 
 	if (populate_freeable_maps && priv) {
-		spin_lock(&priv->lock);
+		mutex_lock(&priv->lock);
 		list_del(&map->next);
-		spin_unlock(&priv->lock);
+		mutex_unlock(&priv->lock);
 	}
 
 	if (map->pages && !use_ptemod)
@@ -397,9 +397,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 		 * not do any unmapping, since that has been done prior to
 		 * closing the vma, but it may still iterate the unmap_ops list.
 		 */
-		spin_lock(&priv->lock);
+		mutex_lock(&priv->lock);
 		map->vma = NULL;
-		spin_unlock(&priv->lock);
+		mutex_unlock(&priv->lock);
 	}
 	vma->vm_private_data = NULL;
 	gntdev_put_map(priv, map);
@@ -443,14 +443,14 @@ static void mn_invl_range_start(struct mmu_notifier *mn,
 	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
 	struct grant_map *map;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		unmap_if_in_range(map, start, end);
 	}
 	list_for_each_entry(map, &priv->freeable_maps, next) {
 		unmap_if_in_range(map, start, end);
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 }
 
 static void mn_invl_page(struct mmu_notifier *mn,
@@ -467,7 +467,7 @@ static void mn_release(struct mmu_notifier *mn,
 	struct grant_map *map;
 	int err;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	list_for_each_entry(map, &priv->maps, next) {
 		if (!map->vma)
 			continue;
@@ -486,7 +486,7 @@ static void mn_release(struct mmu_notifier *mn,
 		err = unmap_grant_pages(map, /* offset */ 0, map->count);
 		WARN_ON(err);
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 }
 
 static struct mmu_notifier_ops gntdev_mmu_ops = {
@@ -508,7 +508,7 @@ static int gntdev_open(struct inode *inode, struct file *flip)
 
 	INIT_LIST_HEAD(&priv->maps);
 	INIT_LIST_HEAD(&priv->freeable_maps);
-	spin_lock_init(&priv->lock);
+	mutex_init(&priv->lock);
 
 	if (use_ptemod) {
 		priv->mm = get_task_mm(current);
@@ -582,10 +582,10 @@ static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
 		return -EFAULT;
 	}
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	gntdev_add_map(priv, map);
 	op.index = map->index << PAGE_SHIFT;
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	if (copy_to_user(u, &op, sizeof(op)) != 0)
 		return -EFAULT;
@@ -604,7 +604,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 		return -EFAULT;
 	pr_debug("priv %p, del %d+%d\n", priv, (int)op.index, (int)op.count);
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
 	if (map) {
 		list_del(&map->next);
@@ -612,7 +612,7 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
 			list_add_tail(&map->next, &priv->freeable_maps);
 		err = 0;
 	}
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 	if (map)
 		gntdev_put_map(priv, map);
 	return err;
@@ -680,7 +680,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	out_flags = op.action;
 	out_event = op.event_channel_port;
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 
 	list_for_each_entry(map, &priv->maps, next) {
 		uint64_t begin = map->index << PAGE_SHIFT;
@@ -708,7 +708,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
 	rc = 0;
 
  unlock_out:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	/* Drop the reference to the event channel we did not save in the map */
 	if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
@@ -758,7 +758,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	pr_debug("map %d+%d at %lx (pgoff %lx)\n",
 			index, count, vma->vm_start, vma->vm_pgoff);
 
-	spin_lock(&priv->lock);
+	mutex_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
@@ -793,7 +793,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 			map->flags |= GNTMAP_readonly;
 	}
 
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 
 	if (use_ptemod) {
 		err = apply_to_page_range(vma->vm_mm, vma->vm_start,
@@ -821,11 +821,11 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 	return 0;
 
 unlock_out:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 	return err;
 
 out_unlock_put:
-	spin_unlock(&priv->lock);
+	mutex_unlock(&priv->lock);
 out_put_map:
 	if (use_ptemod)
 		map->vma = NULL;
-- 
1.7.10.4


Thread overview: 30+ messages
2015-01-19 15:51 [PATCHv3 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-19 15:51 ` [PATCH 01/14] mm: provide a find_special_page vma operation David Vrabel
2015-01-19 15:51 ` [PATCH 02/14] mm: add 'foreign' alias for the 'pinned' page flag David Vrabel
2015-01-19 15:51 ` [PATCH 03/14] xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() David Vrabel
2015-01-19 15:51 ` [PATCH 04/14] xen: remove scratch frames for ballooned pages and m2p override David Vrabel
2015-01-19 15:51 ` [PATCH 05/14] x86/xen: require ballooned pages for grant maps David Vrabel
2015-01-19 15:51 ` [PATCH 06/14] xen/grant-table: add helpers for allocating pages David Vrabel
2015-01-19 15:51 ` [PATCH 07/14] xen: mark grant mapped pages as foreign David Vrabel
2015-01-19 15:51 ` [PATCH 08/14] xen-netback: use foreign page information from the pages themselves David Vrabel
2015-01-19 15:51 ` [PATCH 09/14] xen/grant-table: add a mechanism to safely unmap pages that are in use David Vrabel
2015-01-19 15:51 ` [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
2015-01-19 17:49   ` Stefano Stabellini
2015-01-19 17:53     ` David Vrabel
2015-01-19 18:38       ` Stefano Stabellini
2015-01-19 18:43         ` David Vrabel
2015-01-19 15:51 ` [PATCH 11/14] xen/gntdev: safely unmap grants in case they are still in use David Vrabel
2015-01-19 15:51 ` [PATCH 12/14] xen-blkback: " David Vrabel
2015-01-23 12:02   ` David Vrabel
2015-01-23 14:41     ` Roger Pau Monné
2015-01-23 14:31   ` Roger Pau Monné
2015-01-23 14:54     ` David Vrabel
2015-01-23 15:47       ` Roger Pau Monné
2015-01-23 16:00         ` David Vrabel
2015-01-26 15:00           ` David Vrabel
2015-01-24  8:57         ` Roger Pau Monné
2015-01-26 15:02   ` David Vrabel
2015-01-26 17:34     ` Jens Axboe
2015-01-19 15:51 ` [PATCH 13/14] xen/gntdev: mark userspace PTEs as special on x86 PV guests David Vrabel
2015-01-19 15:51 ` [PATCH 14/14] xen/gntdev: provide find_special_page VMA operation David Vrabel
  -- strict thread matches above, loose matches on Subject: below --
2015-01-12 15:43 [PATCHv2 00/14] xen: fix many long-standing grant mapping bugs David Vrabel
2015-01-12 15:43 ` [PATCH 10/14] xen/gntdev: convert priv->lock to a mutex David Vrabel
