* [PATCH -v2 0/3] KVM, Replace is_hwpoison_address with __get_user_pages
From: Huang Ying @ 2011-01-30  3:15 UTC
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, ying.huang, Tony Luck,
	Dean Nelson, Andrew Morton

v2:

- Export __get_user_pages, because we need other get_user_pages variants too.


[PATCH -v2 1/3] mm, export __get_user_pages
[PATCH -v2 2/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
[PATCH -v2 3/3] KVM, Replace is_hwpoison_address with __get_user_pages


* [PATCH -v2 1/3] mm, export __get_user_pages
From: Huang Ying @ 2011-01-30  3:15 UTC
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, ying.huang, Tony Luck,
	Dean Nelson, Andrew Morton, Michel Lespinasse, Roland Dreier,
	Ralph Campbell

In most cases, get_user_pages and get_user_pages_fast should be used
to pin user pages in memory.  But sometimes flags beyond FOLL_GET,
FOLL_WRITE and FOLL_FORCE are needed, for example when the following
patch makes KVM use FOLL_HWPOISON.  To support such users,
__get_user_pages is exported directly.

The export clashes with static functions of the same name in the
InfiniBand ipath and qib drivers, so rename those to
__ipath_get_user_pages and __qib_get_user_pages.
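
For illustration, the kind of caller this export enables looks like
the sketch below (essentially what patch 3/3 adds to KVM; the
FOLL_HWPOISON flag and -EHWPOISON error come from patch 2/3, and the
helper name here is made up):

	/* sketch: probe whether the page backing addr is hwpoisoned */
	static int addr_is_hwpoisoned(unsigned long addr)
	{
		int rc;

		down_read(&current->mm->mmap_sem);
		rc = __get_user_pages(current, current->mm, addr, 1,
				      FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE,
				      NULL, NULL, NULL);
		up_read(&current->mm->mmap_sem);
		return rc == -EHWPOISON;
	}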

Signed-off-by: Huang Ying <ying.huang@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Michel Lespinasse <walken@google.com>
CC: Roland Dreier <roland@kernel.org>
CC: Ralph Campbell <infinipath@qlogic.com>
---
 drivers/infiniband/hw/ipath/ipath_user_pages.c |    6 +--
 drivers/infiniband/hw/qib/qib_user_pages.c     |    6 +--
 include/linux/mm.h                             |    4 ++
 mm/internal.h                                  |    5 --
 mm/memory.c                                    |   50 +++++++++++++++++++++++++
 5 files changed, 60 insertions(+), 11 deletions(-)

--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -53,8 +53,8 @@ static void __ipath_release_user_pages(s
 }
 
 /* call with current->mm->mmap_sem held */
-static int __get_user_pages(unsigned long start_page, size_t num_pages,
-			struct page **p, struct vm_area_struct **vma)
+static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
+				  struct page **p, struct vm_area_struct **vma)
 {
 	unsigned long lock_limit;
 	size_t got;
@@ -165,7 +165,7 @@ int ipath_get_user_pages(unsigned long s
 
 	down_write(&current->mm->mmap_sem);
 
-	ret = __get_user_pages(start_page, num_pages, p, NULL);
+	ret = __ipath_get_user_pages(start_page, num_pages, p, NULL);
 
 	up_write(&current->mm->mmap_sem);
 
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -51,8 +51,8 @@ static void __qib_release_user_pages(str
 /*
  * Call with current->mm->mmap_sem held.
  */
-static int __get_user_pages(unsigned long start_page, size_t num_pages,
-			    struct page **p, struct vm_area_struct **vma)
+static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
+				struct page **p, struct vm_area_struct **vma)
 {
 	unsigned long lock_limit;
 	size_t got;
@@ -136,7 +136,7 @@ int qib_get_user_pages(unsigned long sta
 
 	down_write(&current->mm->mmap_sem);
 
-	ret = __get_user_pages(start_page, num_pages, p, NULL);
+	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
 
 	up_write(&current->mm->mmap_sem);
 
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -965,6 +965,10 @@ static inline int handle_mm_fault(struct
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
 
+int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
+		     unsigned long start, int len, unsigned int foll_flags,
+		     struct page **pages, struct vm_area_struct **vmas,
+		     int *nonblocking);
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			unsigned long start, int nr_pages, int write, int force,
 			struct page **pages, struct vm_area_struct **vmas);
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -245,11 +245,6 @@ static inline void mminit_validate_memmo
 }
 #endif /* CONFIG_SPARSEMEM */
 
-int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
-		     unsigned long start, int len, unsigned int foll_flags,
-		     struct page **pages, struct vm_area_struct **vmas,
-		     int *nonblocking);
-
 #define ZONE_RECLAIM_NOSCAN	-2
 #define ZONE_RECLAIM_FULL	-1
 #define ZONE_RECLAIM_SOME	0
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1410,6 +1410,55 @@ no_page_table:
 	return page;
 }
 
+/**
+ * __get_user_pages() - pin user pages in memory
+ * @tsk:	task_struct of target task
+ * @mm:		mm_struct of target mm
+ * @start:	starting user address
+ * @nr_pages:	number of pages from start to pin
+ * @gup_flags:	flags modifying pin behaviour
+ * @pages:	array that receives pointers to the pages pinned.
+ *		Should be at least nr_pages long. Or NULL, if caller
+ *		only intends to ensure the pages are faulted in.
+ * @vmas:	array of pointers to vmas corresponding to each page.
+ *		Or NULL if the caller does not require them.
+ * @nonblocking: if non-NULL, do not wait for disk IO or mmap_sem contention
+ *
+ * Returns number of pages pinned. This may be fewer than the number
+ * requested. If nr_pages is 0 or negative, returns 0. If no pages
+ * were pinned, returns -errno. Each page returned must be released
+ * with a put_page() call when it is finished with. vmas will only
+ * remain valid while mmap_sem is held.
+ *
+ * Must be called with mmap_sem held for read or write.
+ *
+ * __get_user_pages walks a process's page tables and takes a reference to
+ * each struct page that each user address corresponds to at a given
+ * instant. That is, it takes the page that would be accessed if a user
+ * thread accesses the given user virtual address at that instant.
+ *
+ * This does not guarantee that the page exists in the user mappings when
+ * __get_user_pages returns, and there may even be a completely different
+ * page there in some cases (eg. if mmapped pagecache has been invalidated
+ * and subsequently re-faulted). However it does guarantee that the page
+ * won't be freed completely. Most callers simply care that the page
+ * contains data that was valid *at some point in time*. Typically, an IO
+ * or similar operation cannot guarantee anything stronger anyway because
+ * locks can't be held over the syscall boundary.
+ *
+ * If @gup_flags & FOLL_WRITE == 0, the page must not be written to. If
+ * the page is written to, set_page_dirty (or set_page_dirty_lock, as
+ * appropriate) must be called after the page is finished with, and
+ * before put_page is called.
+ *
+ * If @nonblocking != NULL, __get_user_pages will not wait for disk IO
+ * or mmap_sem contention, and if waiting is needed to pin all pages,
+ * *@nonblocking will be set to 0.
+ *
+ * In most cases, get_user_pages or get_user_pages_fast should be used
+ * instead of __get_user_pages. __get_user_pages should be used only if
+ * you need some special @gup_flags.
+ */
 int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long start, int nr_pages, unsigned int gup_flags,
 		     struct page **pages, struct vm_area_struct **vmas,
@@ -1578,6 +1627,7 @@ int __get_user_pages(struct task_struct
 	} while (nr_pages);
 	return i;
 }
+EXPORT_SYMBOL(__get_user_pages);
 
 /**
  * get_user_pages() - pin user pages in memory
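
To make the documented contract concrete, here is a minimal sketch of
pinning one page for writing and releasing it (illustrative only; tsk,
mm and addr are assumed to come from the caller's context):

	struct page *page;
	int ret;

	down_read(&mm->mmap_sem);
	ret = __get_user_pages(tsk, mm, addr, 1, FOLL_GET | FOLL_WRITE,
			       &page, NULL, NULL);
	up_read(&mm->mmap_sem);

	if (ret == 1) {
		/* ... write into the page ... */
		set_page_dirty_lock(page);	/* page was written to */
		put_page(page);			/* drop gup's reference */
	}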


* [PATCH -v2 2/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
From: Huang Ying @ 2011-01-30  3:15 UTC
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, ying.huang, Tony Luck,
	Dean Nelson, Andrew Morton

Make __get_user_pages return -EHWPOISON for a HWPOISON page only if
FOLL_HWPOISON is specified.  With this patch, interested callers can
distinguish HWPOISON pages from general FAULT pages, while other
callers still get -EFAULT for both, so the user-space interface need
not be changed.

This feature is needed by KVM, where a UCR MCE should be relayed to
the guest for a HWPOISON page, while instruction emulation and MMIO
are tried for a general FAULT page.

The idea comes from Andrew Morton.
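
As a sketch of the intended calling convention (illustrative only, not
part of this patch; mmap_sem is assumed held for read):

	struct page *page;
	int ret;

	ret = __get_user_pages(tsk, mm, addr, 1, FOLL_GET | FOLL_HWPOISON,
			       &page, NULL, NULL);
	if (ret == -EHWPOISON) {
		/* hardware-poisoned page: e.g. forward a UCR MCE */
	} else if (ret == -EFAULT) {
		/* ordinary bad address: fall back to emulation/MMIO */
	} else if (ret == 1) {
		put_page(page);
	}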

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/asm-generic/errno.h |    2 ++
 include/linux/mm.h          |    1 +
 mm/memory.c                 |   13 ++++++++++---
 3 files changed, 13 insertions(+), 3 deletions(-)

--- a/include/asm-generic/errno.h
+++ b/include/asm-generic/errno.h
@@ -108,4 +108,6 @@
 
 #define ERFKILL		132	/* Operation not possible due to RF-kill */
 
+#define EHWPOISON	133	/* Memory page has hardware error */
+
 #endif
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1534,6 +1534,7 @@ struct page *follow_page(struct vm_area_
 #define FOLL_FORCE	0x10	/* get_user_pages read/write w/o permission */
 #define FOLL_MLOCK	0x40	/* mark page as mlocked */
 #define FOLL_SPLIT	0x80	/* don't return transhuge pages, split them */
+#define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1576,9 +1576,16 @@ int __get_user_pages(struct task_struct
 				if (ret & VM_FAULT_ERROR) {
 					if (ret & VM_FAULT_OOM)
 						return i ? i : -ENOMEM;
-					if (ret &
-					    (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE|
-					     VM_FAULT_SIGBUS))
+					if (ret & (VM_FAULT_HWPOISON |
+						   VM_FAULT_HWPOISON_LARGE)) {
+						if (i)
+							return i;
+						else if (gup_flags & FOLL_HWPOISON)
+							return -EHWPOISON;
+						else
+							return -EFAULT;
+					}
+					if (ret & VM_FAULT_SIGBUS)
 						return i ? i : -EFAULT;
 					BUG();
 				}


* [PATCH -v2 3/3] KVM, Replace is_hwpoison_address with __get_user_pages
From: Huang Ying @ 2011-01-30  3:15 UTC
  To: Avi Kivity, Marcelo Tosatti
  Cc: linux-kernel, kvm, Andi Kleen, ying.huang, Tony Luck,
	Dean Nelson, Andrew Morton

is_hwpoison_address only checks whether the page table entry is
hwpoisoned, regardless of the memory page mapped there, while
__get_user_pages checks both.

QEMU clears the poisoned page table entry (via unmap/map) to make it
possible to allocate a new memory page for the virtual address across
a guest reboot.  But it is also possible that the underlying memory
page stays poisoned even after the corresponding page table entry is
cleared, i.e. a new memory page cannot be allocated.
__get_user_pages catches these situations.
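
For background, the QEMU-side reset described above amounts to
something like this user-space sketch (the function name is
illustrative, not actual QEMU code; addr must be page-aligned):

	#include <sys/mman.h>
	#include <unistd.h>

	/* drop the poisoned PTE so the kernel can install a fresh page */
	static int remap_poisoned_page(void *addr)
	{
		size_t sz = getpagesize();

		if (munmap(addr, sz) < 0)
			return -1;
		if (mmap(addr, sz, PROT_READ | PROT_WRITE,
			 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
			 -1, 0) == MAP_FAILED)
			return -1;
		return 0;
	}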

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 include/linux/mm.h  |    8 --------
 mm/memory-failure.c |   32 --------------------------------
 virt/kvm/kvm_main.c |   11 ++++++++++-
 3 files changed, 10 insertions(+), 41 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1627,14 +1627,6 @@ extern int sysctl_memory_failure_recover
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t mce_bad_pages;
 extern int soft_offline_page(struct page *page, int flags);
-#ifdef CONFIG_MEMORY_FAILURE
-int is_hwpoison_address(unsigned long addr);
-#else
-static inline int is_hwpoison_address(unsigned long addr)
-{
-	return 0;
-}
-#endif
 
 extern void dump_page(struct page *page);
 
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1437,35 +1437,3 @@ done:
 	/* keep elevated page count for bad page */
 	return ret;
 }
-
-/*
- * The caller must hold current->mm->mmap_sem in read mode.
- */
-int is_hwpoison_address(unsigned long addr)
-{
-	pgd_t *pgdp;
-	pud_t pud, *pudp;
-	pmd_t pmd, *pmdp;
-	pte_t pte, *ptep;
-	swp_entry_t entry;
-
-	pgdp = pgd_offset(current->mm, addr);
-	if (!pgd_present(*pgdp))
-		return 0;
-	pudp = pud_offset(pgdp, addr);
-	pud = *pudp;
-	if (!pud_present(pud) || pud_large(pud))
-		return 0;
-	pmdp = pmd_offset(pudp, addr);
-	pmd = *pmdp;
-	if (!pmd_present(pmd) || pmd_large(pmd))
-		return 0;
-	ptep = pte_offset_map(pmdp, addr);
-	pte = *ptep;
-	pte_unmap(ptep);
-	if (!is_swap_pte(pte))
-		return 0;
-	entry = pte_to_swp_entry(pte);
-	return is_hwpoison_entry(entry);
-}
-EXPORT_SYMBOL_GPL(is_hwpoison_address);
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1029,6 +1029,15 @@ static pfn_t get_fault_pfn(void)
 	return fault_pfn;
 }
 
+static inline int check_user_page_hwpoison(unsigned long addr)
+{
+	int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
+
+	rc = __get_user_pages(current, current->mm, addr, 1,
+			      flags, NULL, NULL, NULL);
+	return rc == -EHWPOISON;
+}
+
 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
 			bool *async, bool write_fault, bool *writable)
 {
@@ -1076,7 +1085,7 @@ static pfn_t hva_to_pfn(struct kvm *kvm,
 			return get_fault_pfn();
 
 		down_read(&current->mm->mmap_sem);
-		if (is_hwpoison_address(addr)) {
+		if (check_user_page_hwpoison(addr)) {
 			up_read(&current->mm->mmap_sem);
 			get_page(hwpoison_page);
 			return page_to_pfn(hwpoison_page);


* Re: [PATCH -v2 0/3] KVM, Replace is_hwpoison_address with __get_user_pages
From: Marcelo Tosatti @ 2011-02-03  9:44 UTC
  To: Huang Ying
  Cc: Avi Kivity, linux-kernel, kvm, Andi Kleen, Tony Luck,
	Dean Nelson, Andrew Morton

On Sun, Jan 30, 2011 at 11:15:46AM +0800, Huang Ying wrote:
> v2:
> 
> - Export __get_user_pages, because we need other get_user_pages variants too.
> 
> 
> [PATCH -v2 1/3] mm, export __get_user_pages
> [PATCH -v2 2/3] mm, Make __get_user_pages return -EHWPOISON for HWPOISON page optionally
> [PATCH -v2 3/3] KVM, Replace is_hwpoison_address with __get_user_pages
> --

Applied, thanks.


