* [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range()
@ 2014-06-13 16:26 Chris Wilson
2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-13 16:26 UTC (permalink / raw)
To: intel-gfx
Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm,
Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov
When using remap_pfn_range() from a fault handler, we are exposed to
races between concurrent faults. Rather than hitting a BUG, report the
error back to the caller, like vm_insert_pfn().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
---
mm/memory.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..6603a9e6a731 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
{
pte_t *pte;
spinlock_t *ptl;
+ int ret = 0;
pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return -ENOMEM;
arch_enter_lazy_mmu_mode();
do {
- BUG_ON(!pte_none(*pte));
+ if (!pte_none(*pte)) {
+ ret = -EBUSY;
+ break;
+ }
set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
- return 0;
+ return ret;
}
static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
--
2.0.0
* [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
@ 2014-06-13 16:26 ` Chris Wilson
2014-06-13 16:34 ` [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
2014-06-16 13:41 ` Kirill A. Shutemov
2 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-13 16:26 UTC (permalink / raw)
To: intel-gfx; +Cc: linux-mm
On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s
Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s
Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s
Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s
Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s
Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s
Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s
Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s
Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
Upload rate for 64 tiled surfaces: 3017MiB/s -> 5127MiB/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Cc: linux-mm@kvack.org
---
drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c313cb2b641b..e6246634b419 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1565,22 +1565,23 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
pfn >>= PAGE_SHIFT;
- if (!obj->fault_mappable) {
- int i;
+ ret = remap_pfn_range(vma, vma->vm_start,
+ pfn, vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
+ if (ret) {
+ /* After passing the sanity checks on remap_pfn_range(), we may
+ * abort whilst updating the pagetables due to ENOMEM and leave
+ * the tables in an inconsistent state. Reset them all now.
+ * However, we do not want to undo the work of another thread
+ * that beat us to prefaulting the PTEs.
+ */
+ if (ret != -EBUSY)
+ zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+ goto unpin;
+ }
- for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
- ret = vm_insert_pfn(vma,
- (unsigned long)vma->vm_start + i * PAGE_SIZE,
- pfn + i);
- if (ret)
- break;
- }
+ obj->fault_mappable = true;
- obj->fault_mappable = true;
- } else
- ret = vm_insert_pfn(vma,
- (unsigned long)vmf->virtual_address,
- pfn + page_offset);
unpin:
i915_gem_object_ggtt_unpin(obj);
unlock:
--
2.0.0
* Re: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
@ 2014-06-13 16:34 ` Chris Wilson
2014-06-16 13:41 ` Kirill A. Shutemov
2 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-13 16:34 UTC (permalink / raw)
To: intel-gfx
Cc: Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel,
Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm
On Fri, Jun 13, 2014 at 05:26:17PM +0100, Chris Wilson wrote:
> When using remap_pfn_range() from a fault handler, we are exposed to
> races between concurrent faults. Rather than hitting a BUG, report the
> error back to the caller, like vm_insert_pfn().
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> ---
> mm/memory.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..6603a9e6a731 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> {
> pte_t *pte;
> spinlock_t *ptl;
> + int ret = 0;
>
> pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> if (!pte)
> return -ENOMEM;
> arch_enter_lazy_mmu_mode();
> do {
> - BUG_ON(!pte_none(*pte));
> + if (!pte_none(*pte)) {
> + ret = -EBUSY;
> + break;
> + }
> set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
> pfn++;
> } while (pte++, addr += PAGE_SIZE, addr != end);
> arch_leave_lazy_mmu_mode();
> pte_unmap_unlock(pte - 1, ptl);
Oh. The EBUSY path will need to increment pte, or we will try to
unmap the wrong page.
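The off-by-one can be modelled in plain userspace C. This is only a sketch with invented names: `last_touched_index()` stands in for the `pte - 1` that remap_pte_range() hands to pte_unmap_unlock(), and an int array stands in for the page table.

```c
#include <stddef.h>

/* Userspace model of the loop in remap_pte_range().  On the normal exit
 * path the while-clause has incremented pte after every entry, so
 * pte - 1 is the last entry visited.  A bare break skips that
 * increment, and pte - 1 then points one entry too low, so the wrong
 * page would be unmapped.  Incrementing pte before the break (the fix
 * described above) restores the invariant. */
static ptrdiff_t last_touched_index(const int *table, int n, int fail_at,
                                    int bump_before_break)
{
    const int *pte = table;
    int i = 0;

    do {
        if (i == fail_at) {     /* models the !pte_none(*pte) bail-out */
            if (bump_before_break)
                pte++;
            break;
        }
    } while (pte++, ++i < n);

    return (pte - 1) - table;   /* index handed to pte_unmap_unlock() */
}
```

With the failure on entry 2, the broken variant reports entry 1 as the last touched entry; with the pre-break increment it correctly reports entry 2.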
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* RE: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
@ 2014-06-16 13:41 ` Kirill A. Shutemov
2014-06-13 16:34 ` [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
2014-06-16 13:41 ` Kirill A. Shutemov
2 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-16 13:41 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra,
Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner,
linux-mm
Chris Wilson wrote:
> When using remap_pfn_range() from a fault handler, we are exposed to
> races between concurrent faults. Rather than hitting a BUG, report the
> error back to the caller, like vm_insert_pfn().
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> ---
> mm/memory.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..6603a9e6a731 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> {
> pte_t *pte;
> spinlock_t *ptl;
> + int ret = 0;
>
> pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> if (!pte)
> return -ENOMEM;
> arch_enter_lazy_mmu_mode();
> do {
> - BUG_ON(!pte_none(*pte));
> + if (!pte_none(*pte)) {
> + ret = -EBUSY;
> + break;
I think you need to at least remove the entries you've set up if the check
fails on something other than the first iteration.
And nobody propagates your -EBUSY back to remap_pfn_range(): the caller will
see -ENOMEM, which is not what you want, I believe.
--
Kirill A. Shutemov
* [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-16 13:41 ` Kirill A. Shutemov
@ 2014-06-19 7:19 ` Chris Wilson
2014-06-19 11:50 ` Kirill A. Shutemov
-1 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2014-06-19 7:19 UTC (permalink / raw)
To: intel-gfx
Cc: Chris Wilson, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra,
Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner,
linux-mm
When using remap_pfn_range() from a fault handler, we are exposed to
races between concurrent faults. Rather than hitting a BUG, report the
error back to the caller, like vm_insert_pfn().
v2: Fix the pte address for unmapping along the error path.
v3: Report the error back and cleanup partial remaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
---
Whilst this has the semantics I want (allowing two concurrent, but
serialised, pagefaults that try to prefault the same object to both
succeed), it looks fragile and fraught with subtlety.
-Chris
---
mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index d67fd9f..be51fcc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed);
* in null mappings (currently treated as "copy-on-access")
*/
static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+ unsigned long addr, unsigned long end,
+ unsigned long pfn, pgprot_t prot,
+ bool first)
{
pte_t *pte;
spinlock_t *ptl;
+ int err = 0;
pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
if (!pte)
return -ENOMEM;
arch_enter_lazy_mmu_mode();
do {
- BUG_ON(!pte_none(*pte));
+ if (!pte_none(*pte)) {
+ err = first ? -EBUSY : -EINVAL;
+ pte++;
+ break;
+ }
+ first = false;
set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
- return 0;
+ return err;
}
static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+ unsigned long addr, unsigned long end,
+ unsigned long pfn, pgprot_t prot,
+ bool first)
{
pmd_t *pmd;
unsigned long next;
+ int err;
pfn -= addr >> PAGE_SHIFT;
pmd = pmd_alloc(mm, pud, addr);
@@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
VM_BUG_ON(pmd_trans_huge(*pmd));
do {
next = pmd_addr_end(addr, end);
- if (remap_pte_range(mm, pmd, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
+ err = remap_pte_range(mm, pmd, addr, next,
+ pfn + (addr >> PAGE_SHIFT), prot, first);
+ if (err)
+ return err;
+
+ first = false;
} while (pmd++, addr = next, addr != end);
return 0;
}
static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+ unsigned long addr, unsigned long end,
+ unsigned long pfn, pgprot_t prot, bool first)
{
pud_t *pud;
unsigned long next;
+ int err;
pfn -= addr >> PAGE_SHIFT;
pud = pud_alloc(mm, pgd, addr);
@@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
- if (remap_pmd_range(mm, pud, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
+ err = remap_pmd_range(mm, pud, addr, next,
+ pfn + (addr >> PAGE_SHIFT), prot, first);
+ if (err)
+ return err;
+
+ first = false;
} while (pud++, addr = next, addr != end);
return 0;
}
@@ -1735,6 +1751,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long next;
unsigned long end = addr + PAGE_ALIGN(size);
struct mm_struct *mm = vma->vm_mm;
+ bool first = true;
int err;
/*
@@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
do {
next = pgd_addr_end(addr, end);
err = remap_pud_range(mm, pgd, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot);
+ pfn + (addr >> PAGE_SHIFT), prot, first);
if (err)
break;
+
+ first = false;
} while (pgd++, addr = next, addr != end);
- if (err)
+ if (err) {
untrack_pfn(vma, pfn, PAGE_ALIGN(size));
+ if (err != -EBUSY)
+ zap_page_range_single(vma, addr, size, NULL);
+ }
return err;
}
--
1.9.1
* RE: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-19 7:19 ` [PATCH] " Chris Wilson
@ 2014-06-19 11:50 ` Kirill A. Shutemov
0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-19 11:50 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra,
Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner,
linux-mm
Chris Wilson wrote:
> When using remap_pfn_range() from a fault handler, we are exposed to
> races between concurrent faults. Rather than hitting a BUG, report the
> error back to the caller, like vm_insert_pfn().
>
> v2: Fix the pte address for unmapping along the error path.
> v3: Report the error back and cleanup partial remaps.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> ---
>
> Whilst this has the semantics I want to allow two concurrent, but
> serialised, pagefaults that try to prefault the same object to succeed,
> it looks fragile and fraught with subtlety.
> -Chris
>
> ---
> mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index d67fd9f..be51fcc 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed);
> * in null mappings (currently treated as "copy-on-access")
> */
> static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> + unsigned long addr, unsigned long end,
> + unsigned long pfn, pgprot_t prot,
> + bool first)
> {
With this long parameter list, wouldn't it be cleaner to pass down a pointer
to a structure instead? That could simplify the code, I believe.
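A minimal userspace sketch of such a context struct, with all names invented (plain integers stand in for pgprot_t and the errno values are the userspace constants, not a claim about what the kernel patch would look like):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical context bundling what remap_pfn_range() currently threads
 * through every level (pfn, prot, and the "first" flag), so the helpers
 * could take (ctx, addr, end) and nobody has to remember to clear and
 * re-pass "first" at each level. */
struct remap_ctx {
    unsigned long pfn;   /* base pfn of the mapping */
    unsigned long prot;  /* stand-in for pgprot_t */
    bool first;          /* no entry written yet anywhere in the walk */
};

/* Error to report when a populated pte is found: -EBUSY only while
 * nothing has been written (a racing fault on the same object),
 * -EINVAL once the walk has already modified the page tables. */
static int remap_ctx_error(const struct remap_ctx *ctx)
{
    return ctx->first ? -EBUSY : -EINVAL;
}

/* Called after any entry is written, from whichever level wrote it. */
static void remap_ctx_wrote_entry(struct remap_ctx *ctx)
{
    ctx->first = false;
}
```

The point of the struct is that the first/not-first state lives in exactly one place instead of being passed down and cleared by every caller.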
> pte_t *pte;
> spinlock_t *ptl;
> + int err = 0;
>
> pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> if (!pte)
> return -ENOMEM;
> arch_enter_lazy_mmu_mode();
> do {
> - BUG_ON(!pte_none(*pte));
> + if (!pte_none(*pte)) {
> + err = first ? -EBUSY : -EINVAL;
> + pte++;
> + break;
> + }
> + first = false;
> set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
> pfn++;
> } while (pte++, addr += PAGE_SIZE, addr != end);
> arch_leave_lazy_mmu_mode();
> pte_unmap_unlock(pte - 1, ptl);
> - return 0;
> + return err;
> }
>
> static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> + unsigned long addr, unsigned long end,
> + unsigned long pfn, pgprot_t prot,
> + bool first)
> {
> pmd_t *pmd;
> unsigned long next;
> + int err;
>
> pfn -= addr >> PAGE_SHIFT;
> pmd = pmd_alloc(mm, pud, addr);
> @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> VM_BUG_ON(pmd_trans_huge(*pmd));
> do {
> next = pmd_addr_end(addr, end);
> - if (remap_pte_range(mm, pmd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> + err = remap_pte_range(mm, pmd, addr, next,
> + pfn + (addr >> PAGE_SHIFT), prot, first);
> + if (err)
> + return err;
> +
> + first = false;
> } while (pmd++, addr = next, addr != end);
> return 0;
> }
>
> static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> + unsigned long addr, unsigned long end,
> + unsigned long pfn, pgprot_t prot, bool first)
> {
> pud_t *pud;
> unsigned long next;
> + int err;
>
> pfn -= addr >> PAGE_SHIFT;
> pud = pud_alloc(mm, pgd, addr);
> @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> return -ENOMEM;
> do {
> next = pud_addr_end(addr, end);
> - if (remap_pmd_range(mm, pud, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> + err = remap_pmd_range(mm, pud, addr, next,
> + pfn + (addr >> PAGE_SHIFT), prot, first);
> + if (err)
> + return err;
> +
> + first = false;
> } while (pud++, addr = next, addr != end);
> return 0;
> }
> @@ -1735,6 +1751,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> unsigned long next;
> unsigned long end = addr + PAGE_ALIGN(size);
> struct mm_struct *mm = vma->vm_mm;
> + bool first = true;
> int err;
>
> /*
> @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> do {
> next = pgd_addr_end(addr, end);
> err = remap_pud_range(mm, pgd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot);
> + pfn + (addr >> PAGE_SHIFT), prot, first);
> if (err)
> break;
> +
> + first = false;
> } while (pgd++, addr = next, addr != end);
>
> - if (err)
> + if (err) {
> untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> + if (err != -EBUSY)
> + zap_page_range_single(vma, addr, size, NULL);
Hm. If I read it correctly, you zap the whole range, not only what you've
set up. That looks wrong.
And after the zap, you probably want to return -EBUSY to the caller of
remap_pfn_range(), not -EINVAL.
> + }
>
> return err;
> }
> --
> 1.9.1
>
--
Kirill A. Shutemov
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-19 11:50 ` Kirill A. Shutemov
@ 2014-06-19 12:00 ` Chris Wilson
2014-06-19 12:57 ` Kirill A. Shutemov
-1 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2014-06-19 12:00 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel,
Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm
On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote:
> > + if (err) {
> > untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> > + if (err != -EBUSY)
> > + zap_page_range_single(vma, addr, size, NULL);
>
> > Hm. If I read it correctly, you zap the whole range, not only what you've
> > set up. That looks wrong.
Yes. I didn't fancy threading the last-touched pte back, but that should
be easier once this moves to a struct.
> And after the zap, you probably want to return -EBUSY to the caller of
> remap_pfn_range(), not -EINVAL.
No, it has to be EINVAL for my purpose. If we return EBUSY, the caller
will just report VM_FAULT_NOPAGE back to the fault handler, and the fault
will be retriggered - but the overlapping object will still be present. So the
EINVAL is there to report that the range conflicts with another and that
the caller should abort. It's a nasty semantic that works only when the
concurrent pagefaults are serialised around the call to remap_pfn_range().
The alternative would be to always report EINVAL and clean up, and
export pte_exists() so that the caller can detect when the PTEs have
already been populated by the concurrent fault.
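The dispatch being argued over can be sketched in userspace C. This is a model of the semantics described in this thread, not the actual i915 fault handler; the enum names only mirror the kernel's VM_FAULT_* codes.

```c
#include <errno.h>

/* -EBUSY: a racing fault on the same object already populated the PTEs,
 * so the fault is effectively serviced and nothing must be undone.
 * -EINVAL: the range conflicts with another mapping; reporting "no page"
 * would just retrigger the fault forever, so it must be a hard error. */
enum fault_result { FAULT_NOPAGE, FAULT_OOM, FAULT_SIGBUS };

static enum fault_result fault_disposition(int remap_err)
{
    switch (remap_err) {
    case 0:
    case -EBUSY:            /* another thread beat us; treat as done */
        return FAULT_NOPAGE;
    case -ENOMEM:
        return FAULT_OOM;
    default:                /* -EINVAL and anything unexpected: abort */
        return FAULT_SIGBUS;
    }
}
```

The nasty part, as noted above, is that collapsing the benign race into -EBUSY only works when concurrent pagefaults are serialised around the remap_pfn_range() call.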
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-19 12:00 ` Chris Wilson
@ 2014-06-19 12:57 ` Kirill A. Shutemov
0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-19 12:57 UTC (permalink / raw)
To: Chris Wilson
Cc: Kirill A. Shutemov, intel-gfx, Andrew Morton, Peter Zijlstra,
Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner,
linux-mm
Chris Wilson wrote:
> On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote:
> > > + if (err) {
> > > untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> > > + if (err != -EBUSY)
> > > + zap_page_range_single(vma, addr, size, NULL);
> >
> > Hm. If I read it correctly, you zap the whole range, not only what you've
> > set up. That looks wrong.
>
> Yes. I didn't fancy threading the last touched pte back, but that should
> be easier if moving to a struct.
>
> > And after the zap, you probably want to return -EBUSY to the caller of
> > remap_pfn_range(), not -EINVAL.
>
> No, it has to be EINVAL for my purpose. If we return EBUSY, the caller
> will just report VM_NOPAGE back to the fault handler, and the fault will
> be retriggered - but the overlapping object will still be present.
IIUC, what you're saying makes sense only if the range starts from the PTE
you got the fault on. I fail to see why this assumption should be part of
the remap_pfn_range() interface.
One possible option is to create a variant of remap_pfn_range() which
returns how many PTEs it was able to set up before hitting the first
!pte_none(). The caller can then decide what to do with the partially
filled range.
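A userspace sketch of that proposed variant, with an invented name and an array of integers standing in for the page table (a zero entry models pte_none()):

```c
#include <stddef.h>

/* Fill entries until the first already-populated slot, and report how
 * many were written.  The caller can then zap exactly the prefix it
 * created, or accept the partial fill if the existing entries belong
 * to the same object. */
static size_t remap_count_filled(unsigned long *ptes, size_t n,
                                 unsigned long first_pfn)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (ptes[i] != 0)       /* models !pte_none(*pte) */
            break;
        ptes[i] = first_pfn + i;
    }
    return i;                   /* number of PTEs actually set up */
}
```

Returning a count instead of -EBUSY/-EINVAL moves the policy decision (retry, abort, or partial-zap) out of the remap helper and into the caller, which is exactly the interface question under discussion.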
--
Kirill A. Shutemov
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-19 12:57 ` Kirill A. Shutemov
(?)
@ 2014-06-19 13:22 ` Chris Wilson
2014-06-19 13:59 ` Kirill A. Shutemov
-1 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2014-06-19 13:22 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel,
Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm
On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote:
> Chris Wilson wrote:
> > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote:
> > > > + if (err) {
> > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> > > > + if (err != -EBUSY)
> > > > + zap_page_range_single(vma, addr, size, NULL);
> > >
> > > Hm. If I read it correctly, you zap the whole range, not only what
> > > you've set up. Looks wrong.
> >
> > Yes. I didn't fancy threading the last touched pte back, but that should
> > be easier if moving to a struct.
> >
> > > And after the zap, you probably want to return -EBUSY to the caller of
> > > remap_pfn_range(), not -EINVAL.
> >
> > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller
> > will just report VM_FAULT_NOPAGE back to the fault handler, and the fault will
> > be retriggered - but the overlapping object will still be present.
>
> IIUC, what you're saying makes sense only if the range starts from the PTE
> that took the fault. I fail to see why this assumption should be part of
> the remap_pfn_range() interface.
That I agree with.
> One possible option is to create a variant of remap_pfn_range() which
> returns how many PTEs it was able to set up before hitting the first
> !pte_none(). The caller can then decide what to do with the partially
> filled range.
I looked at just returning the address remap_pfn_range() got up to, which
is easy enough. But given that remap_pfn_range() cleans up correctly after
a failed remap, I think any EBUSY from partway through would be a
pathological driver error.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
2014-06-19 13:22 ` Chris Wilson
@ 2014-06-19 13:59 ` Kirill A. Shutemov
0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-19 13:59 UTC (permalink / raw)
To: Chris Wilson
Cc: Kirill A. Shutemov, intel-gfx, Andrew Morton, Peter Zijlstra,
Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner,
linux-mm
Chris Wilson wrote:
> On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote:
> > One possible option is to create a variant of remap_pfn_range() which
> > returns how many PTEs it was able to set up before hitting the first
> > !pte_none(). The caller can then decide what to do with the partially
> > filled range.
>
> I looked at just returning the address remap_pfn_range() got up to, which
> is easy enough. But given that remap_pfn_range() cleans up correctly after
> a failed remap, I think any EBUSY from partway through would be a
> pathological driver error.
I would prefer to keep the remap_pfn_range() interface intact, with the
BUG_ON() on an unexpected !pte_none(), and to introduce a new function with
more flexible behaviour (sharing the underlying infrastructure).
This way we can avoid changing every remap_pfn_range() caller.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] mm: Refactor remap_pfn_range()
2014-06-19 13:59 ` Kirill A. Shutemov
(?)
@ 2014-06-21 15:53 ` Chris Wilson
2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
` (3 more replies)
-1 siblings, 4 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw)
To: intel-gfx
Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm,
Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov
In preparation for exporting very similar functionality through another
interface, gut the current remap_pfn_range(). The motivating factor here
is to reuse the PGD/PUD/PMD/PTE walker, but allow back-propagation of
errors rather than BUG_ON.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
---
mm/memory.c | 102 +++++++++++++++++++++++++++++++++---------------------------
1 file changed, 57 insertions(+), 45 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..d2c7fe88a289 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
}
EXPORT_SYMBOL(vm_insert_mixed);
+struct remap_pfn {
+ struct mm_struct *mm;
+ unsigned long addr;
+ unsigned long pfn;
+ pgprot_t prot;
+};
+
/*
* maps a range of physical memory into the requested pages. the old
* mappings are removed. any references to nonexistent pages results
* in null mappings (currently treated as "copy-on-access")
*/
-static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+static inline int remap_pfn(struct remap_pfn *r, pte_t *pte)
+{
+ if (!pte_none(*pte))
+ return -EBUSY;
+
+ set_pte_at(r->mm, r->addr, pte,
+ pte_mkspecial(pfn_pte(r->pfn, r->prot)));
+ r->pfn++;
+ r->addr += PAGE_SIZE;
+ return 0;
+}
+
+static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end)
{
pte_t *pte;
spinlock_t *ptl;
+ int err;
- pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
+ pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl);
if (!pte)
return -ENOMEM;
+
arch_enter_lazy_mmu_mode();
do {
- BUG_ON(!pte_none(*pte));
- set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
- pfn++;
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ err = remap_pfn(r, pte++);
+ } while (err == 0 && r->addr < end);
arch_leave_lazy_mmu_mode();
+
pte_unmap_unlock(pte - 1, ptl);
- return 0;
+ return err;
}
-static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end)
{
pmd_t *pmd;
- unsigned long next;
+ int err;
- pfn -= addr >> PAGE_SHIFT;
- pmd = pmd_alloc(mm, pud, addr);
+ pmd = pmd_alloc(r->mm, pud, r->addr);
if (!pmd)
return -ENOMEM;
VM_BUG_ON(pmd_trans_huge(*pmd));
+
do {
- next = pmd_addr_end(addr, end);
- if (remap_pte_range(mm, pmd, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
- } while (pmd++, addr = next, addr != end);
- return 0;
+ err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end));
+ } while (err == 0 && r->addr < end);
+
+ return err;
}
-static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
- unsigned long addr, unsigned long end,
- unsigned long pfn, pgprot_t prot)
+static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end)
{
pud_t *pud;
- unsigned long next;
+ int err;
- pfn -= addr >> PAGE_SHIFT;
- pud = pud_alloc(mm, pgd, addr);
+ pud = pud_alloc(r->mm, pgd, r->addr);
if (!pud)
return -ENOMEM;
+
do {
- next = pud_addr_end(addr, end);
- if (remap_pmd_range(mm, pud, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot))
- return -ENOMEM;
- } while (pud++, addr = next, addr != end);
- return 0;
+ err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end));
+ } while (err == 0 && r->addr < end);
+
+ return err;
}
/**
@@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t prot)
{
- pgd_t *pgd;
- unsigned long next;
unsigned long end = addr + PAGE_ALIGN(size);
- struct mm_struct *mm = vma->vm_mm;
+ struct remap_pfn r;
+ pgd_t *pgd;
int err;
/*
@@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
BUG_ON(addr >= end);
- pfn -= addr >> PAGE_SHIFT;
- pgd = pgd_offset(mm, addr);
flush_cache_range(vma, addr, end);
+
+ r.mm = vma->vm_mm;
+ r.addr = addr;
+ r.pfn = pfn;
+ r.prot = prot;
+
+ pgd = pgd_offset(r.mm, addr);
do {
- next = pgd_addr_end(addr, end);
- err = remap_pud_range(mm, pgd, addr, next,
- pfn + (addr >> PAGE_SHIFT), prot);
- if (err)
- break;
- } while (pgd++, addr = next, addr != end);
+ err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
+ } while (err == 0 && r.addr < end);
- if (err)
+ if (err) {
untrack_pfn(vma, pfn, PAGE_ALIGN(size));
+ BUG_ON(err == -EBUSY);
+ }
return err;
}
--
2.0.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping
2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson
@ 2014-06-21 15:53 ` Chris Wilson
2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw)
To: intel-gfx; +Cc: linux-mm
Currently, we only allocate a structure to hold metadata if we need to
allocate an ioremap for every access, such as on x86-32. However, it
would be useful to store basic information about the io-mapping, such as
its page protection, on all platforms.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-mm@kvack.org
---
include/linux/io-mapping.h | 52 ++++++++++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index 657fab4efab3..e053011f50bb 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -31,16 +31,17 @@
* See Documentation/io-mapping.txt
*/
-#ifdef CONFIG_HAVE_ATOMIC_IOMAP
-
-#include <asm/iomap.h>
-
struct io_mapping {
resource_size_t base;
unsigned long size;
pgprot_t prot;
+ void __iomem *iomem;
};
+
+#ifdef CONFIG_HAVE_ATOMIC_IOMAP
+
+#include <asm/iomap.h>
/*
* For small address space machines, mapping large objects
* into the kernel virtual space isn't practical. Where
@@ -119,48 +120,59 @@ io_mapping_unmap(void __iomem *vaddr)
#else
#include <linux/uaccess.h>
-
-/* this struct isn't actually defined anywhere */
-struct io_mapping;
+#include <asm/pgtable_types.h>
/* Create the io_mapping object*/
static inline struct io_mapping *
io_mapping_create_wc(resource_size_t base, unsigned long size)
{
- return (struct io_mapping __force *) ioremap_wc(base, size);
+ struct io_mapping *iomap;
+
+ iomap = kmalloc(sizeof(*iomap), GFP_KERNEL);
+ if (!iomap)
+ return NULL;
+
+ iomap->base = base;
+ iomap->size = size;
+ iomap->iomem = ioremap_wc(base, size);
+ iomap->prot = pgprot_writecombine(PAGE_KERNEL_IO);
+
+ return iomap;
}
static inline void
io_mapping_free(struct io_mapping *mapping)
{
- iounmap((void __force __iomem *) mapping);
+ iounmap(mapping->iomem);
+ kfree(mapping);
}
-/* Atomic map/unmap */
+/* Non-atomic map/unmap */
static inline void __iomem *
-io_mapping_map_atomic_wc(struct io_mapping *mapping,
- unsigned long offset)
+io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
{
- pagefault_disable();
- return ((char __force __iomem *) mapping) + offset;
+ return mapping->iomem + offset;
}
static inline void
-io_mapping_unmap_atomic(void __iomem *vaddr)
+io_mapping_unmap(void __iomem *vaddr)
{
- pagefault_enable();
}
-/* Non-atomic map/unmap */
+/* Atomic map/unmap */
static inline void __iomem *
-io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
+io_mapping_map_atomic_wc(struct io_mapping *mapping,
+ unsigned long offset)
{
- return ((char __force __iomem *) mapping) + offset;
+ pagefault_disable();
+ return io_mapping_map_wc(mapping, offset);
}
static inline void
-io_mapping_unmap(void __iomem *vaddr)
+io_mapping_unmap_atomic(void __iomem *vaddr)
{
+ io_mapping_unmap(vaddr);
+ pagefault_enable();
}
#endif /* HAVE_ATOMIC_IOMAP */
--
2.0.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] mm: Export remap_io_mapping()
2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson
2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
@ 2014-06-21 15:53 ` Chris Wilson
2014-06-30 14:32 ` Kirill A. Shutemov
2014-06-21 15:53 ` [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson
2014-06-30 14:26 ` Kirill A. Shutemov
3 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw)
To: intel-gfx
Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm,
Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov
This is similar to remap_pfn_range(), and uses the recently refactored
code to do the page table walking. The key difference is that it back
propagates its errors, as this is required for use from within a pagefault
handler. The other difference is that it combines the page protection
from the io-mapping, which is known from when the io-mapping is created,
with the per-vma page protection flags. This avoids having to walk the
entire system description to rediscover the special page protection
established for the io-mapping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
---
include/linux/mm.h | 4 ++++
mm/memory.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d6777060449f..aa766bbc6981 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1941,6 +1941,10 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
+struct io_mapping;
+int remap_io_mapping(struct vm_area_struct *,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ struct io_mapping *iomap);
int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
diff --git a/mm/memory.c b/mm/memory.c
index d2c7fe88a289..8af2bd2de98e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -61,6 +61,7 @@
#include <linux/string.h>
#include <linux/dma-debug.h>
#include <linux/debugfs.h>
+#include <linux/io-mapping.h>
#include <asm/io.h>
#include <asm/pgalloc.h>
@@ -2443,6 +2444,51 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
EXPORT_SYMBOL(remap_pfn_range);
/**
+ * remap_io_mapping - remap an IO mapping to userspace
+ * @vma: user vma to map to
+ * @addr: target user address to start at
+ * @pfn: physical address of kernel memory
+ * @size: size of map area
+ * @iomap: the source io_mapping
+ *
+ * Note: this is only safe if the mm semaphore is held when called.
+ */
+int remap_io_mapping(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn, unsigned long size,
+ struct io_mapping *iomap)
+{
+ unsigned long end = addr + PAGE_ALIGN(size);
+ struct remap_pfn r;
+ pgd_t *pgd;
+ int err;
+
+ if (WARN_ON(addr >= end))
+ return -EINVAL;
+
+#define MUST_SET (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)
+ BUG_ON(is_cow_mapping(vma->vm_flags));
+ BUG_ON((vma->vm_flags & MUST_SET) != MUST_SET);
+#undef MUST_SET
+
+ r.mm = vma->vm_mm;
+ r.addr = addr;
+ r.pfn = pfn;
+ r.prot = __pgprot((pgprot_val(iomap->prot) & _PAGE_CACHE_MASK) |
+ (pgprot_val(vma->vm_page_prot) & ~_PAGE_CACHE_MASK));
+
+ pgd = pgd_offset(r.mm, addr);
+ do {
+ err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
+ } while (err == 0 && r.addr < end);
+
+ if (err)
+ zap_page_range_single(vma, addr, r.addr - addr, NULL);
+
+ return err;
+}
+EXPORT_SYMBOL(remap_io_mapping);
+
+/**
* vm_iomap_memory - remap memory to userspace
* @vma: user vma to map to
* @start: start of area
--
2.0.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass
2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson
2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson
@ 2014-06-21 15:53 ` Chris Wilson
2014-06-30 14:26 ` Kirill A. Shutemov
3 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw)
To: intel-gfx; +Cc: linux-mm
On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s
Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s
Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s
Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s
Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s
Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s
Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s
Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s
Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
Upload rate for 64 tiled surfaces: 3017MiB/s -> 5202MiB/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Cc: linux-mm@kvack.org
---
drivers/gpu/drm/i915/i915_gem.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f6d123828926..f0628e40cb1d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1565,25 +1565,14 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
pfn >>= PAGE_SHIFT;
- if (!obj->fault_mappable) {
- unsigned long size = min_t(unsigned long,
- vma->vm_end - vma->vm_start,
- obj->base.size);
- int i;
+ ret = remap_io_mapping(vma,
+ vma->vm_start, pfn, vma->vm_end - vma->vm_start,
+ dev_priv->gtt.mappable);
+ if (ret)
+ goto unpin;
- for (i = 0; i < size >> PAGE_SHIFT; i++) {
- ret = vm_insert_pfn(vma,
- (unsigned long)vma->vm_start + i * PAGE_SIZE,
- pfn + i);
- if (ret)
- break;
- }
+ obj->fault_mappable = true;
- obj->fault_mappable = true;
- } else
- ret = vm_insert_pfn(vma,
- (unsigned long)vmf->virtual_address,
- pfn + page_offset);
unpin:
i915_gem_object_ggtt_unpin(obj);
unlock:
--
2.0.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* RE: [PATCH 1/4] mm: Refactor remap_pfn_range()
2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson
@ 2014-06-30 14:26 ` Kirill A. Shutemov
2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-30 14:26 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx, linux-mm, Andrew Morton, Kirill A. Shutemov,
Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov,
Johannes Weiner
Chris Wilson wrote:
> In preparation for exporting very similar functionality through another
> interface, gut the current remap_pfn_range(). The motivating factor here
> is to reuse the PGD/PUD/PMD/PTE walker, but allow back-propagation of
> errors rather than BUG_ON.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> ---
> mm/memory.c | 102 +++++++++++++++++++++++++++++++++---------------------------
> 1 file changed, 57 insertions(+), 45 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..d2c7fe88a289 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
> }
> EXPORT_SYMBOL(vm_insert_mixed);
>
> +struct remap_pfn {
> + struct mm_struct *mm;
> + unsigned long addr;
> + unsigned long pfn;
> + pgprot_t prot;
> +};
> +
> /*
> * maps a range of physical memory into the requested pages. the old
> * mappings are removed. any references to nonexistent pages results
> * in null mappings (currently treated as "copy-on-access")
> */
> -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte)
> +{
> + if (!pte_none(*pte))
> + return -EBUSY;
> +
> + set_pte_at(r->mm, r->addr, pte,
> + pte_mkspecial(pfn_pte(r->pfn, r->prot)));
> + r->pfn++;
> + r->addr += PAGE_SIZE;
> + return 0;
> +}
> +
> +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end)
> {
> pte_t *pte;
> spinlock_t *ptl;
> + int err;
>
> - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl);
> if (!pte)
> return -ENOMEM;
> +
> arch_enter_lazy_mmu_mode();
> do {
> - BUG_ON(!pte_none(*pte));
> - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
> - pfn++;
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + err = remap_pfn(r, pte++);
> + } while (err == 0 && r->addr < end);
> arch_leave_lazy_mmu_mode();
> +
> pte_unmap_unlock(pte - 1, ptl);
> - return 0;
> + return err;
> }
>
> -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end)
> {
> pmd_t *pmd;
> - unsigned long next;
> + int err;
>
> - pfn -= addr >> PAGE_SHIFT;
> - pmd = pmd_alloc(mm, pud, addr);
> + pmd = pmd_alloc(r->mm, pud, r->addr);
> if (!pmd)
> return -ENOMEM;
> VM_BUG_ON(pmd_trans_huge(*pmd));
> +
> do {
> - next = pmd_addr_end(addr, end);
> - if (remap_pte_range(mm, pmd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> - } while (pmd++, addr = next, addr != end);
> - return 0;
> + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end));
> + } while (err == 0 && r->addr < end);
> +
> + return err;
> }
>
> -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end)
> {
> pud_t *pud;
> - unsigned long next;
> + int err;
>
> - pfn -= addr >> PAGE_SHIFT;
> - pud = pud_alloc(mm, pgd, addr);
> + pud = pud_alloc(r->mm, pgd, r->addr);
> if (!pud)
> return -ENOMEM;
> +
> do {
> - next = pud_addr_end(addr, end);
> - if (remap_pmd_range(mm, pud, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> - } while (pud++, addr = next, addr != end);
> - return 0;
> + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end));
> + } while (err == 0 && r->addr < end);
> +
> + return err;
> }
>
> /**
> @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> unsigned long pfn, unsigned long size, pgprot_t prot)
> {
> - pgd_t *pgd;
> - unsigned long next;
> unsigned long end = addr + PAGE_ALIGN(size);
> - struct mm_struct *mm = vma->vm_mm;
> + struct remap_pfn r;
> + pgd_t *pgd;
> int err;
>
> /*
> @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>
> BUG_ON(addr >= end);
> - pfn -= addr >> PAGE_SHIFT;
> - pgd = pgd_offset(mm, addr);
> flush_cache_range(vma, addr, end);
> +
> + r.mm = vma->vm_mm;
> + r.addr = addr;
> + r.pfn = pfn;
> + r.prot = prot;
> +
> + pgd = pgd_offset(r.mm, addr);
> do {
> - next = pgd_addr_end(addr, end);
> - err = remap_pud_range(mm, pgd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot);
> - if (err)
> - break;
> - } while (pgd++, addr = next, addr != end);
> + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
> + } while (err == 0 && r.addr < end);
>
> - if (err)
> + if (err) {
> untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> + BUG_ON(err == -EBUSY);
We probably need a comment for the BUG_ON().
Otherwise,
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> + }
>
> return err;
> }
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 1/4] mm: Refactor remap_pfn_range()
@ 2014-06-30 14:26 ` Kirill A. Shutemov
0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-30 14:26 UTC (permalink / raw)
Cc: intel-gfx, linux-mm, Chris Wilson, Andrew Morton,
Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman,
Cyrill Gorcunov, Johannes Weiner
Chris Wilson wrote:
> In preparation for exporting very similar functionality through another
> interface, gut the current remap_pfn_range(). The motivating factor here
> is to reuse the PGB/PUD/PMD/PTE walker, but allow back progation of
> errors rather than BUG_ON.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: linux-mm@kvack.org
> ---
> mm/memory.c | 102 +++++++++++++++++++++++++++++++++---------------------------
> 1 file changed, 57 insertions(+), 45 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..d2c7fe88a289 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
> }
> EXPORT_SYMBOL(vm_insert_mixed);
>
> +struct remap_pfn {
> + struct mm_struct *mm;
> + unsigned long addr;
> + unsigned long pfn;
> + pgprot_t prot;
> +};
> +
> /*
> * maps a range of physical memory into the requested pages. the old
> * mappings are removed. any references to nonexistent pages results
> * in null mappings (currently treated as "copy-on-access")
> */
> -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte)
> +{
> + if (!pte_none(*pte))
> + return -EBUSY;
> +
> + set_pte_at(r->mm, r->addr, pte,
> + pte_mkspecial(pfn_pte(r->pfn, r->prot)));
> + r->pfn++;
> + r->addr += PAGE_SIZE;
> + return 0;
> +}
> +
> +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end)
> {
> pte_t *pte;
> spinlock_t *ptl;
> + int err;
>
> - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl);
> if (!pte)
> return -ENOMEM;
> +
> arch_enter_lazy_mmu_mode();
> do {
> - BUG_ON(!pte_none(*pte));
> - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
> - pfn++;
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + err = remap_pfn(r, pte++);
> + } while (err == 0 && r->addr < end);
> arch_leave_lazy_mmu_mode();
> +
> pte_unmap_unlock(pte - 1, ptl);
> - return 0;
> + return err;
> }
>
> -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end)
> {
> pmd_t *pmd;
> - unsigned long next;
> + int err;
>
> - pfn -= addr >> PAGE_SHIFT;
> - pmd = pmd_alloc(mm, pud, addr);
> + pmd = pmd_alloc(r->mm, pud, r->addr);
> if (!pmd)
> return -ENOMEM;
> VM_BUG_ON(pmd_trans_huge(*pmd));
> +
> do {
> - next = pmd_addr_end(addr, end);
> - if (remap_pte_range(mm, pmd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> - } while (pmd++, addr = next, addr != end);
> - return 0;
> + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end));
> + } while (err == 0 && r->addr < end);
> +
> + return err;
> }
>
> -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> - unsigned long addr, unsigned long end,
> - unsigned long pfn, pgprot_t prot)
> +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end)
> {
> pud_t *pud;
> - unsigned long next;
> + int err;
>
> - pfn -= addr >> PAGE_SHIFT;
> - pud = pud_alloc(mm, pgd, addr);
> + pud = pud_alloc(r->mm, pgd, r->addr);
> if (!pud)
> return -ENOMEM;
> +
> do {
> - next = pud_addr_end(addr, end);
> - if (remap_pmd_range(mm, pud, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot))
> - return -ENOMEM;
> - } while (pud++, addr = next, addr != end);
> - return 0;
> + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end));
> + } while (err == 0 && r->addr < end);
> +
> + return err;
> }
>
> /**
> @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> unsigned long pfn, unsigned long size, pgprot_t prot)
> {
> - pgd_t *pgd;
> - unsigned long next;
> unsigned long end = addr + PAGE_ALIGN(size);
> - struct mm_struct *mm = vma->vm_mm;
> + struct remap_pfn r;
> + pgd_t *pgd;
> int err;
>
> /*
> @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
> vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>
> BUG_ON(addr >= end);
> - pfn -= addr >> PAGE_SHIFT;
> - pgd = pgd_offset(mm, addr);
> flush_cache_range(vma, addr, end);
> +
> + r.mm = vma->vm_mm;
> + r.addr = addr;
> + r.pfn = pfn;
> + r.prot = prot;
> +
> + pgd = pgd_offset(r.mm, addr);
> do {
> - next = pgd_addr_end(addr, end);
> - err = remap_pud_range(mm, pgd, addr, next,
> - pfn + (addr >> PAGE_SHIFT), prot);
> - if (err)
> - break;
> - } while (pgd++, addr = next, addr != end);
> + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
> + } while (err == 0 && r.addr < end);
>
> - if (err)
> + if (err) {
> untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> + BUG_ON(err == -EBUSY);
We probably need a comment for the BUG_ON().
Otherwise,
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> + }
>
> return err;
> }
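The shape of the refactor above — one struct carrying the walk state (address, pfn, prot) so each level takes only the struct and an end address, with -EBUSY propagated out instead of hitting BUG() — can be sketched in plain userspace C. This is purely illustrative: the names, the flat array standing in for a PTE page, and the error constant are all stand-ins, not the kernel API.

```c
/* Userspace sketch of the remap_pfn refactor: walk state lives in one
 * struct, and a populated slot (a concurrent fault won the race) makes
 * the walk stop and report -SK_EBUSY rather than BUG(). */
#include <stddef.h>

#define SK_EBUSY 16
#define PAGE_SZ  4096UL

struct remap_state {
	unsigned long addr;   /* next virtual address to fill */
	unsigned long pfn;    /* next page frame number to install */
	unsigned long *table; /* stand-in for the PTE page */
};

static int fill_one(struct remap_state *r, unsigned long *slot)
{
	if (*slot)            /* a concurrent fault already filled this PTE */
		return -SK_EBUSY;
	*slot = r->pfn;       /* stand-in for set_pte_at() */
	r->pfn++;
	r->addr += PAGE_SZ;
	return 0;
}

static int fill_range(struct remap_state *r, unsigned long end)
{
	int err;
	size_t i = 0;

	/* Same loop shape as the patch: advance while err == 0 and
	 * the state struct has not reached the end address. */
	do {
		err = fill_one(r, &r->table[i++]);
	} while (err == 0 && r->addr < end);

	return err;           /* 0 on success, -SK_EBUSY on collision */
}
```

The point of the structure is that the pmd/pud/pgd levels in the real patch all reduce to this same loop over a child level, since the struct carries the cursor forward for them.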
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: [PATCH 3/4] mm: Export remap_io_mapping()
2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson
@ 2014-06-30 14:32 ` Kirill A. Shutemov
0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2014-06-30 14:32 UTC (permalink / raw)
To: Chris Wilson
Cc: intel-gfx, linux-mm, Andrew Morton, Kirill A. Shutemov,
Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov,
Johannes Weiner
Chris Wilson wrote:
> This is similar to remap_pfn_range(), and uses the recently refactored
> code to do the page table walking. The key difference is that it
> propagates its error back, as this is required for use from within a
> pagefault handler. The other difference is that it combines the page
> protection from the io-mapping, which is known when the io-mapping is
> created, with the per-vma page protection flags. This avoids having to
> walk the entire system description to rediscover the special page
> protection established for the io-mapping.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Looks okay to me:
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
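The back-propagation matters because a fault handler must translate the helper's error into a VM_FAULT return code; per patch 1/2's rationale, -EBUSY here just means a concurrent fault already populated the PTEs, which is success from the faulting thread's point of view. A hedged userspace sketch of that mapping (the constants and names are illustrative, not the kernel's):

```c
/* Illustrative sketch of fault-handler error translation; the enum and
 * error constants are stand-ins for the kernel's VM_FAULT_* codes. */
#define MY_EBUSY  16
#define MY_ENOMEM 12

enum fault_ret { FAULT_NOPAGE, FAULT_OOM, FAULT_SIGBUS };

static enum fault_ret fault_from_err(int err)
{
	switch (err) {
	case 0:
	case -MY_EBUSY:   /* lost the race: PTEs already installed, not an error */
		return FAULT_NOPAGE;
	case -MY_ENOMEM:  /* page-table allocation failed */
		return FAULT_OOM;
	default:          /* anything else is a genuine mapping failure */
		return FAULT_SIGBUS;
	}
}
```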
--
Kirill A. Shutemov
end of thread, other threads:[~2014-06-30 14:32 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
2014-06-13 16:34 ` [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
2014-06-16 13:41 ` Kirill A. Shutemov
2014-06-16 13:41 ` Kirill A. Shutemov
2014-06-19 7:19 ` [PATCH] " Chris Wilson
2014-06-19 11:50 ` Kirill A. Shutemov
2014-06-19 11:50 ` Kirill A. Shutemov
2014-06-19 12:00 ` Chris Wilson
2014-06-19 12:57 ` Kirill A. Shutemov
2014-06-19 12:57 ` Kirill A. Shutemov
2014-06-19 13:22 ` Chris Wilson
2014-06-19 13:59 ` Kirill A. Shutemov
2014-06-19 13:59 ` Kirill A. Shutemov
2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson
2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson
2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson
2014-06-30 14:32 ` Kirill A. Shutemov
2014-06-30 14:32 ` Kirill A. Shutemov
2014-06-21 15:53 ` [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson
2014-06-30 14:26 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Kirill A. Shutemov
2014-06-30 14:26 ` Kirill A. Shutemov