From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f173.google.com (mail-we0-f173.google.com [74.125.82.173]) by kanga.kvack.org (Postfix) with ESMTP id B55406B0035 for ; Fri, 13 Jun 2014 12:34:59 -0400 (EDT) Received: by mail-we0-f173.google.com with SMTP id t60so3075073wes.32 for ; Fri, 13 Jun 2014 09:34:59 -0700 (PDT) Received: from fireflyinternet.com (mail.fireflyinternet.com. [87.106.93.118]) by mx.google.com with ESMTP id gg4si7471611wjd.15.2014.06.13.09.34.57 for ; Fri, 13 Jun 2014 09:34:58 -0700 (PDT) Date: Fri, 13 Jun 2014 17:34:54 +0100 From: Chris Wilson Subject: Re: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Message-ID: <20140613163454.GM6451@nuc-i3427.alporthouse.com> References: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Sender: owner-linux-mm@kvack.org List-ID: To: intel-gfx@lists.freedesktop.org Cc: Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org On Fri, Jun 13, 2014 at 05:26:17PM +0100, Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. 
Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > mm/memory.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..6603a9e6a731 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > { > pte_t *pte; > spinlock_t *ptl; > + int ret = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + ret = -EBUSY; > + break; > + } > set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > pfn++; > } while (pte++, addr += PAGE_SIZE, addr != end); > arch_leave_lazy_mmu_mode(); > pte_unmap_unlock(pte - 1, ptl); Oh. That will want the EBUSY path to increment pte or we will try to unmap the wrong page. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173]) by kanga.kvack.org (Postfix) with ESMTP id 8891D6B0031 for ; Mon, 16 Jun 2014 09:41:29 -0400 (EDT) Received: by mail-pd0-f173.google.com with SMTP id r10so4450450pdi.32 for ; Mon, 16 Jun 2014 06:41:29 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id yv3si13744280pac.77.2014.06.16.06.41.28 for ; Mon, 16 Jun 2014 06:41:28 -0700 (PDT) From: "Kirill A. 
Shutemov" In-Reply-To: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> References: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Subject: RE: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Content-Transfer-Encoding: 7bit Message-Id: <20140616134124.0ED73E00A2@blue.fi.intel.com> Date: Mon, 16 Jun 2014 16:41:24 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > mm/memory.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..6603a9e6a731 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > { > pte_t *pte; > spinlock_t *ptl; > + int ret = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + ret = -EBUSY; > + break; I think you need at least remove entries you've setup if the check failed not at first iteration. And nobody propagate your -EBUSY back to remap_pfn_range(): caller will see -ENOMEM, which is not what you want, I believe. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
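Kirill's two points, cleaning up the entries already set up when the check fails partway through, and propagating -EBUSY instead of letting it collapse into -ENOMEM on the way up, can be modelled outside the kernel. The sketch below is a hypothetical userspace analogue (an int array stands in for the page table and `fill_range` is an invented name), not the actual mm/ code:

```c
#include <errno.h>
#include <stddef.h>

/* Toy "page table": 0 plays the role of pte_none(). */
static int fill_range(int *pte, size_t start, size_t end, int pfn)
{
	size_t i;

	for (i = start; i < end; i++) {
		if (pte[i] != 0) {
			/* Undo what we set up before reporting the
			 * conflict, instead of leaving a partially
			 * populated range behind. */
			while (i > start)
				pte[--i] = 0;
			return -EBUSY; /* propagate; don't collapse into -ENOMEM */
		}
		pte[i] = pfn++;
	}
	return 0;
}
```

With entry 2 pre-populated, `fill_range(pte, 0, 4, 100)` returns -EBUSY and clears entries 0 and 1 again, which is the behaviour the review asks the patch to adopt.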
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) by kanga.kvack.org (Postfix) with ESMTP id 61F226B0031 for ; Thu, 19 Jun 2014 03:19:19 -0400 (EDT) Received: by mail-wi0-f175.google.com with SMTP id r20so8813866wiv.2 for ; Thu, 19 Jun 2014 00:19:18 -0700 (PDT) Received: from fireflyinternet.com (mail.fireflyinternet.com. [87.106.93.118]) by mx.google.com with ESMTP id hg9si6075680wjc.7.2014.06.19.00.19.17 for ; Thu, 19 Jun 2014 00:19:17 -0700 (PDT) From: Chris Wilson Subject: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Date: Thu, 19 Jun 2014 08:19:09 +0100 Message-Id: <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> In-Reply-To: <20140616134124.0ED73E00A2@blue.fi.intel.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: intel-gfx@lists.freedesktop.org Cc: Chris Wilson , Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org When using remap_pfn_range() from a fault handler, we are exposed to races between concurrent faults. Rather than hitting a BUG, report the error back to the caller, like vm_insert_pfn(). v2: Fix the pte address for unmapping along the error path. v3: Report the error back and cleanup partial remaps. Signed-off-by: Chris Wilson Cc: Andrew Morton Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Rik van Riel Cc: Mel Gorman Cc: Cyrill Gorcunov Cc: Johannes Weiner Cc: linux-mm@kvack.org --- Whilst this has the semantics I want to allow two concurrent, but serialised, pagefaults that try to prefault the same object to succeed, it looks fragile and fraught with subtlety. 
-Chris --- mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 38 insertions(+), 16 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index d67fd9f..be51fcc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed); * in null mappings (currently treated as "copy-on-access") */ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, + bool first) { pte_t *pte; spinlock_t *ptl; + int err = 0; pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); + if (!pte_none(*pte)) { + err = first ? -EBUSY : -EINVAL; + pte++; + break; + } + first = false; set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - return 0; + return err; } static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, + bool first) { pmd_t *pmd; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pmd = pmd_alloc(mm, pud, addr); @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, VM_BUG_ON(pmd_trans_huge(*pmd)); do { next = pmd_addr_end(addr, end); - if (remap_pte_range(mm, pmd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pte_range(mm, pmd, addr, next, + pfn + (addr >> PAGE_SHIFT), prot, first); + if (err) + return err; + + first = false; } while (pmd++, addr = next, addr != end); return 0; } static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t 
prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, bool first) { pud_t *pud; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pud = pud_alloc(mm, pgd, addr); @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, return -ENOMEM; do { next = pud_addr_end(addr, end); - if (remap_pmd_range(mm, pud, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pmd_range(mm, pud, addr, next, + pfn + (addr >> PAGE_SHIFT), prot, first); + if (err) + return err; + + first = false; } while (pud++, addr = next, addr != end); return 0; } @@ -1735,6 +1751,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long next; unsigned long end = addr + PAGE_ALIGN(size); struct mm_struct *mm = vma->vm_mm; + bool first = true; int err; /* @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, do { next = pgd_addr_end(addr, end); err = remap_pud_range(mm, pgd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot); + pfn + (addr >> PAGE_SHIFT), prot, first); if (err) break; + + first = false; } while (pgd++, addr = next, addr != end); - if (err) + if (err) { untrack_pfn(vma, pfn, PAGE_ALIGN(size)); + if (err != -EBUSY) + zap_page_range_single(vma, addr, size, NULL); + } return err; } -- 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f50.google.com (mail-pb0-f50.google.com [209.85.160.50]) by kanga.kvack.org (Postfix) with ESMTP id 1F5FC6B0031 for ; Thu, 19 Jun 2014 07:50:55 -0400 (EDT) Received: by mail-pb0-f50.google.com with SMTP id rp16so1842361pbb.23 for ; Thu, 19 Jun 2014 04:50:54 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. 
[143.182.124.21]) by mx.google.com with ESMTP id bo1si5526801pbc.7.2014.06.19.04.50.53 for ; Thu, 19 Jun 2014 04:50:54 -0700 (PDT) From: "Kirill A. Shutemov" In-Reply-To: <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> Subject: RE: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Content-Transfer-Encoding: 7bit Message-Id: <20140619115018.412D2E00A3@blue.fi.intel.com> Date: Thu, 19 Jun 2014 14:50:18 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > v2: Fix the pte address for unmapping along the error path. > v3: Report the error back and cleanup partial remaps. > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > > Whilst this has the semantics I want to allow two concurrent, but > serialised, pagefaults that try to prefault the same object to succeed, > it looks fragile and fraught with subtlety. 
> -Chris > > --- > mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 38 insertions(+), 16 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index d67fd9f..be51fcc 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed); > * in null mappings (currently treated as "copy-on-access") > */ > static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { With this long parameter list, wouldn't it cleaner to pass down a point to structure instead? This could simplify the code, I believe. > pte_t *pte; > spinlock_t *ptl; > + int err = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + err = first ? 
-EBUSY : -EINVAL; > + pte++; > + break; > + } > + first = false; > set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > pfn++; > } while (pte++, addr += PAGE_SIZE, addr != end); > arch_leave_lazy_mmu_mode(); > pte_unmap_unlock(pte - 1, ptl); > - return 0; > + return err; > } > > static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { > pmd_t *pmd; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pmd = pmd_alloc(mm, pud, addr); > @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > VM_BUG_ON(pmd_trans_huge(*pmd)); > do { > next = pmd_addr_end(addr, end); > - if (remap_pte_range(mm, pmd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pte_range(mm, pmd, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pmd++, addr = next, addr != end); > return 0; > } > > static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, bool first) > { > pud_t *pud; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pud = pud_alloc(mm, pgd, addr); > @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > return -ENOMEM; > do { > next = pud_addr_end(addr, end); > - if (remap_pmd_range(mm, pud, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pmd_range(mm, pud, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pud++, addr = next, addr != end); > return 0; > } > @@ -1735,6 +1751,7 @@ int 
remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > unsigned long next; > unsigned long end = addr + PAGE_ALIGN(size); > struct mm_struct *mm = vma->vm_mm; > + bool first = true; > int err; > > /* > @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > do { > next = pgd_addr_end(addr, end); > err = remap_pud_range(mm, pgd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot); > + pfn + (addr >> PAGE_SHIFT), prot, first); > if (err) > break; > + > + first = false; > } while (pgd++, addr = next, addr != end); > > - if (err) > + if (err) { > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > + if (err != -EBUSY) > + zap_page_range_single(vma, addr, size, NULL); Hm. If I read it correctly, you zap whole range, not only what you've set up. Looks wrong. And for after zap, you probably whant to return -EBUSY to caller of remap_pfn_range(), not -EINVAL. > + } > > return err; > } > -- > 1.9.1 > -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f176.google.com (mail-we0-f176.google.com [74.125.82.176]) by kanga.kvack.org (Postfix) with ESMTP id 137F66B0037 for ; Thu, 19 Jun 2014 08:00:09 -0400 (EDT) Received: by mail-we0-f176.google.com with SMTP id u56so2189511wes.21 for ; Thu, 19 Jun 2014 05:00:09 -0700 (PDT) Received: from fireflyinternet.com (mail.fireflyinternet.com. 
[87.106.93.118]) by mx.google.com with ESMTP id q17si6530260wiv.54.2014.06.19.05.00.08 for ; Thu, 19 Jun 2014 05:00:08 -0700 (PDT) Date: Thu, 19 Jun 2014 13:00:04 +0100 From: Chris Wilson Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Message-ID: <20140619120004.GC25975@nuc-i3427.alporthouse.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> <20140619115018.412D2E00A3@blue.fi.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140619115018.412D2E00A3@blue.fi.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Kirill A. Shutemov" Cc: intel-gfx@lists.freedesktop.org, Andrew Morton , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > + if (err) { > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > + if (err != -EBUSY) > > + zap_page_range_single(vma, addr, size, NULL); > > Hm. If I read it correctly, you zap whole range, not only what you've > set up. Looks wrong. Yes. I didn't fancy threading the last touched pte back, but that should be easier if moving to a struct. > And for after zap, you probably whant to return -EBUSY to caller of > remap_pfn_range(), not -EINVAL. No, it has to be EINVAL for my purpose. If we return EBUSY, the caller will just report VM_NOPAGE back to the fault handler, and the fault will be retriggered - but the overlapping object will still be present. So the EINVAL is there to report that the range conflicts with another and that the caller should abort. It's a nasty semantic that works only when the concurrent pagefaults are serialised around the call to remap_pfn_range(). 
The alternative would be to always report EINVAL and clean up, and export pte_exists() so that the caller can detect when the PTEs have already been populated by the concurrent fault. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 9B7A76B0031 for ; Thu, 19 Jun 2014 08:57:51 -0400 (EDT) Received: by mail-pa0-f43.google.com with SMTP id lf10so1889629pab.2 for ; Thu, 19 Jun 2014 05:57:51 -0700 (PDT) Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTP id e10si5793327pat.80.2014.06.19.05.57.50 for ; Thu, 19 Jun 2014 05:57:50 -0700 (PDT) From: "Kirill A. Shutemov" In-Reply-To: <20140619120004.GC25975@nuc-i3427.alporthouse.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> <20140619115018.412D2E00A3@blue.fi.intel.com> <20140619120004.GC25975@nuc-i3427.alporthouse.com> Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Content-Transfer-Encoding: 7bit Message-Id: <20140619125746.25A03E00A3@blue.fi.intel.com> Date: Thu, 19 Jun 2014 15:57:46 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: "Kirill A. Shutemov" , intel-gfx@lists.freedesktop.org, Andrew Morton , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org Chris Wilson wrote: > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > > + if (err) { > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > > + if (err != -EBUSY) > > > + zap_page_range_single(vma, addr, size, NULL); > > > > Hm. 
If I read it correctly, you zap whole range, not only what you've > > set up. Looks wrong. > > Yes. I didn't fancy threading the last touched pte back, but that should > be easier if moving to a struct. > > > And for after zap, you probably whant to return -EBUSY to caller of > > remap_pfn_range(), not -EINVAL. > > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller > will just report VM_NOPAGE back to the fault handler, and the fault will > be retriggered - but the overlapping object will still be present. IIUC, what you're saying makes sense only if the range starts from PTE you've got fault to. I failed to see why this assumption should be part of remap_pfn_range() interface. One possible option is to create a variant of remap_pfn_range() which will return how many PTEs it was able to setup, before hitting the !pte_none(). Caller will decide what to do with partially filled range. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f52.google.com (mail-wg0-f52.google.com [74.125.82.52]) by kanga.kvack.org (Postfix) with ESMTP id EBCD46B0031 for ; Thu, 19 Jun 2014 09:22:45 -0400 (EDT) Received: by mail-wg0-f52.google.com with SMTP id b13so2247631wgh.11 for ; Thu, 19 Jun 2014 06:22:45 -0700 (PDT) Received: from fireflyinternet.com (mail.fireflyinternet.com. 
[87.106.93.118]) by mx.google.com with ESMTP id we9si7213879wjb.88.2014.06.19.06.22.44 for ; Thu, 19 Jun 2014 06:22:44 -0700 (PDT) Date: Thu, 19 Jun 2014 14:22:40 +0100 From: Chris Wilson Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Message-ID: <20140619132240.GF25975@nuc-i3427.alporthouse.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> <20140619115018.412D2E00A3@blue.fi.intel.com> <20140619120004.GC25975@nuc-i3427.alporthouse.com> <20140619125746.25A03E00A3@blue.fi.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140619125746.25A03E00A3@blue.fi.intel.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Kirill A. Shutemov" Cc: intel-gfx@lists.freedesktop.org, Andrew Morton , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote: > Chris Wilson wrote: > > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > > > + if (err) { > > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > > > + if (err != -EBUSY) > > > > + zap_page_range_single(vma, addr, size, NULL); > > > > > > Hm. If I read it correctly, you zap whole range, not only what you've > > > set up. Looks wrong. > > > > Yes. I didn't fancy threading the last touched pte back, but that should > > be easier if moving to a struct. > > > > > And for after zap, you probably whant to return -EBUSY to caller of > > > remap_pfn_range(), not -EINVAL. > > > > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller > > will just report VM_NOPAGE back to the fault handler, and the fault will > > be retriggered - but the overlapping object will still be present. > > IIUC, what you're saying makes sense only if the range starts from PTE > you've got fault to. 
I failed to see why this assumption should be part of > remap_pfn_range() interface. That I agree with. > One possible option is to create a variant of remap_pfn_range() which will > return how many PTEs it was able to setup, before hitting the !pte_none(). > Caller will decide what to do with partially filled range. Looked at just returning the address remap_pfn_range() got up to, which is easy enough, but I think given that remap_pfn_range() will clean up correctly after a failed remap, any EBUSY from partway through would be a pathological driver error. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f182.google.com (mail-pd0-f182.google.com [209.85.192.182]) by kanga.kvack.org (Postfix) with ESMTP id 6F4D46B0031 for ; Thu, 19 Jun 2014 09:59:54 -0400 (EDT) Received: by mail-pd0-f182.google.com with SMTP id y13so1847133pdi.41 for ; Thu, 19 Jun 2014 06:59:54 -0700 (PDT) Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88]) by mx.google.com with ESMTP id hb8si5912017pbc.8.2014.06.19.06.59.53 for ; Thu, 19 Jun 2014 06:59:53 -0700 (PDT) From: "Kirill A. 
Shutemov" In-Reply-To: <20140619132240.GF25975@nuc-i3427.alporthouse.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> <20140619115018.412D2E00A3@blue.fi.intel.com> <20140619120004.GC25975@nuc-i3427.alporthouse.com> <20140619125746.25A03E00A3@blue.fi.intel.com> <20140619132240.GF25975@nuc-i3427.alporthouse.com> Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Content-Transfer-Encoding: 7bit Message-Id: <20140619135944.20837E00A3@blue.fi.intel.com> Date: Thu, 19 Jun 2014 16:59:44 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: "Kirill A. Shutemov" , intel-gfx@lists.freedesktop.org, Andrew Morton , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org Chris Wilson wrote: > On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote: > > One possible option is to create a variant of remap_pfn_range() which will > > return how many PTEs it was able to setup, before hitting the !pte_none(). > > Caller will decide what to do with partially filled range. > > Looked at just returning the address remap_pfn_range() got up to, which is > easy enough, but I think given that remap_pfn_range() will clean up > correctly after a failed remap, any EBUSY from partway through would be > a pathological driver error. I would prefer to keep the remap_pfn_range() interface intact with BUG_ON() on unexpected !pte_none() and introduce a new function with more flexible behaviour (sharing the underlying infrastructure). This way we can avoid changing every remap_pfn_range() caller. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
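The shape Kirill suggests here, one shared walker with the existing entry point keeping its hard failure and a new entry point propagating the error, can be sketched in userspace C. All names below are invented for illustration, and `abort()` stands in for BUG_ON():

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Shared walker: stops and reports -EBUSY on the first occupied slot. */
static int remap_walk(int *pte, size_t n, int pfn)
{
	for (size_t i = 0; i < n; i++) {
		if (pte[i] != 0)
			return -EBUSY;
		pte[i] = pfn++;
	}
	return 0;
}

/* Existing-style interface: unexpected occupancy stays fatal,
 * so none of its current callers need to change. */
static void remap_fixed(int *pte, size_t n, int pfn)
{
	if (remap_walk(pte, n, pfn) == -EBUSY)
		abort(); /* analogue of BUG_ON(!pte_none(*pte)) */
}

/* New, more flexible interface: the caller sees the error
 * and decides what to do with a partially filled range. */
static int remap_try(int *pte, size_t n, int pfn)
{
	return remap_walk(pte, n, pfn);
}
```

The point of the split is that a fault handler can call the `remap_try` variant and handle -EBUSY, while every other caller keeps the old crash-on-conflict semantics for free.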
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id CE9CB6B0031 for ; Mon, 30 Jun 2014 10:27:05 -0400 (EDT) Received: by mail-pa0-f45.google.com with SMTP id rd3so8693222pab.18 for ; Mon, 30 Jun 2014 07:27:05 -0700 (PDT) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id bu3si23362586pbb.98.2014.06.30.07.27.04 for ; Mon, 30 Jun 2014 07:27:04 -0700 (PDT) From: "Kirill A. Shutemov" In-Reply-To: <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> Subject: RE: [PATCH 1/4] mm: Refactor remap_pfn_range() Content-Transfer-Encoding: 7bit Message-Id: <20140630142659.B6F7DE00A3@blue.fi.intel.com> Date: Mon, 30 Jun 2014 17:26:59 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, linux-mm@kvack.org, Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner Chris Wilson wrote: > In preparation for exporting very similar functionality through another > interface, gut the current remap_pfn_range(). The motivating factor here > is to reuse the PGD/PUD/PMD/PTE walker, but allow back propagation of > errors rather than BUG_ON. > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. 
Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > mm/memory.c | 102 +++++++++++++++++++++++++++++++++--------------------------- > 1 file changed, 57 insertions(+), 45 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..d2c7fe88a289 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, > } > EXPORT_SYMBOL(vm_insert_mixed); > > +struct remap_pfn { > + struct mm_struct *mm; > + unsigned long addr; > + unsigned long pfn; > + pgprot_t prot; > +}; > + > /* > * maps a range of physical memory into the requested pages. the old > * mappings are removed. any references to nonexistent pages results > * in null mappings (currently treated as "copy-on-access") > */ > -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte) > +{ > + if (!pte_none(*pte)) > + return -EBUSY; > + > + set_pte_at(r->mm, r->addr, pte, > + pte_mkspecial(pfn_pte(r->pfn, r->prot))); > + r->pfn++; > + r->addr += PAGE_SIZE; > + return 0; > +} > + > +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end) > { > pte_t *pte; > spinlock_t *ptl; > + int err; > > - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl); > if (!pte) > return -ENOMEM; > + > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > - pfn++; > - } while (pte++, addr += PAGE_SIZE, addr != end); > + err = remap_pfn(r, pte++); > + } while (err == 0 && r->addr < end); > arch_leave_lazy_mmu_mode(); > + > pte_unmap_unlock(pte - 1, ptl); > - return 0; > + return err; > } > > -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > - 
unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end) > { > pmd_t *pmd; > - unsigned long next; > + int err; > > - pfn -= addr >> PAGE_SHIFT; > - pmd = pmd_alloc(mm, pud, addr); > + pmd = pmd_alloc(r->mm, pud, r->addr); > if (!pmd) > return -ENOMEM; > VM_BUG_ON(pmd_trans_huge(*pmd)); > + > do { > - next = pmd_addr_end(addr, end); > - if (remap_pte_range(mm, pmd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > - } while (pmd++, addr = next, addr != end); > - return 0; > + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end)); > + } while (err == 0 && r->addr < end); > + > + return err; > } > > -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end) > { > pud_t *pud; > - unsigned long next; > + int err; > > - pfn -= addr >> PAGE_SHIFT; > - pud = pud_alloc(mm, pgd, addr); > + pud = pud_alloc(r->mm, pgd, r->addr); > if (!pud) > return -ENOMEM; > + > do { > - next = pud_addr_end(addr, end); > - if (remap_pmd_range(mm, pud, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > - } while (pud++, addr = next, addr != end); > - return 0; > + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end)); > + } while (err == 0 && r->addr < end); > + > + return err; > } > > /** > @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > unsigned long pfn, unsigned long size, pgprot_t prot) > { > - pgd_t *pgd; > - unsigned long next; > unsigned long end = addr + PAGE_ALIGN(size); > - struct mm_struct *mm = vma->vm_mm; > + struct remap_pfn r; > + pgd_t *pgd; > int err; > > /* > @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct 
vm_area_struct *vma, unsigned long addr, > vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; > > BUG_ON(addr >= end); > - pfn -= addr >> PAGE_SHIFT; > - pgd = pgd_offset(mm, addr); > flush_cache_range(vma, addr, end); > + > + r.mm = vma->vm_mm; > + r.addr = addr; > + r.pfn = pfn; > + r.prot = prot; > + > + pgd = pgd_offset(r.mm, addr); > do { > - next = pgd_addr_end(addr, end); > - err = remap_pud_range(mm, pgd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot); > - if (err) > - break; > - } while (pgd++, addr = next, addr != end); > + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); > + } while (err == 0 && r.addr < end); > > - if (err) > + if (err) { > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > + BUG_ON(err == -EBUSY); We probably need a comment for the BUG_ON(). Otherwise, Acked-by: Kirill A. Shutemov > + } > > return err; > } -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id 8D9126B0035 for ; Mon, 30 Jun 2014 10:32:46 -0400 (EDT) Received: by mail-pd0-f172.google.com with SMTP id w10so8355329pde.31 for ; Mon, 30 Jun 2014 07:32:46 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. [143.182.124.21]) by mx.google.com with ESMTP id vq10si23352843pab.121.2014.06.30.07.32.45 for ; Mon, 30 Jun 2014 07:32:45 -0700 (PDT) From: "Kirill A. 
Shutemov" In-Reply-To: <1403366036-10169-3-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> <1403366036-10169-3-git-send-email-chris@chris-wilson.co.uk> Subject: RE: [PATCH 3/4] mm: Export remap_io_mapping() Content-Transfer-Encoding: 7bit Message-Id: <20140630143240.25B87E00A3@blue.fi.intel.com> Date: Mon, 30 Jun 2014 17:32:40 +0300 (EEST) Sender: owner-linux-mm@kvack.org List-ID: To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org, linux-mm@kvack.org, Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner Chris Wilson wrote: > This is similar to remap_pfn_range(), and uses the recently refactored > code to do the page table walking. The key difference is that it back > propagates its error, as this is required for use from within a pagefault > handler. The other difference is that it combines the page protection > from io-mapping, which is known from when the io-mapping is created, > with the per-vma page protection flags. This avoids having to walk the > entire system description to rediscover the special page protection > established for the io-mapping. > > Signed-off-by: Chris Wilson Looks okay to me: Acked-by: Kirill A. Shutemov -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Date: Fri, 13 Jun 2014 17:26:18 +0100 Message-ID: <1402676778-27174-2-git-send-email-chris@chris-wilson.co.uk> References: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: linux-mm@kvack.org List-Id: linux-mm.kvack.org On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s Upload rate for 64 tiled surfaces: 3017MiB/s -> 5127MiB/s Signed-off-by: Chris Wilson Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Brad Volkin Cc: linux-mm@kvack.org --- drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c313cb2b641b..e6246634b419 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ 
-1565,22 +1565,23 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj); pfn >>= PAGE_SHIFT; - if (!obj->fault_mappable) { - int i; + ret = remap_pfn_range(vma, vma->vm_start, + pfn, vma->vm_end - vma->vm_start, + vma->vm_page_prot); + if (ret) { + /* After passing the sanity checks on remap_pfn_range(), we may + * abort whilst updating the pagetables due to ENOMEM and leave + * the tables in an inconsistent state. Reset them all now. + * However, we do not want to undo the work of another thread + * that beat us to prefaulting the PTEs. + */ + if (ret != -EBUSY) + zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start); + goto unpin; + } - for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) { - ret = vm_insert_pfn(vma, - (unsigned long)vma->vm_start + i * PAGE_SIZE, - pfn + i); - if (ret) - break; - } + obj->fault_mappable = true; - obj->fault_mappable = true; - } else - ret = vm_insert_pfn(vma, - (unsigned long)vmf->virtual_address, - pfn + page_offset); unpin: i915_gem_object_ggtt_unpin(obj); unlock: -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Date: Fri, 13 Jun 2014 17:26:17 +0100 Message-ID: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: Rik van Riel , Peter Zijlstra , Cyrill Gorcunov , linux-mm@kvack.org, Mel Gorman , Johannes Weiner , Andrew Morton , "Kirill A. Shutemov" List-Id: linux-mm.kvack.org When using remap_pfn_range() from a fault handler, we are exposed to races between concurrent faults. 
Rather than hitting a BUG, report the error back to the caller, like vm_insert_pfn(). Signed-off-by: Chris Wilson Cc: Andrew Morton Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Rik van Riel Cc: Mel Gorman Cc: Cyrill Gorcunov Cc: Johannes Weiner Cc: linux-mm@kvack.org --- mm/memory.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..6603a9e6a731 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, { pte_t *pte; spinlock_t *ptl; + int ret = 0; pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); + if (!pte_none(*pte)) { + ret = -EBUSY; + break; + } set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - return 0; + return ret; } static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Date: Sat, 21 Jun 2014 16:53:54 +0100 Message-ID: <1403366036-10169-2-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: linux-mm@kvack.org List-Id: linux-mm.kvack.org Currently, we only allocate a structure to hold metadata if we need to allocate an ioremap for every access, such as on x86-32. 
However, it would be useful to store basic information about the io-mapping, such as its page protection, on all platforms. Signed-off-by: Chris Wilson Cc: linux-mm@kvack.org --- include/linux/io-mapping.h | 52 ++++++++++++++++++++++++++++------------------ 1 file changed, 32 insertions(+), 20 deletions(-) diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h index 657fab4efab3..e053011f50bb 100644 --- a/include/linux/io-mapping.h +++ b/include/linux/io-mapping.h @@ -31,16 +31,17 @@ * See Documentation/io-mapping.txt */ -#ifdef CONFIG_HAVE_ATOMIC_IOMAP - -#include - struct io_mapping { resource_size_t base; unsigned long size; pgprot_t prot; + void __iomem *iomem; }; + +#ifdef CONFIG_HAVE_ATOMIC_IOMAP + +#include /* * For small address space machines, mapping large objects * into the kernel virtual space isn't practical. Where @@ -119,48 +120,59 @@ io_mapping_unmap(void __iomem *vaddr) #else #include - -/* this struct isn't actually defined anywhere */ -struct io_mapping; +#include /* Create the io_mapping object*/ static inline struct io_mapping * io_mapping_create_wc(resource_size_t base, unsigned long size) { - return (struct io_mapping __force *) ioremap_wc(base, size); + struct io_mapping *iomap; + + iomap = kmalloc(sizeof(*iomap), GFP_KERNEL); + if (!iomap) + return NULL; + + iomap->base = base; + iomap->size = size; + iomap->iomem = ioremap_wc(base, size); + iomap->prot = pgprot_writecombine(PAGE_KERNEL_IO); + + return iomap; } static inline void io_mapping_free(struct io_mapping *mapping) { - iounmap((void __force __iomem *) mapping); + iounmap(mapping->iomem); + kfree(mapping); } -/* Atomic map/unmap */ +/* Non-atomic map/unmap */ static inline void __iomem * -io_mapping_map_atomic_wc(struct io_mapping *mapping, - unsigned long offset) +io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset) { - pagefault_disable(); - return ((char __force __iomem *) mapping) + offset; + return mapping->iomem + offset; } static inline void 
-io_mapping_unmap_atomic(void __iomem *vaddr) +io_mapping_unmap(void __iomem *vaddr) { - pagefault_enable(); } -/* Non-atomic map/unmap */ +/* Atomic map/unmap */ static inline void __iomem * -io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset) +io_mapping_map_atomic_wc(struct io_mapping *mapping, + unsigned long offset) { - return ((char __force __iomem *) mapping) + offset; + pagefault_disable(); + return io_mapping_map_wc(mapping, offset); } static inline void -io_mapping_unmap(void __iomem *vaddr) +io_mapping_unmap_atomic(void __iomem *vaddr) { + io_mapping_unmap(vaddr); + pagefault_enable(); } #endif /* HAVE_ATOMIC_IOMAP */ -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Date: Sat, 21 Jun 2014 16:53:56 +0100 Message-ID: <1403366036-10169-4-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: linux-mm@kvack.org List-Id: linux-mm.kvack.org On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s Upload rate for 32 
linear surfaces: 8121MiB/s -> 8128MiB/s Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s Upload rate for 64 tiled surfaces: 3017MiB/s -> 5202MiB/s Signed-off-by: Chris Wilson Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Brad Volkin Cc: linux-mm@kvack.org --- drivers/gpu/drm/i915/i915_gem.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f6d123828926..f0628e40cb1d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1565,25 +1565,14 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj); pfn >>= PAGE_SHIFT; - if (!obj->fault_mappable) { - unsigned long size = min_t(unsigned long, - vma->vm_end - vma->vm_start, - obj->base.size); - int i; + ret = remap_io_mapping(vma, + vma->vm_start, pfn, vma->vm_end - vma->vm_start, + dev_priv->gtt.mappable); + if (ret) + goto unpin; - for (i = 0; i < size >> PAGE_SHIFT; i++) { - ret = vm_insert_pfn(vma, - (unsigned long)vma->vm_start + i * PAGE_SIZE, - pfn + i); - if (ret) - break; - } + obj->fault_mappable = true; - obj->fault_mappable = true; - } else - ret = vm_insert_pfn(vma, - (unsigned long)vmf->virtual_address, - pfn + page_offset); unpin: i915_gem_object_ggtt_unpin(obj); unlock: -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 3/4] mm: Export remap_io_mapping() Date: Sat, 21 Jun 2014 16:53:55 +0100 Message-ID: <1403366036-10169-3-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: 
<1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: Rik van Riel , Peter Zijlstra , Cyrill Gorcunov , linux-mm@kvack.org, Mel Gorman , Johannes Weiner , Andrew Morton , "Kirill A. Shutemov" List-Id: linux-mm.kvack.org This is similar to remap_pfn_range(), and uses the recently refactored code to do the page table walking. The key difference is that it back propagates its error, as this is required for use from within a pagefault handler. The other difference is that it combines the page protection from io-mapping, which is known from when the io-mapping is created, with the per-vma page protection flags. This avoids having to walk the entire system description to rediscover the special page protection established for the io-mapping. Signed-off-by: Chris Wilson Cc: Andrew Morton Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Rik van Riel Cc: Mel Gorman Cc: Cyrill Gorcunov Cc: Johannes Weiner Cc: linux-mm@kvack.org --- include/linux/mm.h | 4 ++++ mm/memory.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index d6777060449f..aa766bbc6981 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1941,6 +1941,10 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); +struct io_mapping; +int remap_io_mapping(struct vm_area_struct *, + unsigned long addr, unsigned long pfn, unsigned long size, + struct io_mapping *iomap); int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *); int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn); diff
--git a/mm/memory.c b/mm/memory.c index d2c7fe88a289..8af2bd2de98e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -61,6 +61,7 @@ #include #include #include +#include #include #include @@ -2443,6 +2444,51 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, EXPORT_SYMBOL(remap_pfn_range); /** + * remap_io_mapping - remap an IO mapping to userspace + * @vma: user vma to map to + * @addr: target user address to start at + * @pfn: physical address of kernel memory + * @size: size of map area + * @iomap: the source io_mapping + * + * Note: this is only safe if the mm semaphore is held when called. + */ +int remap_io_mapping(struct vm_area_struct *vma, + unsigned long addr, unsigned long pfn, unsigned long size, + struct io_mapping *iomap) +{ + unsigned long end = addr + PAGE_ALIGN(size); + struct remap_pfn r; + pgd_t *pgd; + int err; + + if (WARN_ON(addr >= end)) + return -EINVAL; + +#define MUST_SET (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP) + BUG_ON(is_cow_mapping(vma->vm_flags)); + BUG_ON((vma->vm_flags & MUST_SET) != MUST_SET); +#undef MUST_SET + + r.mm = vma->vm_mm; + r.addr = addr; + r.pfn = pfn; + r.prot = __pgprot((pgprot_val(iomap->prot) & _PAGE_CACHE_MASK) | + (pgprot_val(vma->vm_page_prot) & ~_PAGE_CACHE_MASK)); + + pgd = pgd_offset(r.mm, addr); + do { + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); + } while (err == 0 && r.addr < end); + + if (err) + zap_page_range_single(vma, addr, r.addr - addr, NULL); + + return err; +} +EXPORT_SYMBOL(remap_io_mapping); + +/** * vm_iomap_memory - remap memory to userspace * @vma: user vma to map to * @start: start of area -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: [PATCH 1/4] mm: Refactor remap_pfn_range() Date: Sat, 21 Jun 2014 16:53:53 +0100 Message-ID: <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk> References: <20140619135944.20837E00A3@blue.fi.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" 
Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20140619135944.20837E00A3@blue.fi.intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: intel-gfx@lists.freedesktop.org Cc: Rik van Riel , Peter Zijlstra , Cyrill Gorcunov , linux-mm@kvack.org, Mel Gorman , Johannes Weiner , Andrew Morton , "Kirill A. Shutemov" List-Id: linux-mm.kvack.org In preparation for exporting very similar functionality through another interface, gut the current remap_pfn_range(). The motivating factor here is to reuse the PGD/PUD/PMD/PTE walker, but allow back-propagation of errors rather than BUG_ON. Signed-off-by: Chris Wilson Cc: Andrew Morton Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Rik van Riel Cc: Mel Gorman Cc: Cyrill Gorcunov Cc: Johannes Weiner Cc: linux-mm@kvack.org --- mm/memory.c | 102 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 57 insertions(+), 45 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..d2c7fe88a289 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, } EXPORT_SYMBOL(vm_insert_mixed); +struct remap_pfn { + struct mm_struct *mm; + unsigned long addr; + unsigned long pfn; + pgprot_t prot; +}; + /* * maps a range of physical memory into the requested pages. the old * mappings are removed.
any references to nonexistent pages results * in null mappings (currently treated as "copy-on-access") */ -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte) +{ + if (!pte_none(*pte)) + return -EBUSY; + + set_pte_at(r->mm, r->addr, pte, + pte_mkspecial(pfn_pte(r->pfn, r->prot))); + r->pfn++; + r->addr += PAGE_SIZE; + return 0; +} + +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end) { pte_t *pte; spinlock_t *ptl; + int err; - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl); if (!pte) return -ENOMEM; + arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); - pfn++; - } while (pte++, addr += PAGE_SIZE, addr != end); + err = remap_pfn(r, pte++); + } while (err == 0 && r->addr < end); arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(pte - 1, ptl); - return 0; + return err; } -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end) { pmd_t *pmd; - unsigned long next; + int err; - pfn -= addr >> PAGE_SHIFT; - pmd = pmd_alloc(mm, pud, addr); + pmd = pmd_alloc(r->mm, pud, r->addr); if (!pmd) return -ENOMEM; VM_BUG_ON(pmd_trans_huge(*pmd)); + do { - next = pmd_addr_end(addr, end); - if (remap_pte_range(mm, pmd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; - } while (pmd++, addr = next, addr != end); - return 0; + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end)); + } while (err == 0 && r->addr < end); + + return err; } -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline 
int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end) { pud_t *pud; - unsigned long next; + int err; - pfn -= addr >> PAGE_SHIFT; - pud = pud_alloc(mm, pgd, addr); + pud = pud_alloc(r->mm, pgd, r->addr); if (!pud) return -ENOMEM; + do { - next = pud_addr_end(addr, end); - if (remap_pmd_range(mm, pud, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; - } while (pud++, addr = next, addr != end); - return 0; + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end)); + } while (err == 0 && r->addr < end); + + return err; } /** @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot) { - pgd_t *pgd; - unsigned long next; unsigned long end = addr + PAGE_ALIGN(size); - struct mm_struct *mm = vma->vm_mm; + struct remap_pfn r; + pgd_t *pgd; int err; /* @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; BUG_ON(addr >= end); - pfn -= addr >> PAGE_SHIFT; - pgd = pgd_offset(mm, addr); flush_cache_range(vma, addr, end); + + r.mm = vma->vm_mm; + r.addr = addr; + r.pfn = pfn; + r.prot = prot; + + pgd = pgd_offset(r.mm, addr); do { - next = pgd_addr_end(addr, end); - err = remap_pud_range(mm, pgd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot); - if (err) - break; - } while (pgd++, addr = next, addr != end); + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); + } while (err == 0 && r.addr < end); - if (err) + if (err) { untrack_pfn(vma, pfn, PAGE_ALIGN(size)); + BUG_ON(err == -EBUSY); + } return err; } -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. 
Shutemov" Subject: RE: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Date: Mon, 16 Jun 2014 16:41:24 +0300 (EEST) Message-ID: <20140616134124.0ED73E00A2@blue.fi.intel.com> References: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1402676778-27174-1-git-send-email-chris@chris-wilson.co.uk> Sender: owner-linux-mm@kvack.org Cc: intel-gfx@lists.freedesktop.org, Chris Wilson , Andrew Morton , "Kirill A. Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman , Cyrill Gorcunov , Johannes Weiner , linux-mm@kvack.org List-Id: intel-gfx@lists.freedesktop.org Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > mm/memory.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..6603a9e6a731 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > { > pte_t *pte; > spinlock_t *ptl; > + int ret = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + ret = -EBUSY; > + break; I think you need to at least remove the entries you've set up if the check fails on an iteration other than the first. And nobody propagates your -EBUSY back to remap_pfn_range(): the caller will see -ENOMEM, which is not what you want, I believe. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Date: Thu, 19 Jun 2014 14:50:18 +0300 (EEST) Message-ID: <20140619115018.412D2E00A3@blue.fi.intel.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by gabe.freedesktop.org (Postfix) with ESMTP id 99D6E6E88A for ; Thu, 19 Jun 2014 04:50:53 -0700 (PDT) In-Reply-To: <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson Cc: Rik van Riel , Peter Zijlstra , intel-gfx@lists.freedesktop.org, Cyrill Gorcunov , linux-mm@kvack.org, Mel Gorman , Johannes Weiner , Andrew Morton , "Kirill A. Shutemov" List-Id: intel-gfx@lists.freedesktop.org Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > v2: Fix the pte address for unmapping along the error path. > v3: Report the error back and cleanup partial remaps. > > Signed-off-by: Chris Wilson > Cc: Andrew Morton > Cc: "Kirill A. Shutemov" > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Mel Gorman > Cc: Cyrill Gorcunov > Cc: Johannes Weiner > Cc: linux-mm@kvack.org > --- > > Whilst this has the semantics I want to allow two concurrent, but > serialised, pagefaults that try to prefault the same object to succeed, > it looks fragile and fraught with subtlety. 
> -Chris > > --- > mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 38 insertions(+), 16 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index d67fd9f..be51fcc 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed); > * in null mappings (currently treated as "copy-on-access") > */ > static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { With this long parameter list, wouldn't it be cleaner to pass down a pointer to a structure instead? This could simplify the code, I believe. > pte_t *pte; > spinlock_t *ptl; > + int err = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + err = first ?
-EBUSY : -EINVAL; > + pte++; > + break; > + } > + first = false; > set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > pfn++; > } while (pte++, addr += PAGE_SIZE, addr != end); > arch_leave_lazy_mmu_mode(); > pte_unmap_unlock(pte - 1, ptl); > - return 0; > + return err; > } > > static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { > pmd_t *pmd; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pmd = pmd_alloc(mm, pud, addr); > @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > VM_BUG_ON(pmd_trans_huge(*pmd)); > do { > next = pmd_addr_end(addr, end); > - if (remap_pte_range(mm, pmd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pte_range(mm, pmd, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pmd++, addr = next, addr != end); > return 0; > } > > static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, bool first) > { > pud_t *pud; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pud = pud_alloc(mm, pgd, addr); > @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > return -ENOMEM; > do { > next = pud_addr_end(addr, end); > - if (remap_pmd_range(mm, pud, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pmd_range(mm, pud, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pud++, addr = next, addr != end); > return 0; > } > @@ -1735,6 +1751,7 @@ int 
remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > unsigned long next; > unsigned long end = addr + PAGE_ALIGN(size); > struct mm_struct *mm = vma->vm_mm; > + bool first = true; > int err; > > /* > @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > do { > next = pgd_addr_end(addr, end); > err = remap_pud_range(mm, pgd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot); > + pfn + (addr >> PAGE_SHIFT), prot, first); > if (err) > break; > + > + first = false; > } while (pgd++, addr = next, addr != end); > > - if (err) > + if (err) { > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > + if (err != -EBUSY) > + zap_page_range_single(vma, addr, size, NULL); Hm. If I read it correctly, you zap the whole range, not only what you've set up. Looks wrong. And after the zap, you probably want to return -EBUSY to the caller of remap_pfn_range(), not -EINVAL. > + } > > return err; > } > -- > 1.9.1 > -- Kirill A. Shutemov From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A.
Shutemov" Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() Date: Thu, 19 Jun 2014 15:57:46 +0300 (EEST) Message-ID: <20140619125746.25A03E00A3@blue.fi.intel.com> References: <20140616134124.0ED73E00A2@blue.fi.intel.com> <1403162349-14512-1-git-send-email-chris@chris-wilson.co.uk> <20140619115018.412D2E00A3@blue.fi.intel.com> <20140619120004.GC25975@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id 4628E6E12D for ; Thu, 19 Jun 2014 05:57:50 -0700 (PDT) In-Reply-To: <20140619120004.GC25975@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson Cc: Rik van Riel , Peter Zijlstra , intel-gfx@lists.freedesktop.org, Cyrill Gorcunov , linux-mm@kvack.org, Mel Gorman , Johannes Weiner , Andrew Morton , "Kirill A. Shutemov" List-Id: intel-gfx@lists.freedesktop.org Chris Wilson wrote: > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > > + if (err) { > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > > + if (err != -EBUSY) > > > + zap_page_range_single(vma, addr, size, NULL); > > > > Hm. If I read it correctly, you zap whole range, not only what you've > > set up. Looks wrong. > > Yes. I didn't fancy threading the last touched pte back, but that should > be easier if moving to a struct. > > > And for after zap, you probably whant to return -EBUSY to caller of > > remap_pfn_range(), not -EINVAL. > > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller > will just report VM_NOPAGE back to the fault handler, and the fault will > be retriggered - but the overlapping object will still be present. IIUC, what you're saying makes sense only if the range starts from PTE you've got fault to. 
I fail to see why this assumption should be part of the remap_pfn_range()
interface.

One possible option is to create a variant of remap_pfn_range() which will
return how many PTEs it was able to set up before hitting the !pte_none().
The caller will decide what to do with the partially filled range.

-- 
Kirill A. Shutemov

From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Kirill A. Shutemov"
Subject: Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range()
Date: Thu, 19 Jun 2014 16:59:44 +0300 (EEST)
Message-ID: <20140619135944.20837E00A3@blue.fi.intel.com>
In-Reply-To: <20140619132240.GF25975@nuc-i3427.alporthouse.com>
To: Chris Wilson
Cc: Rik van Riel, Peter Zijlstra, intel-gfx@lists.freedesktop.org, Cyrill Gorcunov, linux-mm@kvack.org, Mel Gorman, Johannes Weiner, Andrew Morton, "Kirill A. Shutemov"
List-Id: intel-gfx@lists.freedesktop.org

Chris Wilson wrote:
> On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote:
> > One possible option is to create a variant of remap_pfn_range() which will
> > return how many PTEs it was able to set up before hitting the !pte_none().
> > The caller will decide what to do with the partially filled range.
> 
> Looked at just returning the address remap_pfn_range() got up to, which is
> easy enough, but I think given that remap_pfn_range() will clean up
> correctly after a failed remap, any EBUSY from partway through would be
> a pathological driver error.

I would prefer to keep the remap_pfn_range() interface intact, with a
BUG_ON() on an unexpected !pte_none(), and introduce a new function with
more flexible behaviour (sharing the underlying infrastructure). This way
we can avoid changing every remap_pfn_range() caller.

-- 
Kirill A. Shutemov

From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Kirill A. Shutemov"
Subject: RE: [PATCH 1/4] mm: Refactor remap_pfn_range()
Date: Mon, 30 Jun 2014 17:26:59 +0300 (EEST)
Message-ID: <20140630142659.B6F7DE00A3@blue.fi.intel.com>
In-Reply-To: <1403366036-10169-1-git-send-email-chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, linux-mm@kvack.org, Chris Wilson, Andrew Morton, "Kirill A. Shutemov", Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner
List-Id: intel-gfx@lists.freedesktop.org

Chris Wilson wrote:
> In preparation for exporting very similar functionality through another
> interface, gut the current remap_pfn_range(). The motivating factor here
> is to reuse the PGD/PUD/PMD/PTE walker, but allow back propagation of
> errors rather than BUG_ON.
> 
> Signed-off-by: Chris Wilson
> Cc: Andrew Morton
> Cc: "Kirill A. Shutemov"
> Cc: Peter Zijlstra
> Cc: Rik van Riel
> Cc: Mel Gorman
> Cc: Cyrill Gorcunov
> Cc: Johannes Weiner
> Cc: linux-mm@kvack.org
> ---
>  mm/memory.c | 102 +++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 57 insertions(+), 45 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..d2c7fe88a289 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
>  }
>  EXPORT_SYMBOL(vm_insert_mixed);
> 
> +struct remap_pfn {
> +	struct mm_struct *mm;
> +	unsigned long addr;
> +	unsigned long pfn;
> +	pgprot_t prot;
> +};
> +
>  /*
>   * maps a range of physical memory into the requested pages. the old
>   * mappings are removed. any references to nonexistent pages results
>   * in null mappings (currently treated as "copy-on-access")
>   */
> -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
> -			unsigned long addr, unsigned long end,
> -			unsigned long pfn, pgprot_t prot)
> +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte)
> +{
> +	if (!pte_none(*pte))
> +		return -EBUSY;
> +
> +	set_pte_at(r->mm, r->addr, pte,
> +		   pte_mkspecial(pfn_pte(r->pfn, r->prot)));
> +	r->pfn++;
> +	r->addr += PAGE_SIZE;
> +	return 0;
> +}
> +
> +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end)
>  {
>  	pte_t *pte;
>  	spinlock_t *ptl;
> +	int err;
> 
> -	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
> +	pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl);
>  	if (!pte)
>  		return -ENOMEM;
> +
>  	arch_enter_lazy_mmu_mode();
>  	do {
> -		BUG_ON(!pte_none(*pte));
> -		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
> -		pfn++;
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> +		err = remap_pfn(r, pte++);
> +	} while (err == 0 && r->addr < end);
>  	arch_leave_lazy_mmu_mode();
> +
>  	pte_unmap_unlock(pte - 1, ptl);
> -	return 0;
> +	return err;
>  }
> 
> -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
> -			unsigned long addr, unsigned long end,
> -			unsigned long pfn, pgprot_t prot)
> +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end)
>  {
>  	pmd_t *pmd;
> -	unsigned long next;
> +	int err;
> 
> -	pfn -= addr >> PAGE_SHIFT;
> -	pmd = pmd_alloc(mm, pud, addr);
> +	pmd = pmd_alloc(r->mm, pud, r->addr);
>  	if (!pmd)
>  		return -ENOMEM;
>  	VM_BUG_ON(pmd_trans_huge(*pmd));
> +
>  	do {
> -		next = pmd_addr_end(addr, end);
> -		if (remap_pte_range(mm, pmd, addr, next,
> -				pfn + (addr >> PAGE_SHIFT), prot))
> -			return -ENOMEM;
> -	} while (pmd++, addr = next, addr != end);
> -	return 0;
> +		err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end));
> +	} while (err == 0 && r->addr < end);
> +
> +	return err;
>  }
> 
> -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
> -			unsigned long addr, unsigned long end,
> -			unsigned long pfn, pgprot_t prot)
> +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end)
>  {
>  	pud_t *pud;
> -	unsigned long next;
> +	int err;
> 
> -	pfn -= addr >> PAGE_SHIFT;
> -	pud = pud_alloc(mm, pgd, addr);
> +	pud = pud_alloc(r->mm, pgd, r->addr);
>  	if (!pud)
>  		return -ENOMEM;
> +
>  	do {
> -		next = pud_addr_end(addr, end);
> -		if (remap_pmd_range(mm, pud, addr, next,
> -				pfn + (addr >> PAGE_SHIFT), prot))
> -			return -ENOMEM;
> -	} while (pud++, addr = next, addr != end);
> -	return 0;
> +		err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end));
> +	} while (err == 0 && r->addr < end);
> +
> +	return err;
>  }
> 
>  /**
> @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
>  int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
>  		    unsigned long pfn, unsigned long size, pgprot_t prot)
>  {
> -	pgd_t *pgd;
> -	unsigned long next;
>  	unsigned long end = addr + PAGE_ALIGN(size);
> -	struct mm_struct *mm = vma->vm_mm;
> +	struct remap_pfn r;
> +	pgd_t *pgd;
>  	int err;
> 
>  	/*
> @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
>  	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
> 
>  	BUG_ON(addr >= end);
> -	pfn -= addr >> PAGE_SHIFT;
> -	pgd = pgd_offset(mm, addr);
>  	flush_cache_range(vma, addr, end);
> +
> +	r.mm = vma->vm_mm;
> +	r.addr = addr;
> +	r.pfn = pfn;
> +	r.prot = prot;
> +
> +	pgd = pgd_offset(r.mm, addr);
>  	do {
> -		next = pgd_addr_end(addr, end);
> -		err = remap_pud_range(mm, pgd, addr, next,
> -				pfn + (addr >> PAGE_SHIFT), prot);
> -		if (err)
> -			break;
> -	} while (pgd++, addr = next, addr != end);
> +		err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end));
> +	} while (err == 0 && r.addr < end);
> 
> -	if (err)
> +	if (err) {
>  		untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> +		BUG_ON(err == -EBUSY);

We probably need a comment for the BUG_ON(). Otherwise,

Acked-by: Kirill A. Shutemov

> +	}
> 
>  	return err;
>  }

-- 
Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Kirill A. Shutemov"
Subject: RE: [PATCH 3/4] mm: Export remap_io_mapping()
Date: Mon, 30 Jun 2014 17:32:40 +0300 (EEST)
Message-ID: <20140630143240.25B87E00A3@blue.fi.intel.com>
In-Reply-To: <1403366036-10169-3-git-send-email-chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, linux-mm@kvack.org, Chris Wilson, Andrew Morton, "Kirill A. Shutemov", Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner
List-Id: intel-gfx@lists.freedesktop.org

Chris Wilson wrote:
> This is similar to remap_pfn_range(), and uses the recently refactored
> code to do the page table walking. The key difference is that it back
> propagates its error, as this is required for use from within a pagefault
> handler. The other difference is that it combines the page protection
> from the io-mapping, which is known when the io-mapping is created, with
> the per-vma page protection flags. This avoids having to walk the entire
> system description to rediscover the special page protection established
> for the io-mapping.
> 
> Signed-off-by: Chris Wilson

Looks okay to me:

Acked-by: Kirill A. Shutemov

-- 
Kirill A. Shutemov