All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
       [not found] <bug-117731-27@https.bugzilla.kernel.org/>
@ 2016-05-06 22:01 ` Andrew Morton
  2016-05-09 18:07   ` Peter Feiner
  2016-05-16 13:35   ` Kirill A. Shutemov
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2016-05-06 22:01 UTC (permalink / raw)
  To: ashish0srivastava0
  Cc: bugzilla-daemon, Peter Feiner, Kirill A. Shutemov, linux-mm


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Great bug report, thanks.

I assume the breakage was caused by

commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
Author:     Peter Feiner <pfeiner@google.com>
AuthorDate: Mon Oct 13 15:55:46 2014 -0700
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Tue Oct 14 02:18:28 2014 +0200

    mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
    

Could someone (Peter, Kirill?) please take a look?

On Fri, 06 May 2016 13:15:19 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=117731
> 
>             Bug ID: 117731
>            Summary: Doing mprotect for PROT_NONE and then for
>                     PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.18 and beyond
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: ashish0srivastava0@gmail.com
>         Regression: No
> 
> Created attachment 215401
>   --> https://bugzilla.kernel.org/attachment.cgi?id=215401&action=edit
> Repro code
> 
> This is a regression that is present in kernel 3.18 and beyond and not in
> previous ones.
> Attached is a simple repro case. It measures the time taken to write and then
> read all pages in a buffer, then it does mprotect for PROT_NONE and then
> mprotect for PROT_READ|PROT_WRITE, then it again measures time taken to write
> and then read all pages in a buffer. The 2nd time taken is much larger (20 to
> 30 times) than the first one.
> 
> I have looked at the code in the kernel tree that is causing this and it is
> because writes are causing faults, as pte_mkwrite is not being done during
> mprotect_fixup for PROT_READ|PROT_WRITE.
> 
> This is the code inside mprotect_fixup in a tree v3.16.35 or older:
>     /*
>      * vm_flags and vm_page_prot are protected by the mmap_sem
>      * held in write mode.
>      */
>     vma->vm_flags = newflags;
>     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>                       vm_get_page_prot(newflags));
> 
>     if (vma_wants_writenotify(vma)) {
>         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
>         dirty_accountable = 1;
>     }
> This is the code in the same region inside mprotect_fixup in a recent tree:
>     /*
>      * vm_flags and vm_page_prot are protected by the mmap_sem
>      * held in write mode.
>      */
>     vma->vm_flags = newflags;
>     dirty_accountable = vma_wants_writenotify(vma);
>     vma_set_page_prot(vma);
> 
> The difference is the setting of dirty_accountable. The result of
> vma_wants_writenotify does not depend on vma->vm_flags alone but also depends
> on vma->vm_page_prot, and the following code will make it return 0 because in
> newer code we are setting dirty_accountable before setting vma->vm_page_prot.
>     /* The open routine did something to the protections that pgprot_modify
>      * won't preserve? */
>     if (pgprot_val(vma->vm_page_prot) !=
>         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>         return 0;
> 
> Now, suppose we change code by calling vma_set_page_prot before setting
> dirty_accountable:
>     vma->vm_flags = newflags;
>     vma_set_page_prot(vma);
>     dirty_accountable = vma_wants_writenotify(vma);
> Still, dirty_accountable will be 0. This is because following code in
> vma_set_page_prot modifies vma->vm_page_prot without modifying vma->vm_flags:
>     if (vma_wants_writenotify(vma)) {
>         vm_flags &= ~VM_SHARED;
>         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
>                              vm_flags);
>     }
> so this check in vma_wants_writenotify will again return 0: 
>     /* The open routine did something to the protections that pgprot_modify
>      * won't preserve? */
>     if (pgprot_val(vma->vm_page_prot) !=
>         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>         return 0;
> So dirty_accountable is still 0.
> 
> This code in change_pte_range decides whether to call pte_mkwrite or not:
>             /* Avoid taking write faults for known dirty pages */
>             if (dirty_accountable && pte_dirty(ptent) &&
>                     (pte_soft_dirty(ptent) ||
>                      !(vma->vm_flags & VM_SOFTDIRTY))) {
>                 ptent = pte_mkwrite(ptent);
>             }
> If dirty_accountable is 0 even though the pte was dirty already, pte_mkwrite
> will not be done.
> 
> I think the correct solution is for dirty_accountable to be set from the
> value of vma_wants_writenotify queried before vma->vm_page_prot is set with
> VM_SHARED removed from the flags. One way to do so could be to have
> vma_set_page_prot return the value of dirty_accountable that it can set right
> after the vma_wants_writenotify check. Another way could be to do
>     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>                       vm_get_page_prot(newflags));
> and then set dirty_accountable based on vma_wants_writenotify and then call
> vma_set_page_prot.
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-06 22:01 ` [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer Andrew Morton
@ 2016-05-09 18:07   ` Peter Feiner
  2016-05-16 13:35   ` Kirill A. Shutemov
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Feiner @ 2016-05-09 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ashish0srivastava0, bugzilla-daemon, Kirill A. Shutemov, linux-mm

On Fri, May 6, 2016 at 3:01 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> Great bug report, thanks.
>
> I assume the breakage was caused by
>
> commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
> Author:     Peter Feiner <pfeiner@google.com>
> AuthorDate: Mon Oct 13 15:55:46 2014 -0700
> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> CommitDate: Tue Oct 14 02:18:28 2014 +0200
>
>     mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
>
>
> Could someone (Peter, Kirill?) please take a look?

Thanks for the report! I'm taking a look.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-06 22:01 ` [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer Andrew Morton
  2016-05-09 18:07   ` Peter Feiner
@ 2016-05-16 13:35   ` Kirill A. Shutemov
  2016-05-17 11:26     ` Ashish Srivastava
  1 sibling, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2016-05-16 13:35 UTC (permalink / raw)
  To: Andrew Morton, ashish0srivastava0; +Cc: bugzilla-daemon, Peter Feiner, linux-mm

On Fri, May 06, 2016 at 03:01:12PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> Great bug report, thanks.
> 
> I assume the breakage was caused by
> 
> commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
> Author:     Peter Feiner <pfeiner@google.com>
> AuthorDate: Mon Oct 13 15:55:46 2014 -0700
> Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> CommitDate: Tue Oct 14 02:18:28 2014 +0200
> 
>     mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
>     
> 
> Could someone (Peter, Kirill?) please take a look?
> 
> On Fri, 06 May 2016 13:15:19 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=117731
> > 
> >             Bug ID: 117731
> >            Summary: Doing mprotect for PROT_NONE and then for
> >                     PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 3.18 and beyond
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: high
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: ashish0srivastava0@gmail.com
> >         Regression: No
> > 
> > Created attachment 215401
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=215401&action=edit
> > Repro code

The code is somewhat broken: malloc() isn't guaranteed to return a
page-aligned pointer, and in my case that leads to -EINVAL from mprotect().

Do you have a custom malloc()?

> > This is a regression that is present in kernel 3.18 and beyond and not in
> > previous ones.
> > Attached is a simple repro case. It measures the time taken to write and then
> > read all pages in a buffer, then it does mprotect for PROT_NONE and then
> > mprotect for PROT_READ|PROT_WRITE, then it again measures time taken to write
> > and then read all pages in a buffer. The 2nd time taken is much larger (20 to
> > 30 times) than the first one.
> > 
> > I have looked at the code in the kernel tree that is causing this and it is
> > because writes are causing faults, as pte_mkwrite is not being done during
> > mprotect_fixup for PROT_READ|PROT_WRITE.
> > 
> > This is the code inside mprotect_fixup in a tree v3.16.35 or older:
> >     /*
> >      * vm_flags and vm_page_prot are protected by the mmap_sem
> >      * held in write mode.
> >      */
> >     vma->vm_flags = newflags;
> >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> >                       vm_get_page_prot(newflags));
> > 
> >     if (vma_wants_writenotify(vma)) {
> >         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
> >         dirty_accountable = 1;
> >     }
> > This is the code in the same region inside mprotect_fixup in a recent tree:
> >     /*
> >      * vm_flags and vm_page_prot are protected by the mmap_sem
> >      * held in write mode.
> >      */
> >     vma->vm_flags = newflags;
> >     dirty_accountable = vma_wants_writenotify(vma);
> >     vma_set_page_prot(vma);
> > 
> > The difference is the setting of dirty_accountable. The result of
> > vma_wants_writenotify does not depend on vma->vm_flags alone but also depends
> > on vma->vm_page_prot, and the following code will make it return 0 because in
> > newer code we are setting dirty_accountable before setting vma->vm_page_prot.
> >     /* The open routine did something to the protections that pgprot_modify
> >      * won't preserve? */
> >     if (pgprot_val(vma->vm_page_prot) !=
> >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
> >         return 0;

The test-case will never hit this, as normal malloc() returns anonymous
memory, which is handled by the first check in vma_wants_writenotify().

The only case where this can change anything for you is if your
malloc() returns file-backed memory. Which is possible, I guess, with a
custom malloc().

> > Now, suppose we change code by calling vma_set_page_prot before setting
> > dirty_accountable:
> >     vma->vm_flags = newflags;
> >     vma_set_page_prot(vma);
> >     dirty_accountable = vma_wants_writenotify(vma);
> > Still, dirty_accountable will be 0. This is because following code in
> > vma_set_page_prot modifies vma->vm_page_prot without modifying vma->vm_flags:
> >     if (vma_wants_writenotify(vma)) {
> >         vm_flags &= ~VM_SHARED;
> >         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
> >                              vm_flags);
> >     }
> > so this check in vma_wants_writenotify will again return 0: 
> >     /* The open routine did something to the protections that pgprot_modify
> >      * won't preserve? */
> >     if (pgprot_val(vma->vm_page_prot) !=
> >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
> >         return 0;
> > So dirty_accountable is still 0.
> > 
> > This code in change_pte_range decides whether to call pte_mkwrite or not:
> >             /* Avoid taking write faults for known dirty pages */
> >             if (dirty_accountable && pte_dirty(ptent) &&
> >                     (pte_soft_dirty(ptent) ||
> >                      !(vma->vm_flags & VM_SOFTDIRTY))) {
> >                 ptent = pte_mkwrite(ptent);
> >             }
> > If dirty_accountable is 0 even though the pte was dirty already, pte_mkwrite
> > will not be done.
> > 
> > I think the correct solution should be that dirty_accountable be set with the
> > value of vma_wants_writenotify queried before vma->vm_page_prot is set with
> > VM_SHARED removed from flags. One way to do so could be to have
> > vma_set_page_prot return the value of dirty_accountable that it can set right
> > after vma_wants_writenotify check. Another way could be to do
> >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> >                       vm_get_page_prot(newflags));
> > and then set dirty_accountable based on vma_wants_writenotify and then call
> > vma_set_page_prot.

Looks like a good catch, but I'm not sure if it's the root cause of your
problem.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-16 13:35   ` Kirill A. Shutemov
@ 2016-05-17 11:26     ` Ashish Srivastava
  2016-05-17 11:36       ` Kirill A. Shutemov
  2016-05-17 15:51       ` Peter Feiner
  0 siblings, 2 replies; 8+ messages in thread
From: Ashish Srivastava @ 2016-05-17 11:26 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrew Morton, bugzilla-daemon, Peter Feiner, linux-mm


Yes, the original repro was using a custom allocator, but I was seeing the
issue with malloc'd memory as well on my (ARMv7) platform.
I agree that the repro code won't work reliably, so I have modified the repro
code attached to the bug to use file-backed memory.

That really is the root cause of the problem. I can make the following
change in the kernel to make the slow-writes problem go away.
This makes vma_set_page_prot return the value of vma_wants_writenotify to
the caller after setting vma->vm_page_prot.

In vma_set_page_prot:
-void vma_set_page_prot(struct vm_area_struct *vma)
+bool vma_set_page_prot(struct vm_area_struct *vma)
{
    unsigned long vm_flags = vma->vm_flags;

    vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot, vm_flags);
    if (vma_wants_writenotify(vma)) {
        vm_flags &= ~VM_SHARED;
        vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
                             vm_flags);
+        return 1;
     }
+    return 0;
}

In mprotect_fixup:

     * held in write mode.
      */
     vma->vm_flags = newflags;
-    dirty_accountable = vma_wants_writenotify(vma);
-    vma_set_page_prot(vma);
+    dirty_accountable = vma_set_page_prot(vma);

     change_protection(vma, start, end, vma->vm_page_prot,
              dirty_accountable, 0);

Thanks!
Ashish

On Mon, May 16, 2016 at 7:05 PM, Kirill A. Shutemov <kirill@shutemov.name>
wrote:

> On Fri, May 06, 2016 at 03:01:12PM -0700, Andrew Morton wrote:
> >
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > Great bug report, thanks.
> >
> > I assume the breakage was caused by
> >
> > commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
> > Author:     Peter Feiner <pfeiner@google.com>
> > AuthorDate: Mon Oct 13 15:55:46 2014 -0700
> > Commit:     Linus Torvalds <torvalds@linux-foundation.org>
> > CommitDate: Tue Oct 14 02:18:28 2014 +0200
> >
> >     mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY
> cleared
> >
> >
> > Could someone (Peter, Kirill?) please take a look?
> >
> > On Fri, 06 May 2016 13:15:19 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=117731
> > >
> > >             Bug ID: 117731
> > >            Summary: Doing mprotect for PROT_NONE and then for
> > >                     PROT_READ|PROT_WRITE reduces CPU write B/W on
> buffer
> > >            Product: Memory Management
> > >            Version: 2.5
> > >     Kernel Version: 3.18 and beyond
> > >           Hardware: All
> > >                 OS: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: high
> > >           Priority: P1
> > >          Component: Other
> > >           Assignee: akpm@linux-foundation.org
> > >           Reporter: ashish0srivastava0@gmail.com
> > >         Regression: No
> > >
> > > Created attachment 215401
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=215401&action=edit
> > > Repro code
>
> The code is somewhat broken: malloc() isn't guaranteed to return a
> page-aligned pointer, and in my case that leads to -EINVAL from mprotect().
>
> Do you have a custom malloc()?
>
> > > This is a regression that is present in kernel 3.18 and beyond and not
> in
> > > previous ones.
> > > Attached is a simple repro case. It measures the time taken to write
> and then
> > > read all pages in a buffer, then it does mprotect for PROT_NONE and
> then
> > > mprotect for PROT_READ|PROT_WRITE, then it again measures time taken
> to write
> > > and then read all pages in a buffer. The 2nd time taken is much larger
> (20 to
> > > 30 times) than the first one.
> > >
> > > I have looked at the code in the kernel tree that is causing this and
> it is
> > > because writes are causing faults, as pte_mkwrite is not being done
> during
> > > mprotect_fixup for PROT_READ|PROT_WRITE.
> > >
> > > This is the code inside mprotect_fixup in a tree v3.16.35 or older:
> > >     /*
> > >      * vm_flags and vm_page_prot are protected by the mmap_sem
> > >      * held in write mode.
> > >      */
> > >     vma->vm_flags = newflags;
> > >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> > >                       vm_get_page_prot(newflags));
> > >
> > >     if (vma_wants_writenotify(vma)) {
> > >         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
> > >         dirty_accountable = 1;
> > >     }
> > > This is the code in the same region inside mprotect_fixup in a recent
> tree:
> > >     /*
> > >      * vm_flags and vm_page_prot are protected by the mmap_sem
> > >      * held in write mode.
> > >      */
> > >     vma->vm_flags = newflags;
> > >     dirty_accountable = vma_wants_writenotify(vma);
> > >     vma_set_page_prot(vma);
> > >
> > > The difference is the setting of dirty_accountable. result of
> > > vma_wants_writenotify does not depend on vma->vm_flags alone but also
> depends
> > > on vma->vm_page_prot and following code will make it return 0 because
> in newer
> > > code we are setting dirty_accountable before setting vma->vm_page_prot.
> > >     /* The open routine did something to the protections that
> pgprot_modify
> > >      * won't preserve? */
> > >     if (pgprot_val(vma->vm_page_prot) !=
> > >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
> > >         return 0;
>
> The test-case will never hit this, as normal malloc() returns anonymous
> memory, which is handled by the first check in vma_wants_writenotify().
>
> The only case where this can change anything for you is if your
> malloc() returns file-backed memory. Which is possible, I guess, with a
> custom malloc().
>
> > > Now, suppose we change code by calling vma_set_page_prot before setting
> > > dirty_accountable:
> > >     vma->vm_flags = newflags;
> > >     vma_set_page_prot(vma);
> > >     dirty_accountable = vma_wants_writenotify(vma);
> > > Still, dirty_accountable will be 0. This is because following code in
> > > vma_set_page_prot modifies vma->vm_page_prot without modifying
> vma->vm_flags:
> > >     if (vma_wants_writenotify(vma)) {
> > >         vm_flags &= ~VM_SHARED;
> > >         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
> > >                              vm_flags);
> > >     }
> > > so this check in vma_wants_writenotify will again return 0:
> > >     /* The open routine did something to the protections that
> pgprot_modify
> > >      * won't preserve? */
> > >     if (pgprot_val(vma->vm_page_prot) !=
> > >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
> > >         return 0;
> > > So dirty_accountable is still 0.
> > >
> > > This code in change_pte_range decides whether to call pte_mkwrite or
> not:
> > >             /* Avoid taking write faults for known dirty pages */
> > >             if (dirty_accountable && pte_dirty(ptent) &&
> > >                     (pte_soft_dirty(ptent) ||
> > >                      !(vma->vm_flags & VM_SOFTDIRTY))) {
> > >                 ptent = pte_mkwrite(ptent);
> > >             }
> > > If dirty_accountable is 0 even though the pte was dirty already,
> pte_mkwrite
> > > will not be done.
> > >
> > > I think the correct solution should be that dirty_accountable be set
> with the
> > > value of vma_wants_writenotify queried before vma->vm_page_prot is set
> with
> > > VM_SHARED removed from flags. One way to do so could be to have
> > > vma_set_page_prot return the value of dirty_accountable that it can
> set right
> > > after vma_wants_writenotify check. Another way could be to do
> > >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
> > >                       vm_get_page_prot(newflags));
> > > and then set dirty_accountable based on vma_wants_writenotify and then
> call
> > > vma_set_page_prot.
>
> Looks like a good catch, but I'm not sure if it's the root cause of your
> problem.
>
> --
>  Kirill A. Shutemov
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-17 11:26     ` Ashish Srivastava
@ 2016-05-17 11:36       ` Kirill A. Shutemov
  2016-05-17 11:47         ` Ashish Srivastava
  2016-05-17 15:51       ` Peter Feiner
  1 sibling, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2016-05-17 11:36 UTC (permalink / raw)
  To: Ashish Srivastava; +Cc: Andrew Morton, bugzilla-daemon, Peter Feiner, linux-mm

On Tue, May 17, 2016 at 04:56:02PM +0530, Ashish Srivastava wrote:
> Yes, the original repro was using a custom allocator but I was seeing the
> issue with malloc'd memory as well on my (ARMv7) platform.

A test-case for that would be helpful, as normal malloc()'ed anon memory
cannot be subject to the bug. Unless I'm missing something obvious.

> I agree that the repro code won't work reliably, so I have modified the repro
> code attached to the bug to use file-backed memory.
> 
> That really is the root cause of the problem. I can make the following
> change in the kernel to make the slow-writes problem go away.
> This makes vma_set_page_prot return the value of vma_wants_writenotify to
> the caller after setting vma->vm_page_prot.
> 
> In vma_set_page_prot:
> -void vma_set_page_prot(struct vm_area_struct *vma)
> +bool vma_set_page_prot(struct vm_area_struct *vma)
> {
>     unsigned long vm_flags = vma->vm_flags;
> 
>     vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot, vm_flags);
>     if (vma_wants_writenotify(vma)) {
>         vm_flags &= ~VM_SHARED;
>         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
>                              vm_flags);
> +        return 1;
>      }
> +    return 0;
> }
> 
> In mprotect_fixup:
> 
>      * held in write mode.
>       */
>      vma->vm_flags = newflags;
> -    dirty_accountable = vma_wants_writenotify(vma);
> -    vma_set_page_prot(vma);
> +    dirty_accountable = vma_set_page_prot(vma);
> 
>      change_protection(vma, start, end, vma->vm_page_prot,
>                dirty_accountable, 0)
> 

That looks good to me. Please prepare proper patch.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-17 11:36       ` Kirill A. Shutemov
@ 2016-05-17 11:47         ` Ashish Srivastava
  2016-05-17 12:03           ` Kirill A. Shutemov
  0 siblings, 1 reply; 8+ messages in thread
From: Ashish Srivastava @ 2016-05-17 11:47 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrew Morton, bugzilla-daemon, Peter Feiner, linux-mm


> A test-case for that would be helpful, as normal malloc()'ed anon memory
> cannot be subject to the bug. Unless I'm missing something obvious.

I've modified the test-case attached to the bug; it no longer uses
malloc()'ed memory but a file-backed shared mmap() mapping.

On Tue, May 17, 2016 at 5:06 PM, Kirill A. Shutemov <kirill@shutemov.name>
wrote:

> On Tue, May 17, 2016 at 04:56:02PM +0530, Ashish Srivastava wrote:
> > Yes, the original repro was using a custom allocator but I was seeing the
> > issue with malloc'd memory as well on my (ARMv7) platform.
>
> A test-case for that would be helpful, as normal malloc()'ed anon memory
> cannot be subject to the bug. Unless I'm missing something obvious.
>
> > I agree that the repro code won't reliably work so have modified the
> repro
> > code attached to the bug to use file backed memory.
> >
> > That really is the root cause of the problem. I can make the following
> > change in the kernel that can make the slow writes problem go away.
> > This makes vma_set_page_prot return the value of vma_wants_writenotify to
> > the caller after setting vma->vm_page_prot.
> >
> > In vma_set_page_prot:
> > -void vma_set_page_prot(struct vm_area_struct *vma)
> > +bool vma_set_page_prot(struct vm_area_struct *vma)
> > {
> >     unsigned long vm_flags = vma->vm_flags;
> >
> >     vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot, vm_flags);
> >     if (vma_wants_writenotify(vma)) {
> >         vm_flags &= ~VM_SHARED;
> >         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
> >                              vm_flags);
> > +        return 1;
> >      }
> > +    return 0;
> > }
> >
> > In mprotect_fixup:
> >
> >      * held in write mode.
> >       */
> >      vma->vm_flags = newflags;
> > -    dirty_accountable = vma_wants_writenotify(vma);
> > -    vma_set_page_prot(vma);
> > +    dirty_accountable = vma_set_page_prot(vma);
> >
> >      change_protection(vma, start, end, vma->vm_page_prot,
> >                dirty_accountable, 0)
> >
>
> That looks good to me. Please prepare proper patch.
>
> --
>  Kirill A. Shutemov
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-17 11:47         ` Ashish Srivastava
@ 2016-05-17 12:03           ` Kirill A. Shutemov
  0 siblings, 0 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2016-05-17 12:03 UTC (permalink / raw)
  To: Ashish Srivastava; +Cc: Andrew Morton, bugzilla-daemon, Peter Feiner, linux-mm

On Tue, May 17, 2016 at 05:17:23PM +0530, Ashish Srivastava wrote:
> > A test-case for that would be helpful, as normal malloc()'ed anon memory
> > cannot be subject to the bug. Unless I'm missing something obvious.
> 
> I've modified the test-case attached to the bug; it no longer uses
> malloc()'ed memory but a file-backed shared mmap() mapping.

Yes, that's consistent with your analysis.

You can post the patch with my Acked-by.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
  2016-05-17 11:26     ` Ashish Srivastava
  2016-05-17 11:36       ` Kirill A. Shutemov
@ 2016-05-17 15:51       ` Peter Feiner
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Feiner @ 2016-05-17 15:51 UTC (permalink / raw)
  To: Ashish Srivastava
  Cc: Kirill A. Shutemov, Andrew Morton, bugzilla-daemon, linux-mm

On Tue, May 17, 2016 at 4:26 AM, Ashish Srivastava
<ashish0srivastava0@gmail.com> wrote:
> Yes, the original repro was using a custom allocator, but I was seeing the
> issue with malloc'd memory as well on my (ARMv7) platform.
> I agree that the repro code won't work reliably, so I have modified the repro
> code attached to the bug to use file-backed memory.

Ah, I was going to ask if you were doing this on some platform other
than x86. I followed your reasoning, but when I tested the unpatched
kernel, I couldn't reproduce the problem. I used perf to count page
faults and still didn't see a difference.

> That really is the root cause of the problem. I can make the following
> change in the kernel to make the slow-writes problem go away.
> This makes vma_set_page_prot return the value of vma_wants_writenotify to
> the caller after setting vma->vm_page_prot.
>
> In vma_set_page_prot:
> -void vma_set_page_prot(struct vm_area_struct *vma)
> +bool vma_set_page_prot(struct vm_area_struct *vma)
> {
>     unsigned long vm_flags = vma->vm_flags;
>
>     vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot, vm_flags);
>     if (vma_wants_writenotify(vma)) {
>         vm_flags &= ~VM_SHARED;
>         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
>                              vm_flags);
> +        return 1;
>      }
> +    return 0;
> }
>
> In mprotect_fixup:
>
>      * held in write mode.
>       */
>      vma->vm_flags = newflags;
> -    dirty_accountable = vma_wants_writenotify(vma);
> -    vma_set_page_prot(vma);
> +    dirty_accountable = vma_set_page_prot(vma);
>
>      change_protection(vma, start, end, vma->vm_page_prot,
>                dirty_accountable, 0)
>
> Thanks!
> Ashish
>
> On Mon, May 16, 2016 at 7:05 PM, Kirill A. Shutemov <kirill@shutemov.name>
> wrote:
>>
>> On Fri, May 06, 2016 at 03:01:12PM -0700, Andrew Morton wrote:
>> >
>> > (switched to email.  Please respond via emailed reply-to-all, not via
>> > the
>> > bugzilla web interface).
>> >
>> > Great bug report, thanks.
>> >
>> > I assume the breakage was caused by
>> >
>> > commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
>> > Author:     Peter Feiner <pfeiner@google.com>
>> > AuthorDate: Mon Oct 13 15:55:46 2014 -0700
>> > Commit:     Linus Torvalds <torvalds@linux-foundation.org>
>> > CommitDate: Tue Oct 14 02:18:28 2014 +0200
>> >
>> >     mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY
>> > cleared
>> >
>> >
>> > Could someone (Peter, Kirill?) please take a look?
>> >
>> > On Fri, 06 May 2016 13:15:19 +0000 bugzilla-daemon@bugzilla.kernel.org
>> > wrote:
>> >
>> > > https://bugzilla.kernel.org/show_bug.cgi?id=117731
>> > >
>> > >             Bug ID: 117731
>> > >            Summary: Doing mprotect for PROT_NONE and then for
>> > >                     PROT_READ|PROT_WRITE reduces CPU write B/W on
>> > > buffer
>> > >            Product: Memory Management
>> > >            Version: 2.5
>> > >     Kernel Version: 3.18 and beyond
>> > >           Hardware: All
>> > >                 OS: Linux
>> > >               Tree: Mainline
>> > >             Status: NEW
>> > >           Severity: high
>> > >           Priority: P1
>> > >          Component: Other
>> > >           Assignee: akpm@linux-foundation.org
>> > >           Reporter: ashish0srivastava0@gmail.com
>> > >         Regression: No
>> > >
>> > > Created attachment 215401
>> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=215401&action=edit
>> > > Repro code
>>
>> The code is somewhat broken: malloc() is not guaranteed to return a
>> page-aligned pointer, and in my case that leads to -EINVAL from
>> mprotect().
>>
>> Do you have a custom malloc()?
>>
>> > > This is a regression that is present in kernel 3.18 and beyond, but
>> > > not in previous kernels.
>> > > Attached is a simple repro case. It measures the time taken to write
>> > > and then read all pages in a buffer, does mprotect for PROT_NONE
>> > > followed by mprotect for PROT_READ|PROT_WRITE, and then measures the
>> > > write and read time again. The second measurement is much larger (20
>> > > to 30 times) than the first.
>> > >
>> > > I have looked at the kernel code that is causing this: writes are
>> > > faulting because pte_mkwrite is not being done during mprotect_fixup
>> > > for PROT_READ|PROT_WRITE.
>> > >
>> > > This is the code inside mprotect_fixup in a tree v3.16.35 or older:
>> > >     /*
>> > >      * vm_flags and vm_page_prot are protected by the mmap_sem
>> > >      * held in write mode.
>> > >      */
>> > >     vma->vm_flags = newflags;
>> > >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>> > >                       vm_get_page_prot(newflags));
>> > >
>> > >     if (vma_wants_writenotify(vma)) {
>> > >         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
>> > >         dirty_accountable = 1;
>> > >     }
>> > > This is the code in the same region inside mprotect_fixup in a recent
>> > > tree:
>> > >     /*
>> > >      * vm_flags and vm_page_prot are protected by the mmap_sem
>> > >      * held in write mode.
>> > >      */
>> > >     vma->vm_flags = newflags;
>> > >     dirty_accountable = vma_wants_writenotify(vma);
>> > >     vma_set_page_prot(vma);
>> > >
>> > > The difference is the setting of dirty_accountable. The result of
>> > > vma_wants_writenotify does not depend on vma->vm_flags alone; it
>> > > also depends on vma->vm_page_prot, and the following check makes it
>> > > return 0 because the newer code sets dirty_accountable before
>> > > setting vma->vm_page_prot.
>> > >     /* The open routine did something to the protections that
>> > > pgprot_modify
>> > >      * won't preserve? */
>> > >     if (pgprot_val(vma->vm_page_prot) !=
>> > >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>> > >         return 0;
>>
>> The test-case will never hit this, as normal malloc() returns anonymous
>> memory, which is handled by the first check in vma_wants_writenotify().
>>
>> The only case where this check can change anything for you is if your
>> malloc() returns file-backed memory. Which is possible, I guess, with a
>> custom malloc().
>>
>> > > Now, suppose we change code by calling vma_set_page_prot before
>> > > setting
>> > > dirty_accountable:
>> > >     vma->vm_flags = newflags;
>> > >     vma_set_page_prot(vma);
>> > >     dirty_accountable = vma_wants_writenotify(vma);
>> > > Still, dirty_accountable will be 0. This is because the following
>> > > code in vma_set_page_prot modifies vma->vm_page_prot without
>> > > modifying vma->vm_flags:
>> > >     if (vma_wants_writenotify(vma)) {
>> > >         vm_flags &= ~VM_SHARED;
>> > >         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
>> > >                              vm_flags);
>> > >     }
>> > > so this check in vma_wants_writenotify will again return 0:
>> > >     /* The open routine did something to the protections that
>> > > pgprot_modify
>> > >      * won't preserve? */
>> > >     if (pgprot_val(vma->vm_page_prot) !=
>> > >         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>> > >         return 0;
>> > > So dirty_accountable is still 0.
>> > >
>> > > This code in change_pte_range decides whether to call pte_mkwrite or
>> > > not:
>> > >             /* Avoid taking write faults for known dirty pages */
>> > >             if (dirty_accountable && pte_dirty(ptent) &&
>> > >                     (pte_soft_dirty(ptent) ||
>> > >                      !(vma->vm_flags & VM_SOFTDIRTY))) {
>> > >                 ptent = pte_mkwrite(ptent);
>> > >             }
>> > > If dirty_accountable is 0, pte_mkwrite will not be done even though
>> > > the pte was already dirty.
>> > >
>> > > I think the correct solution is for dirty_accountable to take the
>> > > value of vma_wants_writenotify as queried before vma->vm_page_prot
>> > > is set with VM_SHARED removed from the flags. One way to do this is
>> > > to have vma_set_page_prot return the value of dirty_accountable,
>> > > which it can set right after the vma_wants_writenotify check.
>> > > Another way is to do
>> > >     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>> > >                       vm_get_page_prot(newflags));
>> > > and then set dirty_accountable based on vma_wants_writenotify
>> > > before calling vma_set_page_prot.
>>
>> Looks like a good catch, but I'm not sure if it's the root cause of your
>> problem.
>>
>> --
>>  Kirill A. Shutemov
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-05-17 15:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-117731-27@https.bugzilla.kernel.org/>
2016-05-06 22:01 ` [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer Andrew Morton
2016-05-09 18:07   ` Peter Feiner
2016-05-16 13:35   ` Kirill A. Shutemov
2016-05-17 11:26     ` Ashish Srivastava
2016-05-17 11:36       ` Kirill A. Shutemov
2016-05-17 11:47         ` Ashish Srivastava
2016-05-17 12:03           ` Kirill A. Shutemov
2016-05-17 15:51       ` Peter Feiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.