All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Shan Hai <haishan.bai@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	paulus@samba.org, tglx@linutronix.de, walken@google.com,
	dhowells@redhat.com, cmetcalf@tilera.com, tony.luck@intel.com,
	akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Sun, 17 Jul 2011 11:38:11 +0200	[thread overview]
Message-ID: <4b337921-d430-4b63-bc36-ad31753cf800@email.android.com> (raw)
In-Reply-To: <1310860194.25044.17.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

>On Sat, 2011-07-16 at 10:50 -0400, Shan Hai wrote:
>> 
>> That's right the gup(.write=1) want to do the same thing as the
>> above code snippet, but it failed for the following reason:
>> because the get_user_pages() would not dirty pte for the reason
>> the follow_page() returns not NULL on *present* and *writable*
>> page, the page which holds the lock is present because its a shared
>> page,
>> writable because demand paging set that up so for shared
>> writable page, so the handle_mm_fault() in the __get_user_page()
>> could not be called.
>> 
>> Why the above code could do the same task, because by calling
>> handle_mm_fault() will set pte dirty by
>> [do_annonymous_page(), memory.c]
>> if (vma->vm_flags & VM_WRITE)
>>                  entry = pte_mkwrite(pte_mkdirty(entry));
>> 
>
>Right. gup won't set page_dirty, it expects the caller to do so (in
>case
>it doesn't dirty all the gup'ed pages a suppose).
>
>You could probably fix the problem here by setting dirty after gup in
>the futex code if you know you're going to write. You must do that with
>the PTE lock held though and -not- at interrupt time.
>
>Note however that the exact same problem exist with normal "read"
>accesses and page_young (_PAGE_ACCESSED on powerpc). The page will not
>be accessible until that bit is set and it's set by SW.
>
>As I wrote earlier, fixing that by making "atomic" page faults perform
>the dirty/accessed tracking is not right, since such faults can happen
>at interrupt time and the PTE lock cannot be taken at interrupt time.
>
>IE. The implementation of those "SW" TLB archs heavily relies on the
>PTE
>lock to serialize write access to the PTE and writing it outside of
>that
>lock would do really bad things.
>
>So there's a deeper problem here. The whole user access "in atomic"
>concept is by itself a violation of some of the basic access rules of
>user memory that have existed from day 1 of the kernel. That we allow
>it
>for semi-harmless (and allowed to fail) things like snapshot of
>backtraces for perf is one thing, but relying on it for the futex case
>like that is not going to fly very well. I sincerely hope that this
>kind
>of usage is not going to become a habit.
>
>In the meantime, other than rewriting the futex code to not require
>those in-atomic accesses (can't it just access the pages via the linear
>mapping and/or kmap after the gup ?), all I see would be a way to force
>dirty and young after gup, with appropriate locks, or a variant of gup
>(via a flag ?) to require it to do so.
>
>Cheers,
>Ben.

Whats this talk about interrup context? There is non of that involved.

Furthermore, I still dont see the problem. The futex code is optimistically trying to poke at user memory while holding spinlocks.

We fully expect that to fail, hence the error path that drops all locks and does the gup(.write=1) to fix up the mapping after which we try again.

This has worked for years, its by no means new code. Nor do I see how its broken.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Shan Hai <haishan.bai@gmail.com>
Cc: tony.luck@intel.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel@vger.kernel.org, cmetcalf@tilera.com,
	dhowells@redhat.com, paulus@samba.org, tglx@linutronix.de,
	walken@google.com, linuxppc-dev@lists.ozlabs.org,
	akpm@linux-foundation.org
Subject: Re: [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core
Date: Sun, 17 Jul 2011 11:38:11 +0200	[thread overview]
Message-ID: <4b337921-d430-4b63-bc36-ad31753cf800@email.android.com> (raw)
In-Reply-To: <1310860194.25044.17.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

>On Sat, 2011-07-16 at 10:50 -0400, Shan Hai wrote:
>> 
>> That's right the gup(.write=1) want to do the same thing as the
>> above code snippet, but it failed for the following reason:
>> because the get_user_pages() would not dirty pte for the reason
>> the follow_page() returns not NULL on *present* and *writable*
>> page, the page which holds the lock is present because its a shared
>> page,
>> writable because demand paging set that up so for shared
>> writable page, so the handle_mm_fault() in the __get_user_page()
>> could not be called.
>> 
>> Why the above code could do the same task, because by calling
>> handle_mm_fault() will set pte dirty by
>> [do_annonymous_page(), memory.c]
>> if (vma->vm_flags & VM_WRITE)
>>                  entry = pte_mkwrite(pte_mkdirty(entry));
>> 
>
>Right. gup won't set page_dirty, it expects the caller to do so (in
>case
>it doesn't dirty all the gup'ed pages a suppose).
>
>You could probably fix the problem here by setting dirty after gup in
>the futex code if you know you're going to write. You must do that with
>the PTE lock held though and -not- at interrupt time.
>
>Note however that the exact same problem exist with normal "read"
>accesses and page_young (_PAGE_ACCESSED on powerpc). The page will not
>be accessible until that bit is set and it's set by SW.
>
>As I wrote earlier, fixing that by making "atomic" page faults perform
>the dirty/accessed tracking is not right, since such faults can happen
>at interrupt time and the PTE lock cannot be taken at interrupt time.
>
>IE. The implementation of those "SW" TLB archs heavily relies on the
>PTE
>lock to serialize write access to the PTE and writing it outside of
>that
>lock would do really bad things.
>
>So there's a deeper problem here. The whole user access "in atomic"
>concept is by itself a violation of some of the basic access rules of
>user memory that have existed from day 1 of the kernel. That we allow
>it
>for semi-harmless (and allowed to fail) things like snapshot of
>backtraces for perf is one thing, but relying on it for the futex case
>like that is not going to fly very well. I sincerely hope that this
>kind
>of usage is not going to become a habit.
>
>In the meantime, other than rewriting the futex code to not require
>those in-atomic accesses (can't it just access the pages via the linear
>mapping and/or kmap after the gup ?), all I see would be a way to force
>dirty and young after gup, with appropriate locks, or a variant of gup
>(via a flag ?) to require it to do so.
>
>Cheers,
>Ben.

Whats this talk about interrup context? There is non of that involved.

Furthermore, I still dont see the problem. The futex code is optimistically trying to poke at user memory while holding spinlocks.

We fully expect that to fail, hence the error path that drops all locks and does the gup(.write=1) to fix up the mapping after which we try again.

This has worked for years, its by no means new code. Nor do I see how its broken.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

  reply	other threads:[~2011-07-17  9:39 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-15  8:07 [PATCH 0/1] Fixup write permission of TLB on powerpc e500 core Shan Hai
2011-07-15  8:07 ` Shan Hai
2011-07-15  8:07 ` [PATCH 1/1] " Shan Hai
2011-07-15  8:07   ` Shan Hai
2011-07-15 10:23   ` Peter Zijlstra
2011-07-15 10:23     ` Peter Zijlstra
2011-07-15 15:18     ` Shan Hai
2011-07-15 15:18       ` Shan Hai
2011-07-15 15:24       ` Peter Zijlstra
2011-07-15 15:24         ` Peter Zijlstra
2011-07-16 15:36         ` Shan Hai
2011-07-16 15:36           ` Shan Hai
2011-07-16 14:50     ` Shan Hai
2011-07-16 14:50       ` Shan Hai
2011-07-16 23:49       ` Benjamin Herrenschmidt
2011-07-16 23:49         ` Benjamin Herrenschmidt
2011-07-17  9:38         ` Peter Zijlstra [this message]
2011-07-17  9:38           ` Peter Zijlstra
2011-07-17 14:29           ` Benjamin Herrenschmidt
2011-07-17 14:29             ` Benjamin Herrenschmidt
2011-07-17 23:14             ` Benjamin Herrenschmidt
2011-07-17 23:14               ` Benjamin Herrenschmidt
2011-07-18  3:53               ` Benjamin Herrenschmidt
2011-07-18  3:53                 ` Benjamin Herrenschmidt
2011-07-18  4:02                 ` Benjamin Herrenschmidt
2011-07-18  4:02                   ` Benjamin Herrenschmidt
2011-07-18  4:01               ` Benjamin Herrenschmidt
2011-07-18  4:01                 ` Benjamin Herrenschmidt
2011-07-18  6:48                 ` Shan Hai
2011-07-18  6:48                   ` Shan Hai
2011-07-18  7:01                   ` Benjamin Herrenschmidt
2011-07-18  7:01                     ` Benjamin Herrenschmidt
2011-07-18  7:26                     ` Shan Hai
2011-07-18  7:26                       ` Shan Hai
2011-07-18  7:36                       ` Benjamin Herrenschmidt
2011-07-18  7:36                         ` Benjamin Herrenschmidt
2011-07-18  7:50                         ` Shan Hai
2011-07-18  7:50                           ` Shan Hai
2011-07-19  3:30                         ` Shan Hai
2011-07-19  3:30                           ` Shan Hai
2011-07-19  4:20                           ` Benjamin Herrenschmidt
2011-07-19  4:20                             ` Benjamin Herrenschmidt
2011-07-19  4:29                           ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of dirty & young Benjamin Herrenschmidt
2011-07-19  4:29                             ` Benjamin Herrenschmidt
2011-07-19  4:55                             ` Shan Hai
2011-07-19  4:55                               ` Shan Hai
2011-07-19  5:17                             ` Shan Hai
2011-07-19  5:17                               ` Shan Hai
2011-07-19  5:24                               ` Benjamin Herrenschmidt
2011-07-19  5:24                                 ` Benjamin Herrenschmidt
2011-07-19  5:38                                 ` Shan Hai
2011-07-19  5:38                                   ` Shan Hai
2011-07-19  7:46                                   ` Benjamin Herrenschmidt
2011-07-19  7:46                                     ` Benjamin Herrenschmidt
2011-07-19  8:24                                     ` Shan Hai
2011-07-19  8:24                                       ` Shan Hai
2011-07-19  8:26                                       ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW trackingof " David Laight
2011-07-19  8:26                                         ` David Laight
2011-07-19  8:45                                         ` Benjamin Herrenschmidt
2011-07-19  8:45                                           ` Benjamin Herrenschmidt
2011-07-19  8:45                                         ` Shan Hai
2011-07-19  8:45                                           ` Shan Hai
2011-07-19 11:10                             ` [RFC/PATCH] mm/futex: Fix futex writes on archs with SW tracking of " Peter Zijlstra
2011-07-19 11:10                               ` Peter Zijlstra
2011-07-20 14:39                             ` Darren Hart
2011-07-20 14:39                               ` Darren Hart
2011-07-21 22:36                             ` Andrew Morton
2011-07-21 22:36                               ` Andrew Morton
2011-07-21 22:52                               ` Benjamin Herrenschmidt
2011-07-21 22:52                                 ` Benjamin Herrenschmidt
2011-07-21 22:57                                 ` Benjamin Herrenschmidt
2011-07-21 22:57                                   ` Benjamin Herrenschmidt
2011-07-21 22:59                                 ` Andrew Morton
2011-07-21 22:59                                   ` Andrew Morton
2011-07-22  1:40                                   ` Benjamin Herrenschmidt
2011-07-22  1:40                                     ` Benjamin Herrenschmidt
2011-07-22  1:54                                   ` Shan Hai
2011-07-22  1:54                                     ` Shan Hai
2011-07-27  6:50                             ` Mike Frysinger
2011-07-27  6:50                               ` Mike Frysinger
2011-07-27  7:58                               ` Benjamin Herrenschmidt
2011-07-27  7:58                                 ` Benjamin Herrenschmidt
2011-07-27  8:59                                 ` Peter Zijlstra
2011-07-27  8:59                                   ` Peter Zijlstra
2011-07-27 10:09                                 ` David Howells
2011-07-27 10:09                                   ` David Howells
2011-07-27 10:17                                   ` Peter Zijlstra
2011-07-27 10:17                                     ` Peter Zijlstra
2011-07-27 10:20                                     ` Benjamin Herrenschmidt
2011-07-27 10:20                                       ` Benjamin Herrenschmidt
2011-07-28  0:12                                       ` Mike Frysinger
2011-07-28  0:12                                         ` Mike Frysinger
2011-08-08  2:31                                     ` Mike Frysinger
2011-08-08  2:31                                       ` Mike Frysinger
2011-07-28 10:55                                   ` David Howells
2011-07-28 10:55                                     ` David Howells
2011-07-17 11:02         ` [PATCH 1/1] Fixup write permission of TLB on powerpc e500 core Peter Zijlstra
2011-07-17 11:02           ` Peter Zijlstra
2011-07-17 13:33           ` Shan Hai
2011-07-17 13:33             ` Shan Hai
2011-07-17 14:48             ` Benjamin Herrenschmidt
2011-07-17 14:48               ` Benjamin Herrenschmidt
2011-07-17 15:40               ` Shan Hai
2011-07-17 15:40                 ` Shan Hai
2011-07-17 22:34                 ` Benjamin Herrenschmidt
2011-07-17 22:34                   ` Benjamin Herrenschmidt
2011-07-17 14:34           ` Benjamin Herrenschmidt
2011-07-17 14:34             ` Benjamin Herrenschmidt
2011-07-15  8:20 ` [PATCH 0/1] " Peter Zijlstra
2011-07-15  8:20   ` Peter Zijlstra
2011-07-15  8:38   ` MailingLists
2011-07-15  8:38     ` MailingLists
2011-07-15  8:44     ` Peter Zijlstra
2011-07-15  8:44       ` Peter Zijlstra
2011-07-15  9:08       ` Shan Hai
2011-07-15  9:08         ` Shan Hai
2011-07-15  9:12         ` Benjamin Herrenschmidt
2011-07-15  9:12           ` Benjamin Herrenschmidt
2011-07-15  9:50         ` Peter Zijlstra
2011-07-15  9:50           ` Peter Zijlstra
2011-07-15 10:06           ` Shan Hai
2011-07-15 10:06             ` Shan Hai
2011-07-15 10:32             ` David Laight
2011-07-15 10:32               ` David Laight
2011-07-15 10:39               ` Peter Zijlstra
2011-07-15 10:39                 ` Peter Zijlstra
2011-07-15 15:32               ` Shan Hai
2011-07-15 15:32                 ` Shan Hai
2011-07-16  0:20                 ` Benjamin Herrenschmidt
2011-07-16  0:20                   ` Benjamin Herrenschmidt
2011-07-16 15:03                   ` Shan Hai
2011-07-16 15:03                     ` Shan Hai
2011-07-15 23:47               ` Benjamin Herrenschmidt
2011-07-15 23:47                 ` Benjamin Herrenschmidt
2011-07-15  9:07     ` Benjamin Herrenschmidt
2011-07-15  9:07       ` Benjamin Herrenschmidt
2011-07-15  9:05   ` Benjamin Herrenschmidt
2011-07-15  9:05     ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b337921-d430-4b63-bc36-ad31753cf800@email.android.com \
    --to=peterz@infradead.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=cmetcalf@tilera.com \
    --cc=dhowells@redhat.com \
    --cc=haishan.bai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.