* preempted dup_mm misses TLB invalidate
@ 2013-07-15 18:19 Nickolas Fortino
  2013-07-17 19:27 ` Catalin Marinas
  0 siblings, 1 reply; 10+ messages in thread
From: Nickolas Fortino @ 2013-07-15 18:19 UTC
  To: linux-arm-kernel

I've noticed an issue in simulation where the Linux kernel is executing
a user process when the page tables and TLBs have gotten out of sync.
The page tables have a page marked as user read only, but the TLB has
the page marked as user read/write.

I've traced the issue back to the handling of copy on write pages
generated from the 'do_fork', 'copy_process', 'dup_mm', 'dup_mmap' call
stack. If run without interruption, 'dup_mmap' calls
'flush_tlb_mm(oldmm)' on completion, avoiding any issues. In this case,
however, about 4 million instructions after 'dup_mm' is called,
'copy_pte_range' yields to another thread via __cond_resched. About 20
million instructions later, a user process with the ASID of the source
mm is scheduled. This process performs a store to a page modified from
read/write to read only in the copy on write logic of 'copy_one_pte'.
Because the TLB was not invalidated, the store hits on a TLB entry with
read/write permissions and succeeds without a fault.

What invariant in the Linux kernel is supposed to prevent this from 
happening? Note I have not observed user visible corruption, but it 
seems very unlikely a successful store to a page marked as read only in 
the kernel is safe.

For reference, this issue was found with the Linux 3.7 kernel coming 
from the Linaro 12.12 release available at 
http://releases.linaro.org/12.12/android/vexpress


* preempted dup_mm misses TLB invalidate
  2013-07-15 18:19 preempted dup_mm misses TLB invalidate Nickolas Fortino
@ 2013-07-17 19:27 ` Catalin Marinas
  2013-07-17 19:52   ` Stephen Warren
  0 siblings, 1 reply; 10+ messages in thread
From: Catalin Marinas @ 2013-07-17 19:27 UTC
  To: linux-arm-kernel

On Mon, Jul 15, 2013 at 07:19:23PM +0100, Nickolas Fortino wrote:
> I've noticed an issue in simulation where the Linux kernel is executing
> a user process when the page tables and TLBs have gotten out of sync. 
> The page tables have a page marked as user read only, but the TLB has 
> the page marked as user read/write.

This happens during fork() for the current process. I think mprotect()
as well. The caller is supposed not to have threads that write its
memory while another thread does a fork().

> I've traced the issue back to the handling of copy on write pages
> generated from the 'do_fork', 'copy_process', 'dup_mm', 'dup_mmap' call
> stack. If run without interruption, 'dup_mmap' calls
> 'flush_tlb_mm(oldmm)' on completion, avoiding any issues. In this case,
> however, about 4 million instructions after 'dup_mm' is called,
> 'copy_pte_range' yields to another thread via __cond_resched. About 20
> million instructions later, a user process with the ASID of the source 
> mm is scheduled.

Why would it have the same ASID? We should not reuse an ASID unless
there was a TLB invalidation for that ASID. If it's a thread of the same
process, I think it's just a user programming bug.

> This process performs a store to a page modified from 
> read/write to read only in the copy on write logic of 'copy_one_pte'.
> Because the TLB was not invalidated, the store hits on a TLB entry with 
> read/write permissions and succeeds without a fault.
> 
> What invariant in the Linux kernel is supposed to prevent this from 
> happening? Note I have not observed user visible corruption, but it 
> seems very unlikely a successful store to a page marked as read only in 
> the kernel is safe.

See above. The only workaround would be to stop all the threads of a
process while calling fork(). Threads and fork() are not nice to
each-other.

-- 
Catalin


* preempted dup_mm misses TLB invalidate
  2013-07-17 19:27 ` Catalin Marinas
@ 2013-07-17 19:52   ` Stephen Warren
  2013-07-17 20:01     ` Russell King - ARM Linux
  2013-07-17 20:09     ` Nickolas Fortino
  0 siblings, 2 replies; 10+ messages in thread
From: Stephen Warren @ 2013-07-17 19:52 UTC
  To: linux-arm-kernel

On 07/17/2013 01:27 PM, Catalin Marinas wrote:
> On Mon, Jul 15, 2013 at 07:19:23PM +0100, Nickolas Fortino wrote:
>> I've noticed an issue in simulation where the Linux kernel is executing
>> a user process when the page tables and TLBs have gotten out of sync. 
>> The page tables have a page marked as user read only, but the TLB has 
>> the page marked as user read/write.
> 
> This happens during fork() for the current process. I think mprotect()
> as well. The caller is supposed not to have threads that write its
> memory while another thread does a fork().

Hmmm. That sounds like a plausible explanation, but I'm not convinced
it's true.

I would guess that the only way to prevent threads of an application
from writing to its memory while a fork() happens in another thread is
to prevent those threads from running at all; almost any code is going
to do some writes e.g. to the stack at least. That would imply the
kernel must prevent the scheduling of the other threads, not the
user-space application.

I quickly searched and couldn't see anything that agreed with your
statement about this being a user-space bug. There are plenty of
articles pointing out potential problems if a threaded app forks, but I
didn't see anything that said it's not legal. I also note that pthreads
explicitly specifies what happens if a threaded app forks (just the
thread calling fork is duplicated into the child process), what
functions can be called after a fork ("async-safe" functions), and the
function pthread_atfork() exists, all of which tend to imply that
forking-and-threading can be legally used together.


* preempted dup_mm misses TLB invalidate
  2013-07-17 19:52   ` Stephen Warren
@ 2013-07-17 20:01     ` Russell King - ARM Linux
  2013-07-17 20:11       ` Stephen Warren
  2013-07-17 20:09     ` Nickolas Fortino
  1 sibling, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2013-07-17 20:01 UTC
  To: linux-arm-kernel

On Wed, Jul 17, 2013 at 01:52:45PM -0600, Stephen Warren wrote:
> Hmmm. That sounds like a plausible explanation, but I'm not convinced
> it's true.
> 
> I would guess that the only way to prevent threads of an application
> from writing to its memory while a fork() happens in another thread is
> to prevent those threads from running at all; almost any code is going
> to do some writes e.g. to the stack at least. That would imply the
> kernel must prevent the scheduling of the other threads, not the
> user-space application.
> 
> I quickly searched and couldn't see anything that agreed with your
> statement about this being a user-space bug. There are plenty of
> articles pointing out potential problems if a threaded app forks, but I
> didn't see anything that said it's not legal. I also note that pthreads
> explicitly specifies what happens if a threaded app forks (just the
> thread calling fork is duplicated into the child process), what
> functions can be called after a fork ("async-safe" functions), and the
> function pthread_atfork() exists, all of which tend to imply that
> forking-and-threading can be legally used together.

Yes, everything which you've said above is true, but if you read the
discussions on pthread_atfork(), you'll see that the whole notion that
you can somehow synchronize state for a fork() is a complete dead loss -
and pthread_atfork() is a pile of trash.

Semaphores must be released in the same thread as the thread which
acquired them.  If you take a semaphore in the pre-fork handler, and
release it in the parent post-fork handler, you can't legally release
it in the child post-fork handler because the child didn't acquire it!

What that means is that you can't be holding any semaphores when a
thread forks.  Remember that the child ends up with a complete copy of
the VM space, semaphores and all, in whatever state _all_ the threads
had left them at the moment the fork happened.

Really, the only legal thing for a threaded process to do with fork()
is to immediately follow it with exec*() without doing anything else
with any pthread state (or any function which touches any pthread
state).  That much must work correctly - and if it doesn't, then we
definitely have a bug.

However, fork()ing a threaded app and trying to use pthread in the
child is not something I (or anyone) should really care about; it's
not a legal thing to do with all the requirements of the pthread APIs.
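
The fork()-then-exec*() pattern described above can be sketched in user
space as follows. This is only an illustration under stated assumptions:
the names writer() and fork_exec_from_threaded_parent() are hypothetical,
/bin/sh is assumed to exist, and the second thread simply keeps storing
to memory while the first thread forks. The child calls nothing except
execl() and _exit().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_bool stop_writer;

/* Hypothetical second thread: keeps writing to memory until told to stop. */
static void *writer(void *arg)
{
    volatile unsigned long *counter = arg;

    while (!atomic_load(&stop_writer))
        (*counter)++;
    return NULL;
}

/* Returns the child's exit status (0 on success), or -1 on error. */
static int fork_exec_from_threaded_parent(void)
{
    unsigned long counter = 0;
    pthread_t tid;
    int status = 0;
    int result = -1;

    atomic_store(&stop_writer, false);
    if (pthread_create(&tid, NULL, writer, &counter) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: exec immediately, touching no pthread state. */
        execl("/bin/sh", "sh", "-c", "exit 0", (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    /* Parent: stop and reap the writer thread. */
    atomic_store(&stop_writer, true);
    pthread_join(tid, NULL);
    return result;
}
```

A caller would treat a return value of 0 as "the child exec'd and exited
cleanly"; the parent's writer thread runs across the whole fork().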


* preempted dup_mm misses TLB invalidate
  2013-07-17 19:52   ` Stephen Warren
  2013-07-17 20:01     ` Russell King - ARM Linux
@ 2013-07-17 20:09     ` Nickolas Fortino
  2013-07-17 20:34       ` Russell King - ARM Linux
  1 sibling, 1 reply; 10+ messages in thread
From: Nickolas Fortino @ 2013-07-17 20:09 UTC
  To: linux-arm-kernel

On 7/17/2013 12:52 PM, Stephen Warren wrote:
> On 07/17/2013 01:27 PM, Catalin Marinas wrote:
>> On Mon, Jul 15, 2013 at 07:19:23PM +0100, Nickolas Fortino wrote:
>>> The page tables have a page marked as user read only, but the TLB has
>>> the page marked as user read/write.
>>
>> This happens during fork() for the current process. I think mprotect()
>> as well. The caller is supposed not to have threads that write its
>> memory while another thread does a fork().

To be clear, the complaint is not that the page tables are ephemerally 
out of sync with the TLBs. I agree that is part of the expected 
operation of fork(), and if a TLB invalidate occurs prior to any memory 
access the code is valid.

The problem is that eventually a user process performs a store which hits
on a writeable TLB entry with the PTE marked as read only. Is it supposed
to be possible for a user threading bug to end up in this state? I would
have expected the kernel to be responsible for ensuring no stores occur
to a page it has marked as read only.

As for whether the application is threaded, it almost certainly is - 
it's cfbench equivalent to 
https://play.google.com/store/apps/details?id=eu.chainfire.cfbench&hl=en


* preempted dup_mm misses TLB invalidate
  2013-07-17 20:01     ` Russell King - ARM Linux
@ 2013-07-17 20:11       ` Stephen Warren
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen Warren @ 2013-07-17 20:11 UTC
  To: linux-arm-kernel

On 07/17/2013 02:01 PM, Russell King - ARM Linux wrote:
> On Wed, Jul 17, 2013 at 01:52:45PM -0600, Stephen Warren wrote:
>> Hmmm. That sounds like a plausible explanation, but I'm not convinced
>> it's true.
>>
>> I would guess that the only way to prevent threads of an application
>> from writing to its memory while a fork() happens in another thread is
>> to prevent those threads from running at all; almost any code is going
>> to do some writes e.g. to the stack at least. That would imply the
>> kernel must prevent the scheduling of the other threads, not the
>> user-space application.
>>
>> I quickly searched and couldn't see anything that agreed with your
>> statement about this being a user-space bug. There are plenty of
>> articles pointing out potential problems if a threaded app forks, but I
>> didn't see anything that said it's not legal. I also note that pthreads
>> explicitly specifies what happens if a threaded app forks (just the
>> thread calling fork is duplicated into the child process), what
>> functions can be called after a fork ("async-safe" functions), and the
>> function pthread_atfork() exists, all of which tend to imply that
>> forking-and-threading can be legally used together.
> 
> Yes, everything which you've said above is true, but if you read the
> discussions on pthread_atfork(), you'll see that the whole notion that
> you can somehow synchronize state for a fork() is a complete dead loss -
> and pthread_atfork() is a pile of trash.
> 
> Semaphores must be released in the same thread as the thread which
> acquired them.  If you take a semaphore in the pre-fork handler, and
> release it in the parent post-fork handler, you can't legally release
> it in the child post-fork handler because the child didn't acquire it!
> 
> What that means is that you can't be holding any semaphores when a
> thread forks.  Remember that the child ends up with a complete copy of
> the VM space, semaphores and all, in whatever state _all_ the threads
> had left them at the moment the fork happened.
> 
> Really, the only legal thing for a threaded process to do with fork()
> is to immediately follow it with exec*() without doing anything else
> with any pthread state (or any function which touches any pthread
> state).  That much must work correctly - and if it doesn't, then we
> definitely have a bug.

Yes, there are definitely issues mixing threads and forking.

But I think the issue that Nickolas reported can be reproduced by a
threaded application doing nothing but fork() followed by an exec() in
the child, assuming that another thread in the parent process touches
memory (e.g. stack) between the "fork()" and exec() in the child, which
seems pretty likely unless the kernel itself prevents it.


* preempted dup_mm misses TLB invalidate
  2013-07-17 20:09     ` Nickolas Fortino
@ 2013-07-17 20:34       ` Russell King - ARM Linux
  2013-07-17 21:03         ` Nickolas Fortino
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2013-07-17 20:34 UTC
  To: linux-arm-kernel

On Wed, Jul 17, 2013 at 01:09:52PM -0700, Nickolas Fortino wrote:
> The problem is that eventually a user process performs a store which hits on
> a writeable TLB entry with the PTE marked as read only. Is it supposed  
> to be possible for a user threading bug to end up in this state?

I've thought about that, and I'm not sure what we can do about this.
Moreover, I really don't think it matters at all.

Let's consider a SMP system running a multithreaded application.  CPUs
0 and 1 are running two threads, CPU 1 is about to do a fork, but CPU 0
is doing a large time consuming memcpy().

CPU 1 does the fork while CPU 0 is still running this large memcpy.  It
walks the page tables, setting the PTEs to read-only.  Let's say for
argument's sake that it immediately invalidates each PTE after modification.

There is still a window in which CPU 0 can see the TLB entry, but the PTE
has already been write protected.  The only way to close this window is to
stop all threads of the process doing a fork().

However, before we think "oh, that sounds like a solution", let's think
about this a bit more first.

Let's say that we are on a system which doesn't need any TLB maintenance.
In other words, all PTE updates are seen by all observers immediately.

Consider the above scenario again.  What is the state of the memory at
the point the fork() returns, as seen from both the multithreaded parent
point of view and the child point of view?  Can you predict where in
that memcpy() CPU 0 will have been (and therefore what data the child can
see from that memcpy)?

The answer is you can't, because you don't know if CPU 0 might have had
an interrupt to deal with which stole time away from the memcpy().  You
don't know the relative timing of CPU 0's loads/stores against the time
it took CPU 1 to mark the PTE read-only.

Even if you stopped all threads on entry to a fork, the same problem
exists - at the point that you stopped the other threads, how do you know
what data they've written to memory?

What I'm pointing out here is that in this situation, the data visible to
the child process is unpredictable.


So, does it matter if a thread hits a page which has been marked read-only
in the PTE but hasn't been invalidated yet?  The answer to that is no -
because the parent and the child will see the update, and it will be
absolutely no different from what would have happened if the store had
happened _just before_ the PTE was marked read-only.

I'm pretty convinced that if you need to rely on a multi-threaded
program's state at the point you fork(), you must have some way to quiesce
your other threads _in user space_ rather than hoping that the kernel has
some magic to patch over this.
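
One way to quiesce another thread in user space before fork() might look
like the sketch below. Everything here is an assumption for illustration:
struct shared, updater() and fork_with_quiesced_updater() are hypothetical
names, and a plain mutex stands in for whatever synchronisation the
application already has. The forking thread takes the lock so the updater
cannot be part-way through an update, forks, and the child only reads the
data; it never touches the lock it inherited.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state: a == b whenever the lock is not held. */
struct shared {
    pthread_mutex_t lock;
    bool stop;
    unsigned long a;
    unsigned long b;
};

static void *updater(void *arg)
{
    struct shared *s = arg;

    for (;;) {
        pthread_mutex_lock(&s->lock);
        if (s->stop) {
            pthread_mutex_unlock(&s->lock);
            return NULL;
        }
        s->a++; /* invariant briefly broken ... */
        s->b++; /* ... and restored before unlocking */
        pthread_mutex_unlock(&s->lock);
        sched_yield(); /* give the forking thread a chance at the lock */
    }
}

/* Returns 0 if the child saw a == b, 1 if it did not, -1 on error. */
static int fork_with_quiesced_updater(void)
{
    struct shared s = { .stop = false, .a = 0, .b = 0 };
    pthread_t tid;
    int status = 0;
    int result = -1;

    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, updater, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }

    pthread_mutex_lock(&s.lock); /* quiesce: no update is in flight */
    pid_t pid = fork();
    if (pid == 0)
        _exit(s.a == s.b ? 0 : 1); /* child: read only, then exit */
    s.stop = true;
    pthread_mutex_unlock(&s.lock);

    pthread_join(tid, NULL);
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    pthread_mutex_destroy(&s.lock);
    return result;
}
```

Because the updater only modifies a and b while holding the lock, the
child's snapshot of them is consistent regardless of when the kernel
write-protects or invalidates the underlying pages.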


* preempted dup_mm misses TLB invalidate
  2013-07-17 20:34       ` Russell King - ARM Linux
@ 2013-07-17 21:03         ` Nickolas Fortino
  2013-07-17 21:21           ` Russell King - ARM Linux
  0 siblings, 1 reply; 10+ messages in thread
From: Nickolas Fortino @ 2013-07-17 21:03 UTC
  To: linux-arm-kernel

On 7/17/2013 1:34 PM, Russell King - ARM Linux wrote:
> Even if you stopped all threads on entry to a fork, the same problem
> exists - at the point that you stopped the other threads, how do you know
> what data they've written to memory?
>
> What I'm pointing out here is that in this situation, the data visible to
> the child process is unpredictable.

I agree the data visible to the child process is inherently 
unpredictable. If you stop all process threads on a fork, however, you 
do preserve the invariant that memory accesses are not seen out of 
order. In your memcpy() case, it is indeterminate how much of the memcpy()
has completed, but it is known that later pages in the memcpy() will only
have been written if prior pages have been updated.

With the current kernel configuration, you can have holes. Any page
which hits in the TLB has the memcpy() data appear in the forked process.
Any page which misses in the TLB will not appear in the forked process.
The choice of which pages get memcpy() data in the forked process will
appear random based on TLB contents, a behavior you cannot have if you 
freeze threads on a fork.
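
The "holes" argument can be made concrete with a toy model. This is not
kernel code, only a sketch under simplifying assumptions: a memcpy()
writes NPAGES pages in order, the fork write-protects every page after
pages_before_fork of them have been written, and bit i of the
hypothetical stale_mask says whether page i still has a writable TLB
entry at that moment. A later store through a stale entry lands in the
page shared with the child; a store that misses takes a copy-on-write
fault and stays private to the parent.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8u

/* Bit i of the result is set if the child sees memcpy() data in page i. */
static unsigned child_view(unsigned stale_mask, unsigned pages_before_fork,
                           bool threads_frozen)
{
    unsigned seen = 0;

    for (unsigned i = 0; i < NPAGES; i++) {
        /* Written before the PTEs were write-protected: always shared. */
        bool written_before = i < pages_before_fork;
        /* Written afterwards through a stale writable TLB entry. */
        bool leaked = !threads_frozen && ((stale_mask >> i) & 1u);

        if (written_before || leaked)
            seen |= 1u << i;
    }
    return seen;
}

/* True if some updated page follows a page that was not updated. */
static bool has_hole(unsigned seen)
{
    bool gap = false;

    for (unsigned i = 0; i < NPAGES; i++) {
        if (!((seen >> i) & 1u))
            gap = true;
        else if (gap)
            return true;
    }
    return false;
}
```

With stale entries for pages 3 and 5 and two pages copied before the
fork, child_view(0x28u, 2, false) yields pages 0, 1, 3 and 5, which has
holes; with threads frozen the child sees only the prefix, pages 0 and 1.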


* preempted dup_mm misses TLB invalidate
  2013-07-17 21:03         ` Nickolas Fortino
@ 2013-07-17 21:21           ` Russell King - ARM Linux
  2013-07-18  1:48             ` Nickolas Fortino
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2013-07-17 21:21 UTC
  To: linux-arm-kernel

On Wed, Jul 17, 2013 at 02:03:34PM -0700, Nickolas Fortino wrote:
> On 7/17/2013 1:34 PM, Russell King - ARM Linux wrote:
>> Even if you stopped all threads on entry to a fork, the same problem
>> exists - at the point that you stopped the other threads, how do you know
>> what data they've written to memory?
>>
>> What I'm pointing out here is that in this situation, the data visible to
>> the child process is unpredictable.
>
> I agree the data visible to the child process is inherently  
> unpredictable. If you stop all process threads on a fork, however, you  
> do preserve the invariant that memory accesses are not seen out of  
> order. In your memcpy() case, it is indeterminate how much of the memcpy()
> has completed, but it is known that later pages in the memcpy() will only
> have been written if prior pages have been updated.
>
> With the current kernel configuration, you can have holes. Any page
> which hits in the TLB has the memcpy() data appear in the forked process.
> Any page which misses in the TLB will not appear in the forked process.
> The choice of which pages get memcpy() data in the forked process will
> appear random based on TLB contents, a behavior you cannot have if you  
> freeze threads on a fork.

So... how is this handled on x86 or any of the other architectures?  I'm
willing to bet that the behaviour you observe on ARM is inherently visible
on many of the other Linux architectures.

Short of modifying the generic kernel to halt all threads, this can not be
fixed.


* preempted dup_mm misses TLB invalidate
  2013-07-17 21:21           ` Russell King - ARM Linux
@ 2013-07-18  1:48             ` Nickolas Fortino
  0 siblings, 0 replies; 10+ messages in thread
From: Nickolas Fortino @ 2013-07-18  1:48 UTC
  To: linux-arm-kernel

On 7/17/2013 2:21 PM, Russell King - ARM Linux wrote:
> I'm
> willing to bet that the behaviour you observe on ARM is inherently visible
> on many of the other Linux architectures.

Based on the discussion from this thread, I agree there is nothing
architecture specific here. This behavior also appears legal, so long as
any page with write permissions in the TLB is considered dirty in the
Linux PTE structure.

I had missed that the specification for fork() is silent on what happens
to MAP_PRIVATE memory modifications which occur during the execution of
fork(); it merely specifies what happens to writes which occur prior to
fork() being called and after fork() returns.

