linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
@ 2006-12-21 15:26 Russell King
  2006-12-21 15:53 ` Miklos Szeredi
  2006-12-21 16:29 ` Arjan van de Ven
  0 siblings, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-21 15:26 UTC (permalink / raw)
  To: Linux Kernel List, linux-arch; +Cc: Miklos Szeredi

I've recently been asked to look at why fuse doesn't work on ARM.
In doing so and thinking about what fuse is doing, I've come to
the conclusion that the way fuse accesses the _current_ processes
memory space is less than ideal.

The problem is as follows:

- current process calls writev()
- fuse_dev_write is invoked, and this ends up calling
   get_user_pages(current, current->mm, address, 1, 0, ...)
- data is then read from the kernel mapping of this page (briefly) via:
   void *mapaddr = kmap_atomic(page, ..);
   memcpy(dest, mapaddr + offset, size);
   kunmap_atomic(mapaddr);

With a VIVT cache, there is an aliasing issue.  Data may be in the
cache lines corresponding with the userspace mapping, thereby making
data read from the kernel mapping incoherent.

On the face of it, the solution seems simple - we could assume that the
kernel mapping does not have any cache lines allocated with it, so we
can just write back the userspace mapping to make that data visible.

However, the action of that memcpy() will allocate some cache lines
in the kernel mapping of this page, which means when we next come to
read it (maybe via the same code) we could end up reading stale data.

Therefore, we also need to flush - more specifically, discard - the
kernel mapping cache lines.


Okay, now let's look at the read case:

- current process calls readv()
- fuse_dev_read is invoked, and eventually calls:
   get_user_pages(current, current->mm, address, 1, 1, ...)
- data is then written to the kernel mapping of this page (briefly) via:
   void *mapaddr = kmap_atomic(page, ..);
   memcpy(mapaddr + offset, source, size);
   kunmap_atomic(mapaddr);

A similar problem exists here.  The userspace mapping may have some
cache lines assocated with it, and as we found out above, so may the
kernel mapping.  We can apply the same fix to it.

However, and this is the problem, we need cache maintainence _after_
that memcpy() has completed - with a write allocate VIVT cache, the
memcpy() itself will allocate cache lines in the kernel mapping of
the page which will need to be written back for the user process to
see that data.

Moreover, the userspace mapping would need its cache lines evicted
(maybe by get_user_pages()) for the written back kernel cache lines
to be visible.  The more I think about this, the more I'm convinced
that this part could only be done by get_user_pages().


Now, throw in SMP or preempt with a multi-threaded userspace program
touching the page in question, and the problem just gets much much
worse.  In such a scenario, we can not guarantee, no matter how much
cache maintainence is applied to the kernel, that this API comes
anywhere near to being safe.


To summarise, in order for get_user_pages() to vaguely work with this
method of accessing the current processes address space on ARM, we'd
need to make the following changes:

1. get_user_pages() needs to unconditionally write back and invalidate
   the user space mapping.
2. get_user_pages() needs to invalidate the kernel space mapping.
3. all users of get_user_pages(current, current->mm, ...) need auditing,
   and additional cache maintainence added in their write paths to
   write back and invalidate the kernel mapping.
4. all fuse userspace programs need to run single-threaded.

Note: flush_anon_page() was added to work around 1 and 2 on other
architectures because fuse tripped over this issue there.

So, given all this additional complexity _and_ that it would only be
safe on non-preempt UP, the question becomes: is using get_user_pages()
to access the current processes memory space legal?  Given the above,
I would say not.

I suggest that in order for fuse to work reliably on ARM, it is modified
to behave more like a reasonable device driver, and use the functions
defined in asm/uaccess.h when it wants to access the current processes
VM space.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 15:26 fuse, get_user_pages, flush_anon_page, aliasing caches and all that again Russell King
@ 2006-12-21 15:53 ` Miklos Szeredi
  2006-12-21 16:57   ` Russell King
  2006-12-21 17:17   ` Russell King
  2006-12-21 16:29 ` Arjan van de Ven
  1 sibling, 2 replies; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 15:53 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel, linux-arch

I'll first answer the last paragraph.

> I suggest that in order for fuse to work reliably on ARM, it is modified
> to behave more like a reasonable device driver, and use the functions
> defined in asm/uaccess.h when it wants to access the current processes
> VM space.

Fuse needs to use get_user_pages() to work around certain deadlock
scenarios (see Documentation/filesystems/fuse.txt), that are
problematic with copy_*_user().

Other possibilities for dealing with these kinds of deadlocks are:

  1) detect deadlock before it happens

  2) copy data to a temporary buffer before copying to userspace

I think 1) is basically impossible.  2) is easy and would save a lot
of complexity, but it would certainly be much slower than the current
approach, especially for the mainstream architectures, where the
dcache aliasing problem doesn't exist.

> I've recently been asked to look at why fuse doesn't work on ARM.
> In doing so and thinking about what fuse is doing, I've come to
> the conclusion that the way fuse accesses the _current_ processes
> memory space is less than ideal.
> 
> The problem is as follows:
> 
> - current process calls writev()
> - fuse_dev_write is invoked, and this ends up calling
>    get_user_pages(current, current->mm, address, 1, 0, ...)
> - data is then read from the kernel mapping of this page (briefly) via:
>    void *mapaddr = kmap_atomic(page, ..);
>    memcpy(dest, mapaddr + offset, size);
>    kunmap_atomic(mapaddr);
> 
> With a VIVT cache, there is an aliasing issue.  Data may be in the
> cache lines corresponding with the userspace mapping, thereby making
> data read from the kernel mapping incoherent.
> 
> On the face of it, the solution seems simple - we could assume that the
> kernel mapping does not have any cache lines allocated with it, so we
> can just write back the userspace mapping to make that data visible.
> 
> However, the action of that memcpy() will allocate some cache lines
> in the kernel mapping of this page, which means when we next come to
> read it (maybe via the same code) we could end up reading stale data.
> 
> Therefore, we also need to flush - more specifically, discard - the
> kernel mapping cache lines.
> 
> 
> Okay, now let's look at the read case:
> 
> - current process calls readv()
> - fuse_dev_read is invoked, and eventually calls:
>    get_user_pages(current, current->mm, address, 1, 1, ...)
> - data is then written to the kernel mapping of this page (briefly) via:
>    void *mapaddr = kmap_atomic(page, ..);
>    memcpy(mapaddr + offset, source, size);
>    kunmap_atomic(mapaddr);
> 
> A similar problem exists here.  The userspace mapping may have some
> cache lines assocated with it, and as we found out above, so may the
> kernel mapping.  We can apply the same fix to it.
> 
> However, and this is the problem, we need cache maintainence _after_
> that memcpy() has completed - with a write allocate VIVT cache, the
> memcpy() itself will allocate cache lines in the kernel mapping of
> the page which will need to be written back for the user process to
> see that data.

Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
could be replaced by the flush_kernel_dcache_page() (added by James
Bottomley together with flush_anon_page()) when all relevant
architectures have defined it.

> Moreover, the userspace mapping would need its cache lines evicted
> (maybe by get_user_pages()) for the written back kernel cache lines
> to be visible.  The more I think about this, the more I'm convinced
> that this part could only be done by get_user_pages().

Yes, I think that was the concept.

> Now, throw in SMP or preempt with a multi-threaded userspace program
> touching the page in question, and the problem just gets much much
> worse.  In such a scenario, we can not guarantee, no matter how much
> cache maintainence is applied to the kernel, that this API comes
> anywhere near to being safe.

This is only problematic if multiple threads are touching the same
page, no?  If the page(s) used for reading/writing requests are
exclusive to each thread, then there should be no problem.  This is a
reasonable requirement towards the userspace filesystem I think.

> To summarise, in order for get_user_pages() to vaguely work with this
> method of accessing the current processes address space on ARM, we'd
> need to make the following changes:
> 
> 1. get_user_pages() needs to unconditionally write back and invalidate
>    the user space mapping.
> 2. get_user_pages() needs to invalidate the kernel space mapping.
> 3. all users of get_user_pages(current, current->mm, ...) need auditing,
>    and additional cache maintainence added in their write paths to
>    write back and invalidate the kernel mapping.
> 4. all fuse userspace programs need to run single-threaded.
> 
> Note: flush_anon_page() was added to work around 1 and 2 on other
> architectures because fuse tripped over this issue there.
> 
> So, given all this additional complexity _and_ that it would only be
> safe on non-preempt UP, the question becomes: is using get_user_pages()
> to access the current processes memory space legal?  Given the above,
> I would say not.

Note again, fuse is not the only user of get_user_pages().  Other uses
are just as prone to these issues.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 15:26 fuse, get_user_pages, flush_anon_page, aliasing caches and all that again Russell King
  2006-12-21 15:53 ` Miklos Szeredi
@ 2006-12-21 16:29 ` Arjan van de Ven
  2006-12-21 17:35   ` Russell King
  1 sibling, 1 reply; 57+ messages in thread
From: Arjan van de Ven @ 2006-12-21 16:29 UTC (permalink / raw)
  To: Russell King; +Cc: Linux Kernel List, linux-arch, Miklos Szeredi


> So, given all this additional complexity _and_ that it would only be
> safe on non-preempt UP, the question becomes: is using get_user_pages()
> to access the current processes memory space legal?  Given the above,
> I would say not.

I'd say that copy_from_user is the right api for this, not
get_user_pages + kmap hacks...


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 15:53 ` Miklos Szeredi
@ 2006-12-21 16:57   ` Russell King
  2006-12-21 17:51     ` Miklos Szeredi
  2007-01-01 15:04     ` James Bottomley
  2006-12-21 17:17   ` Russell King
  1 sibling, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-21 16:57 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-arch

On Thu, Dec 21, 2006 at 04:53:56PM +0100, Miklos Szeredi wrote:
> I'll first answer the last paragraph.
> 
> > I suggest that in order for fuse to work reliably on ARM, it is modified
> > to behave more like a reasonable device driver, and use the functions
> > defined in asm/uaccess.h when it wants to access the current processes
> > VM space.
> 
> Fuse needs to use get_user_pages() to work around certain deadlock
> scenarios (see Documentation/filesystems/fuse.txt), that are
> problematic with copy_*_user().

Hmm, okay (though the documentation doesn't really provide enough
explaination for me to get a grasp of exactly what's going on.)

> > However, and this is the problem, we need cache maintainence _after_
> > that memcpy() has completed - with a write allocate VIVT cache, the
> > memcpy() itself will allocate cache lines in the kernel mapping of
> > the page which will need to be written back for the user process to
> > see that data.
> 
> Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> could be replaced by the flush_kernel_dcache_page() (added by James
> Bottomley together with flush_anon_page()) when all relevant
> architectures have defined it.

I'm not entirely convinced that it can be replaced.  What if the page
is in the page cache and is shared with other processes?  That quite
clearly falls under flush_dcache_page()'s remit.

> > Now, throw in SMP or preempt with a multi-threaded userspace program
> > touching the page in question, and the problem just gets much much
> > worse.  In such a scenario, we can not guarantee, no matter how much
> > cache maintainence is applied to the kernel, that this API comes
> > anywhere near to being safe.
> 
> This is only problematic if multiple threads are touching the same
> page, no?  If the page(s) used for reading/writing requests are
> exclusive to each thread, then there should be no problem.  This is a
> reasonable requirement towards the userspace filesystem I think.

Such a restriction needs to be clearly documented against get_user_pages()
so that users don't expect something from it which it can't deliver.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 15:53 ` Miklos Szeredi
  2006-12-21 16:57   ` Russell King
@ 2006-12-21 17:17   ` Russell King
  2006-12-21 17:55     ` Miklos Szeredi
  2006-12-30 16:39     ` Russell King
  1 sibling, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-21 17:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-arch

On Thu, Dec 21, 2006 at 04:53:56PM +0100, Miklos Szeredi wrote:
> Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> could be replaced by the flush_kernel_dcache_page() (added by James
> Bottomley together with flush_anon_page()) when all relevant
> architectures have defined it.

I should say that flush_anon_page() in its current form is going to be
problematic for ARM.  It is passed:

1. the struct page
2. the virtual address in process memory for the page

It is not passed the mm or vma.  This means that we have no idea whether
the virtual address is in the currently mapped VM space or not.  The
common use of get_area_pages() is to get pages from other address
spaces.

If we use the supplied virtual address to perform cache maintainence of
the userspace mapping, we might end up hitting a completely different
processes address space, which may contain some page sensitive to such
operations, or may not contain any page and thereby could cause a page
fault on some ARM CPUs.

Hence, I first need something like this patch so I can tell if the
page to be flushed is in the current VMs address space or not:

diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 73e794f..920d0f6 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -373,7 +373,8 @@ maps this page at its virtual address.
 	likely that you will need to flush the instruction cache
 	for copy_to_user_page().
 
-  void flush_anon_page(struct page *page, unsigned long vmaddr)
+  void flush_anon_page(struct vm_area_struct *vma, struct page *page,
+                       unsigned long vmaddr)
   	When the kernel needs to access the contents of an anonymous
 	page, it calls this function (currently only
 	get_user_pages()).  Note: flush_dcache_page() deliberately
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
diff --git a/include/asm-arm/cacheflush.h b/include/asm-arm/cacheflush.h
diff --git a/include/asm-parisc/cacheflush.h b/include/asm-parisc/cacheflush.h
index aedb051..a799dd8 100644
--- a/include/asm-parisc/cacheflush.h
+++ b/include/asm-parisc/cacheflush.h
@@ -186,7 +186,7 @@ flush_cache_page(struct vm_area_struct *
 }
 
 static inline void
-flush_anon_page(struct page *page, unsigned long vmaddr)
+flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
 	if (PageAnon(page))
 		flush_user_dcache_page(vmaddr);
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ca9a602..645d440 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,7 +8,7 @@ #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
 
 #ifndef ARCH_HAS_FLUSH_ANON_PAGE
-static inline void flush_anon_page(struct page *page, unsigned long vmaddr)
+static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
 }
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..13970fe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1091,7 +1091,7 @@ int get_user_pages(struct task_struct *t
 			if (pages) {
 				pages[i] = page;
 
-				flush_anon_page(page, start);
+				flush_anon_page(vma, page, start);
 				flush_dcache_page(page);
 			}
 			if (vmas)


-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 16:29 ` Arjan van de Ven
@ 2006-12-21 17:35   ` Russell King
  0 siblings, 0 replies; 57+ messages in thread
From: Russell King @ 2006-12-21 17:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux Kernel List, linux-arch, Miklos Szeredi

On Thu, Dec 21, 2006 at 05:29:42PM +0100, Arjan van de Ven wrote:
> 
> > So, given all this additional complexity _and_ that it would only be
> > safe on non-preempt UP, the question becomes: is using get_user_pages()
> > to access the current processes memory space legal?  Given the above,
> > I would say not.
> 
> I'd say that copy_from_user is the right api for this, not
> get_user_pages + kmap hacks...

I would tend to agree.

So the question then comes down to: is there really an issue with using
copy_*_user in fuse.

Bearing in mind that get_user_pages() simulates page faults in the
memory it is trying to access, we're going to either take simulated
page faults at that time, or real page faults in copy_*_user.

(I was just about to test a hacked up implementation of flush_anon_page()
on my test system, but it seems its ethernet interface has warmed up
too much and won't obtain a link with my switch... which makes download
of kernels impossible.  Hence it's going to have to wait a few hours for
it to cool down.)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 16:57   ` Russell King
@ 2006-12-21 17:51     ` Miklos Szeredi
  2006-12-21 21:04       ` Jan Engelhardt
  2007-01-01 15:04     ` James Bottomley
  1 sibling, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 17:51 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel, linux-arch

> On Thu, Dec 21, 2006 at 04:53:56PM +0100, Miklos Szeredi wrote:
> > I'll first answer the last paragraph.
> > 
> > > I suggest that in order for fuse to work reliably on ARM, it is modified
> > > to behave more like a reasonable device driver, and use the functions
> > > defined in asm/uaccess.h when it wants to access the current processes
> > > VM space.
> > 
> > Fuse needs to use get_user_pages() to work around certain deadlock
> > scenarios (see Documentation/filesystems/fuse.txt), that are
> > problematic with copy_*_user().
> 
> Hmm, okay (though the documentation doesn't really provide enough
> explaination for me to get a grasp of exactly what's going on.)

The root of the problem is that copy_to_user() may cause page faults
on the userspace buffer, and the page fault might (in case of a
maliciously crafted filesystem) recurse into the filesystem itself.

This applies to get_user_pages() as well, but in that case the
function taking the fault is different from the one doing the copy,
hence the original request can be aborted while in get_user_pages().

It cannot be aborted during the copy (since data belonging to the
original request is still being used), but unlike copy_to_user(),
memcpy() will never block.

> > > However, and this is the problem, we need cache maintainence _after_
> > > that memcpy() has completed - with a write allocate VIVT cache, the
> > > memcpy() itself will allocate cache lines in the kernel mapping of
> > > the page which will need to be written back for the user process to
> > > see that data.
> > 
> > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > could be replaced by the flush_kernel_dcache_page() (added by James
> > Bottomley together with flush_anon_page()) when all relevant
> > architectures have defined it.
> 
> I'm not entirely convinced that it can be replaced.  What if the page
> is in the page cache and is shared with other processes?  That quite
> clearly falls under flush_dcache_page()'s remit.

But flush_dcache_page() has already been called in get_user_pages(),
so the user mappings are flushed before copying the data to the kernel
mapping, and only the kernel mapping need to be flushed _after_ the
copying.

> > > Now, throw in SMP or preempt with a multi-threaded userspace program
> > > touching the page in question, and the problem just gets much much
> > > worse.  In such a scenario, we can not guarantee, no matter how much
> > > cache maintainence is applied to the kernel, that this API comes
> > > anywhere near to being safe.
> > 
> > This is only problematic if multiple threads are touching the same
> > page, no?  If the page(s) used for reading/writing requests are
> > exclusive to each thread, then there should be no problem.  This is a
> > reasonable requirement towards the userspace filesystem I think.
> 
> Such a restriction needs to be clearly documented against get_user_pages()
> so that users don't expect something from it which it can't deliver.

Agreed.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 17:17   ` Russell King
@ 2006-12-21 17:55     ` Miklos Szeredi
  2006-12-21 18:11       ` Russell King
  2006-12-30 16:39     ` Russell King
  1 sibling, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 17:55 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel, linux-arch

> > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > could be replaced by the flush_kernel_dcache_page() (added by James
> > Bottomley together with flush_anon_page()) when all relevant
> > architectures have defined it.
> 
> I should say that flush_anon_page() in its current form is going to be
> problematic for ARM.  It is passed:
> 
> 1. the struct page
> 2. the virtual address in process memory for the page
> 
> It is not passed the mm or vma.  This means that we have no idea whether
> the virtual address is in the currently mapped VM space or not.  The
> common use of get_area_pages() is to get pages from other address
> spaces.

I'm not sure I understand.  flush_anon_page() needs only to flush the
mapping for the given virtual address, no?  It's always mapped at that
address (since it was just accessed through that).  Any other mappings
of the anonymous page are irrelevant, they don't need to be flushed.

> If we use the supplied virtual address to perform cache maintainence of
> the userspace mapping, we might end up hitting a completely different
> processes address space, which may contain some page sensitive to such
> operations, or may not contain any page and thereby could cause a page
> fault on some ARM CPUs.

I think calling get_user_pages() from a different process' address
space simply doesn't make any sense.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 17:55     ` Miklos Szeredi
@ 2006-12-21 18:11       ` Russell King
  2006-12-21 18:30         ` Miklos Szeredi
  0 siblings, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-21 18:11 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-arch

On Thu, Dec 21, 2006 at 06:55:47PM +0100, Miklos Szeredi wrote:
> > > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > > could be replaced by the flush_kernel_dcache_page() (added by James
> > > Bottomley together with flush_anon_page()) when all relevant
> > > architectures have defined it.
> > 
> > I should say that flush_anon_page() in its current form is going to be
> > problematic for ARM.  It is passed:
> > 
> > 1. the struct page
> > 2. the virtual address in process memory for the page
> > 
> > It is not passed the mm or vma.  This means that we have no idea whether
> > the virtual address is in the currently mapped VM space or not.  The
> > common use of get_area_pages() is to get pages from other address
> > spaces.
> 
> I'm not sure I understand.  flush_anon_page() needs only to flush the
> mapping for the given virtual address, no?

Yes, but that virtual /user/ address is meaningless without knowing
which process address space it belongs to.

> It's always mapped at that address (since it was just accessed through
> that).

No.  Consider ptrace() (invoked by gdb) reading data from another
processes address space to obtain structure data or instructions.

> Any other mappings
> of the anonymous page are irrelevant, they don't need to be flushed.

Again, incorrect.  Consider if the page you're accessing is a file-
backed page, and is mapped into a process using a shared mapping.
Because you've written to the file, those shared mappings need to see
that write, and the interface for achieving that is flush_dcache_page().
If not, data loss can occur.

> > If we use the supplied virtual address to perform cache maintainence of
> > the userspace mapping, we might end up hitting a completely different
> > processes address space, which may contain some page sensitive to such
> > operations, or may not contain any page and thereby could cause a page
> > fault on some ARM CPUs.
> 
> I think calling get_user_pages() from a different process' address
> space simply doesn't make any sense.

That was it's main use - to implement ptrace() to read other processes
address spaces.  Why do you think it takes a task_struct and mm_struct?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 18:11       ` Russell King
@ 2006-12-21 18:30         ` Miklos Szeredi
  2006-12-21 18:55           ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 18:30 UTC (permalink / raw)
  To: rmk+lkml; +Cc: miklos, linux-kernel, linux-arch

> > > > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > > > could be replaced by the flush_kernel_dcache_page() (added by James
> > > > Bottomley together with flush_anon_page()) when all relevant
> > > > architectures have defined it.
> > > 
> > > I should say that flush_anon_page() in its current form is going to be
> > > problematic for ARM.  It is passed:
> > > 
> > > 1. the struct page
> > > 2. the virtual address in process memory for the page
> > > 
> > > It is not passed the mm or vma.  This means that we have no idea whether
> > > the virtual address is in the currently mapped VM space or not.  The
> > > common use of get_area_pages() is to get pages from other address
> > > spaces.
> > 
> > I'm not sure I understand.  flush_anon_page() needs only to flush the
> > mapping for the given virtual address, no?
> 
> Yes, but that virtual /user/ address is meaningless without knowing
> which process address space it belongs to.
> 
> > It's always mapped at that address (since it was just accessed through
> > that).
> 
> No.  Consider ptrace() (invoked by gdb) reading data from another
> processes address space to obtain structure data or instructions.
> 
> > Any other mappings
> > of the anonymous page are irrelevant, they don't need to be flushed.
> 
> Again, incorrect.  Consider if the page you're accessing is a file-
> backed page, and is mapped into a process using a shared mapping.
> Because you've written to the file, those shared mappings need to see
> that write, and the interface for achieving that is flush_dcache_page().
> If not, data loss can occur.

Yes, for file backed pages.  But flush_anon_page() only needs to deal
with anonymous (not file backed) pages.

> > > If we use the supplied virtual address to perform cache maintainence of
> > > the userspace mapping, we might end up hitting a completely different
> > > processes address space, which may contain some page sensitive to such
> > > operations, or may not contain any page and thereby could cause a page
> > > fault on some ARM CPUs.
> > 
> > I think calling get_user_pages() from a different process' address
> > space simply doesn't make any sense.
> 
> That was it's main use - to implement ptrace() to read other processes
> address spaces.  Why do you think it takes a task_struct and mm_struct?

Right, I missed that.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 18:30         ` Miklos Szeredi
@ 2006-12-21 18:55           ` Russell King
  2006-12-21 19:05             ` Miklos Szeredi
  0 siblings, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-21 18:55 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-arch

On Thu, Dec 21, 2006 at 07:30:11PM +0100, Miklos Szeredi wrote:
> > > > > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > > > > could be replaced by the flush_kernel_dcache_page() (added by James
> > > > > Bottomley together with flush_anon_page()) when all relevant
> > > > > architectures have defined it.
> > > > 
> > > > I should say that flush_anon_page() in its current form is going to be
> > > > problematic for ARM.  It is passed:
> > > > 
> > > > 1. the struct page
> > > > 2. the virtual address in process memory for the page
> > > > 
> > > > It is not passed the mm or vma.  This means that we have no idea whether
> > > > the virtual address is in the currently mapped VM space or not.  The
> > > > common use of get_area_pages() is to get pages from other address
> > > > spaces.
> > > 
> > > I'm not sure I understand.  flush_anon_page() needs only to flush the
> > > mapping for the given virtual address, no?
> > 
> > Yes, but that virtual /user/ address is meaningless without knowing
> > which process address space it belongs to.
> > 
> > > It's always mapped at that address (since it was just accessed through
> > > that).
> > 
> > No.  Consider ptrace() (invoked by gdb) reading data from another
> > processes address space to obtain structure data or instructions.
> > 
> > > Any other mappings
> > > of the anonymous page are irrelevant, they don't need to be flushed.
> > 
> > Again, incorrect.  Consider if the page you're accessing is a file-
> > backed page, and is mapped into a process using a shared mapping.
> > Because you've written to the file, those shared mappings need to see
> > that write, and the interface for achieving that is flush_dcache_page().
> > If not, data loss can occur.
> 
> Yes, for file backed pages.  But flush_anon_page() only needs to deal
> with anonymous (not file backed) pages.

Ignore my final paragraph, it was clearly wrong; I was thinking about
the flush after you've written to the page.

However, I continue to assert that I require the VMA to implement
flush_anon_page() since a userspace address without knowing which
userspace it corresponds with is utterly useless for cache maintainence
purposes.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 18:55           ` Russell King
@ 2006-12-21 19:05             ` Miklos Szeredi
  2006-12-21 23:51               ` Randolph Chung
  0 siblings, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 19:05 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel, linux-arch

> > > > > > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > > > > > could be replaced by the flush_kernel_dcache_page() (added by James
> > > > > > Bottomley together with flush_anon_page()) when all relevant
> > > > > > architectures have defined it.
> > > > > 
> > > > > I should say that flush_anon_page() in its current form is going to be
> > > > > problematic for ARM.  It is passed:
> > > > > 
> > > > > 1. the struct page
> > > > > 2. the virtual address in process memory for the page
> > > > > 
> > > > > It is not passed the mm or vma.  This means that we have no idea whether
> > > > > the virtual address is in the currently mapped VM space or not.  The
> > > > > common use of get_area_pages() is to get pages from other address
> > > > > spaces.
> > > > 
> > > > I'm not sure I understand.  flush_anon_page() needs only to flush the
> > > > mapping for the given virtual address, no?
> > > 
> > > Yes, but that virtual /user/ address is meaningless without knowing
> > > which process address space it belongs to.
> > > 
> > > > It's always mapped at that address (since it was just accessed through
> > > > that).
> > > 
> > > No.  Consider ptrace() (invoked by gdb) reading data from another
> > > processes address space to obtain structure data or instructions.
> > > 
> > > > Any other mappings
> > > > of the anonymous page are irrelevant, they don't need to be flushed.
> > > 
> > > Again, incorrect.  Consider if the page you're accessing is a file-
> > > backed page, and is mapped into a process using a shared mapping.
> > > Because you've written to the file, those shared mappings need to see
> > > that write, and the interface for achieving that is flush_dcache_page().
> > > If not, data loss can occur.
> > 
> > Yes, for file backed pages.  But flush_anon_page() only needs to deal
> > with anonymous (not file backed) pages.
> 
> Ignore my final paragraph, it was clearly wrong; I was thinking about
> the flush after you've written to the page.
> 
> However, I continue to assert that I require the VMA to implement
> flush_anon_page() since a userspace address without knowing which
> userspace it corresponds with is utterly useless for cache maintainence
> purposes.

I understand now.  I'm not sure how the PARISC implementation can be
correct in this light.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 17:51     ` Miklos Szeredi
@ 2006-12-21 21:04       ` Jan Engelhardt
  2006-12-21 21:30         ` Miklos Szeredi
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Engelhardt @ 2006-12-21 21:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: rmk+lkml, linux-kernel, linux-arch


On Dec 21 2006 18:51, Miklos Szeredi wrote:
>
>The root of the problem is that copy_to_user() may cause page faults
>on the userspace buffer, and the page fault might (in case of a
>maliciously crafted filesystem) recurse into the filesystem itself.

Would it be worthwhile to mlock the page? I know that needs root
privs or some capability, but a static buffer could be put aside when
fusermount is run.


	-`J'
-- 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 21:04       ` Jan Engelhardt
@ 2006-12-21 21:30         ` Miklos Szeredi
  0 siblings, 0 replies; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-21 21:30 UTC (permalink / raw)
  To: jengelh; +Cc: rmk+lkml, linux-kernel, linux-arch

> >
> >The root of the problem is that copy_to_user() may cause page faults
> >on the userspace buffer, and the page fault might (in case of a
> >maliciously crafted filesystem) recurse into the filesystem itself.
> 
> Would it be worthwhile to mlock the page? I know that needs root
> privs or some capability, but a static buffer could be put aside when
> fusermount is run.

And how would the kernel ensure, that the buffer supplied by userspace
is mlocked and stays mlocked during the memory copy?  I don't think
that would simplify the kerel side much, and would complicate the
userspace side considerably.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 19:05             ` Miklos Szeredi
@ 2006-12-21 23:51               ` Randolph Chung
  2006-12-22  8:43                 ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: Randolph Chung @ 2006-12-21 23:51 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: rmk+lkml, linux-kernel, linux-arch

> I understand now.  I'm not sure how the PARISC implementation can be
> correct in this light.

According to cachetlb.txt:

   void flush_anon_page(struct page *page, unsigned long vmaddr)
         When the kernel needs to access the contents of an anonymous
         page, it calls this function (currently only
         get_user_pages()).  Note: flush_dcache_page() deliberately
         doesn't work for an anonymous page.  The default
         implementation is a nop (and should remain so for all coherent
         architectures).  For incoherent architectures, it should flush
         the cache of the page at vmaddr in the current user process.
                                                ^^^^^^^^^^^^^^^^^^^^

Is the documentation wrong?

randolph

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 23:51               ` Randolph Chung
@ 2006-12-22  8:43                 ` Russell King
  2006-12-22 14:45                   ` Randolph Chung
  0 siblings, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-22  8:43 UTC (permalink / raw)
  To: Randolph Chung; +Cc: Miklos Szeredi, linux-kernel, linux-arch

On Fri, Dec 22, 2006 at 07:51:34AM +0800, Randolph Chung wrote:
> >I understand now.  I'm not sure how the PARISC implementation can be
> >correct in this light.
> 
> According to cachetlb.txt:
> 
>   void flush_anon_page(struct page *page, unsigned long vmaddr)
>         When the kernel needs to access the contents of an anonymous
>         page, it calls this function (currently only
>         get_user_pages()).  Note: flush_dcache_page() deliberately
>         doesn't work for an anonymous page.  The default
>         implementation is a nop (and should remain so for all coherent
>         architectures).  For incoherent architectures, it should flush
>         the cache of the page at vmaddr in the current user process.
>                                                ^^^^^^^^^^^^^^^^^^^^
> 
> Is the documentation wrong?

Yes.  As I've already explained there is no guarantee that
get_user_pages() is only called to obtain pages for the current
process, and flush_anon_pages() is called irrespective of which
user process is being 'got'.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-22  8:43                 ` Russell King
@ 2006-12-22 14:45                   ` Randolph Chung
  0 siblings, 0 replies; 57+ messages in thread
From: Randolph Chung @ 2006-12-22 14:45 UTC (permalink / raw)
  To: Randolph Chung, Miklos Szeredi, linux-kernel, linux-arch

>> Is the documentation wrong?
> 
> Yes.  As I've already explained there is no guarantee that
> get_user_pages() is only called to obtain pages for the current
> process, and flush_anon_pages() is called irrespective of which
> user process is being 'got'.

ok, it's easy enough to fix, I'm just trying to point out that it's 
being implemented on parisc according to the documentation.

randolph




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 17:17   ` Russell King
  2006-12-21 17:55     ` Miklos Szeredi
@ 2006-12-30 16:39     ` Russell King
  2006-12-30 16:50       ` Russell King
  2006-12-30 18:21       ` Linus Torvalds
  1 sibling, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-30 16:39 UTC (permalink / raw)
  To: Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton, Linus Torvalds

Given that no one has any outstanding issues with the following patch, I'm
going to ask akpm to put this into -mm, and shortly after (a week or so)
I'll submit it and the ARM flush_anon_page() patch to Linus for -rc to fix
ARM data corruption issues.

If anyone _does_ have a problem, holler ASAP.

On Thu, Dec 21, 2006 at 05:17:39PM +0000, Russell King wrote:
> On Thu, Dec 21, 2006 at 04:53:56PM +0100, Miklos Szeredi wrote:
> > Yes, note the flush_dcache_page() call in fuse_copy_finish().  That
> > could be replaced by the flush_kernel_dcache_page() (added by James
> > Bottomley together with flush_anon_page()) when all relevant
> > architectures have defined it.
> 
> I should say that flush_anon_page() in its current form is going to be
> problematic for ARM.  It is passed:
> 
> 1. the struct page
> 2. the virtual address in process memory for the page
> 
> It is not passed the mm or vma.  This means that we have no idea whether
> the virtual address is in the currently mapped VM space or not.  The
> common use of get_area_pages() is to get pages from other address
> spaces.
> 
> If we use the supplied virtual address to perform cache maintainence of
> the userspace mapping, we might end up hitting a completely different
> processes address space, which may contain some page sensitive to such
> operations, or may not contain any page and thereby could cause a page
> fault on some ARM CPUs.
> 
> Hence, I first need something like this patch so I can tell if the
> page to be flushed is in the current VMs address space or not:

diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 73e794f..920d0f6 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -373,7 +373,8 @@ maps this page at its virtual address.
 	likely that you will need to flush the instruction cache
 	for copy_to_user_page().
 
-  void flush_anon_page(struct page *page, unsigned long vmaddr)
+  void flush_anon_page(struct vm_area_struct *vma, struct page *page,
+                       unsigned long vmaddr)
   	When the kernel needs to access the contents of an anonymous
 	page, it calls this function (currently only
 	get_user_pages()).  Note: flush_dcache_page() deliberately
diff --git a/include/asm-parisc/cacheflush.h b/include/asm-parisc/cacheflush.h
index aedb051..a799dd8 100644
--- a/include/asm-parisc/cacheflush.h
+++ b/include/asm-parisc/cacheflush.h
@@ -186,7 +186,7 @@ flush_cache_page(struct vm_area_struct *
 }
 
 static inline void
-flush_anon_page(struct page *page, unsigned long vmaddr)
+flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
 	if (PageAnon(page))
 		flush_user_dcache_page(vmaddr);
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ca9a602..645d440 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,7 +8,7 @@ #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
 
 #ifndef ARCH_HAS_FLUSH_ANON_PAGE
-static inline void flush_anon_page(struct page *page, unsigned long vmaddr)
+static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
 {
 }
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..13970fe 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1091,7 +1091,7 @@ int get_user_pages(struct task_struct *t
 			if (pages) {
 				pages[i] = page;
 
-				flush_anon_page(page, start);
+				flush_anon_page(vma, page, start);
 				flush_dcache_page(page);
 			}
 			if (vmas)


> -- 
> Russell King
>  Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
>  maintainer of:
> -
> To unsubscribe from this list: send the line "unsubscribe linux-arch" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 16:39     ` Russell King
@ 2006-12-30 16:50       ` Russell King
  2006-12-30 18:26         ` Linus Torvalds
  2006-12-30 18:21       ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-30 16:50 UTC (permalink / raw)
  To: Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton, Linus Torvalds

On Sat, Dec 30, 2006 at 04:39:55PM +0000, Russell King wrote:
> Given that no one has any outstanding issues with the following patch, I'm
> going to ask akpm to put this into -mm, and shortly after (a week or so)
> I'll submit it and the ARM flush_anon_page() patch to Linus for -rc to fix
> ARM data corruption issues.

And here's the flush_anon_page() part.

Add flush_anon_page() for ARM, to avoid data corruption issues when using
fuse or other subsystems using get_user_pages().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 628348c..86dd204 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -202,3 +202,39 @@ #endif
 	}
 }
 EXPORT_SYMBOL(flush_dcache_page);
+
+/*
+ * Flush an anonymous page so that users of get_user_pages()
+ * can safely access the data.  The expected sequence is:
+ *
+ *  get_user_pages()
+ *    -> flush_anon_page
+ *  memcpy() to/from page
+ *  if written to page, flush_dcache_page()
+ */
+void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
+{
+	/* VIPT non-aliasing caches need do nothing */
+	if (cache_is_vipt_nonaliasing())
+		return;
+
+	/*
+	 * Write back and invalidate userspace mapping.
+	 */
+	if (cache_is_vivt()) {
+		flush_cache_page(vma, vmaddr, page_to_pfn(page));
+	} else {
+		/*
+		 * For aliasing VIPT, we can flush an alias of the
+		 * userspace address only.
+		 */
+		flush_pfn_alias(page_to_pfn(page), vmaddr);
+	}
+
+	/*
+	 * Invalidate kernel mapping.  No data should be contained
+	 * in this mapping of the page.  FIXME: this is overkill
+	 * since we actually ask for a write-back and invalidate.
+	 */
+	__cpuc_flush_dcache_page(page_address(page));
+}
diff --git a/include/asm-arm/cacheflush.h b/include/asm-arm/cacheflush.h
index 378a3a2..506ef15 100644
--- a/include/asm-arm/cacheflush.h
+++ b/include/asm-arm/cacheflush.h
@@ -355,6 +355,16 @@ #define clean_dcache_area(start,size)	cp
  */
 extern void flush_dcache_page(struct page *);
 
+#define ARCH_HAS_FLUSH_ANON_PAGE
+static inline void flush_anon_page(struct vm_area_struct *vma,
+			 struct page *page, unsigned long vmaddr)
+{
+	extern void __flush_anon_page(struct vm_area_struct *vma,
+				struct page *, unsigned long);
+	if (PageAnon(page))
+		__flush_anon_page(vma, page, vmaddr);
+}
+
 #define flush_dcache_mmap_lock(mapping) \
 	write_lock_irq(&(mapping)->tree_lock)
 #define flush_dcache_mmap_unlock(mapping) \


-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 16:39     ` Russell King
  2006-12-30 16:50       ` Russell King
@ 2006-12-30 18:21       ` Linus Torvalds
  1 sibling, 0 replies; 57+ messages in thread
From: Linus Torvalds @ 2006-12-30 18:21 UTC (permalink / raw)
  To: Russell King; +Cc: Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton



On Sat, 30 Dec 2006, Russell King wrote:
>
> Given that no one has any outstanding issues with the following patch, I'm
> going to ask akpm to put this into -mm, and shortly after (a week or so)
> I'll submit it and the ARM flush_anon_page() patch to Linus for -rc to fix
> ARM data corruption issues.
> 
> If anyone _does_ have a problem, holler ASAP.

Looks fine to me. 

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 16:50       ` Russell King
@ 2006-12-30 18:26         ` Linus Torvalds
  2006-12-30 22:46           ` Russell King
  2007-01-01 14:35           ` James Bottomley
  0 siblings, 2 replies; 57+ messages in thread
From: Linus Torvalds @ 2006-12-30 18:26 UTC (permalink / raw)
  To: Russell King; +Cc: Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton



On Sat, 30 Dec 2006, Russell King wrote:
> 
> And here's the flush_anon_page() part.
> 
> Add flush_anon_page() for ARM, to avoid data corruption issues when using
> fuse or other subsystems using get_user_pages().

Btw, since this doesn't actually change any code for anybody but ARM, just 
adds a parameter that is obviously unused by everybody else, and if it 
actually fixes a real bug for ARM, I'll obviously happily take it even 
before 2.6.20. So go ahead put it in your ARM tree, and we'll get some 
testing through that. And just ask me to pull at some point.

I wonder why nobody else seems to have a "flush_anon_page()"? This would 
seem to be a potential issue for architectures like sparc too.. Although 
maybe sparc can do a flush by physical index with "flush_dcache_page()".

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 18:26         ` Linus Torvalds
@ 2006-12-30 22:46           ` Russell King
  2006-12-31  5:23             ` David Miller
  2007-01-01 14:35           ` James Bottomley
  1 sibling, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-30 22:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton

On Sat, Dec 30, 2006 at 10:26:20AM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 30 Dec 2006, Russell King wrote:
> > 
> > And here's the flush_anon_page() part.
> > 
> > Add flush_anon_page() for ARM, to avoid data corruption issues when using
> > fuse or other subsystems using get_user_pages().
> 
> Btw, since this doesn't actually change any code for anybody but ARM, just 
> adds a parameter that is obviously unused by everybody else, and if it 
> actually fixes a real bug for ARM, I'll obviously happily take it even 
> before 2.6.20. So go ahead put it in your ARM tree, and we'll get some 
> testing through that. And just ask me to pull at some point.
> 
> I wonder why nobody else seems to have a "flush_anon_page()"? This would 
> seem to be a potential issue for architectures like sparc too.. Although 
> maybe sparc can do a flush by physical index with "flush_dcache_page()".

Well...

iirc, flush_anon_page() was introduced to fix non-working fuse on parisc,
which occurs because fuse wants to use get_user_pages() to read data from
the current processes memory space.

get_user_pages() contains a call to flush_dcache_page(), whose behaviour
is defined for shared mappings.  Anonymous pages are unspecified.  It
appears that flush_anon_page() was introduced to correct this oversight.

Looking at some of the other users of get_user_pages() which want to
access the current processes memory space, one finds the following:

some use flush_cache_page():
- binfmt_elf coredump
- ptrace (in arch code)

others don't:
- aio
- bio
- block (block_dev::blk_get_page seems to be for direct-io, there's problems
   reported with this on ARM)
- direct-io
- fuse
- vmsplice

So, anything except coredumps and ptrace are currently unsafe on ARM
without something being added to get_user_pages() to ensure coherency
of anonymous pages.

Given that we have reported corruption with direct-io, and debian bug
#402876 for nonworking fuse, it seems the correct thing to do is to
implement this function.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 22:46           ` Russell King
@ 2006-12-31  5:23             ` David Miller
  2006-12-31  9:10               ` Miklos Szeredi
  2006-12-31  9:23               ` Russell King
  0 siblings, 2 replies; 57+ messages in thread
From: David Miller @ 2006-12-31  5:23 UTC (permalink / raw)
  To: rmk+lkml; +Cc: torvalds, miklos, linux-kernel, linux-arch, akpm

From: Russell King <rmk+lkml@arm.linux.org.uk>
Date: Sat, 30 Dec 2006 22:46:04 +0000

> iirc, flush_anon_page() was introduced to fix non-working fuse on parisc,
> which occurs because fuse wants to use get_user_pages() to read data from
> the current processes memory space.
> 
> get_user_pages() contains a call to flush_dcache_page(), whose behaviour
> is defined for shared mappings.  Anonymous pages are unspecified.  It
> appears that flush_anon_page() was introduced to correct this oversight.

Sparc64 flushes anonymous pages too when flush_dcache_page() is
called on such pages.  It only tries to "defer" flushes for
pages which have a non-NULL page_mapping().  For NULL page_mapping()
we just flush immediately.

This works on sparc64, as sort of hinted by Linus, because we can
flush by physical page on sparc64.  I guess the lack of ability to
do that is why PARISC and ARM don't do that too.

For the ptrace() cases we created the copy_{to,from}_user_page()
interfaces.  So that when you access data inside of pages obtained via
a get_user_pages() call, you are supposed to use those two interfaces
so that everything works out right.

Therefore, FUSE probably could have been fixed by judicious use
of copy_{to,from}_user_page() calls instead of adding this new
ad-hoc flush_anon_page() thing.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  5:23             ` David Miller
@ 2006-12-31  9:10               ` Miklos Szeredi
  2006-12-31  9:45                 ` David Miller
  2006-12-31  9:23               ` Russell King
  1 sibling, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-31  9:10 UTC (permalink / raw)
  To: davem; +Cc: rmk+lkml, torvalds, linux-kernel, linux-arch, akpm

> Therefore, FUSE probably could have been fixed by judicious use
> of copy_{to,from}_user_page() calls instead of adding this new
> ad-hoc flush_anon_page() thing.

Probably, but I don't think either interface is perfect.
copy_*_user_page() will double flush the user mapping
(get_user_pages() already does a flush_dcache_page()).  Actually
nothing except ptrace uses the copy_*_user_page() interface even
though there are many uses of get_user_pages().

What I think get_user_pages() really needs is a single operation that

  - flushes the virtual address view
  - flushes the kernel view

regardles whether the page is anonymous or file backed.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  5:23             ` David Miller
  2006-12-31  9:10               ` Miklos Szeredi
@ 2006-12-31  9:23               ` Russell King
  2006-12-31  9:27                 ` Arjan van de Ven
  2006-12-31  9:46                 ` David Miller
  1 sibling, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-31  9:23 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, miklos, linux-kernel, linux-arch, akpm

On Sat, Dec 30, 2006 at 09:23:38PM -0800, David Miller wrote:
> From: Russell King <rmk+lkml@arm.linux.org.uk>
> Date: Sat, 30 Dec 2006 22:46:04 +0000
> 
> > iirc, flush_anon_page() was introduced to fix non-working fuse on parisc,
> > which occurs because fuse wants to use get_user_pages() to read data from
> > the current processes memory space.
> > 
> > get_user_pages() contains a call to flush_dcache_page(), whose behaviour
> > is defined for shared mappings.  Anonymous pages are unspecified.  It
> > appears that flush_anon_page() was introduced to correct this oversight.
> 
> Sparc64 flushes anonymous pages too when flush_dcache_page() is
> called on such pages.  It only tries to "defer" flushes for
> pages which have a non-NULL page_mapping().  For NULL page_mapping()
> we just flush immediately.

We do this on ARM - if page_mapping() is NULL, we flush the kernel
alias unconditionally.  However, we have no view where the user
mapping of that page is, which is where the problem is.  Cache lines
remain allocated for the user mapping and data contained within is
not visible via the kernel mapping.

> For the ptrace() cases we created the copy_{to,from}_user_page()
> interfaces.  So that when you access data inside of pages obtained via
> a get_user_pages() call, you are supposed to use those two interfaces
> so that everything works out right.
> 
> Therefore, FUSE probably could have been fixed by judicious use
> of copy_{to,from}_user_page() calls instead of adding this new
> ad-hoc flush_anon_page() thing.

It would've been nice to have had your view when the whole thing blew
up.

However, it's not only FUSE which is suffering - direct-IO also doesn't
work.  If my analysis is correct, only _two_ users of get_user_pages()
with the current process actually does the right thing and that's ptrace
and ELF core dumping.  All other users are buggy.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:23               ` Russell King
@ 2006-12-31  9:27                 ` Arjan van de Ven
  2006-12-31  9:47                   ` David Miller
  2006-12-31  9:55                   ` Russell King
  2006-12-31  9:46                 ` David Miller
  1 sibling, 2 replies; 57+ messages in thread
From: Arjan van de Ven @ 2006-12-31  9:27 UTC (permalink / raw)
  To: Russell King
  Cc: David Miller, torvalds, miklos, linux-kernel, linux-arch, akpm


> 
> However, it's not only FUSE which is suffering - direct-IO also doesn't
> work. 

for direct-IO the kernel won't touch the data *at all*... (that's the
point ;) 

is it still an issue then?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:10               ` Miklos Szeredi
@ 2006-12-31  9:45                 ` David Miller
  0 siblings, 0 replies; 57+ messages in thread
From: David Miller @ 2006-12-31  9:45 UTC (permalink / raw)
  To: miklos; +Cc: rmk+lkml, torvalds, linux-kernel, linux-arch, akpm

From: Miklos Szeredi <miklos@szeredi.hu>
Date: Sun, 31 Dec 2006 10:10:35 +0100

> > Therefore, FUSE probably could have been fixed by judicious use
> > of copy_{to,from}_user_page() calls instead of adding this new
> > ad-hoc flush_anon_page() thing.
> 
> Probably, but I don't think either interface is perfect.
> copy_*_user_page() will double flush the user mapping
> (get_user_pages() already does a flush_dcache_page()).  Actually
> nothing except ptrace uses the copy_*_user_page() interface even
> though there are many uses of get_user_pages().
> 
> What I think get_user_pages() really needs is a single operation that
> 
>   - flushes the virtual address view
>   - flushes the kernel view
> 
> regardles whether the page is anonymous or file backed.

I guess part of the problem is that flush_dcache_page() could have
better defined semantics.

Sparc64 never notices this precisely because it can flush by physical
page and thus it can just do the flush on anonymous pages
synchronously very cheaply.

ARM probably avoids it because it would be very expensive since it
needs to flush by virtual address, so in the best case it would need
to walk the RMAP mapping list and hit every user mapping, then flush
again for the kernel side virtual address of the direct mapping to
that physical page too.

It is certainly the case that all of this stuff could be done in
a much cleaner manner.  Although it's really hard for me to get
excited by this and do the work since sparc64 already works :-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:23               ` Russell King
  2006-12-31  9:27                 ` Arjan van de Ven
@ 2006-12-31  9:46                 ` David Miller
  1 sibling, 0 replies; 57+ messages in thread
From: David Miller @ 2006-12-31  9:46 UTC (permalink / raw)
  To: rmk+lkml; +Cc: torvalds, miklos, linux-kernel, linux-arch, akpm

From: Russell King <rmk+lkml@arm.linux.org.uk>
Date: Sun, 31 Dec 2006 09:23:18 +0000

> We do this on ARM - if page_mapping() is NULL, we flush the kernel
> alias unconditionally.  However, we have no view where the user
> mapping of that page is, which is where the problem is.  Cache lines
> remain allocated for the user mapping and data contained within is
> not visible via the kernel mapping.

You can walk the RMAP list.

> However, it's not only FUSE which is suffering - direct-IO also doesn't
> work.  If my analysis is correct, only _two_ users of get_user_pages()
> with the current process actually does the right thing and that's ptrace
> and ELF core dumping.  All other users are buggy.

I do not argue that these cases need work, in fact I am aware
of the direct-IO bit.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:27                 ` Arjan van de Ven
@ 2006-12-31  9:47                   ` David Miller
  2006-12-31 10:00                     ` Russell King
       [not found]                     ` <1167669252.5302.57.camel@mulgrave.il.steeleye.com>
  2006-12-31  9:55                   ` Russell King
  1 sibling, 2 replies; 57+ messages in thread
From: David Miller @ 2006-12-31  9:47 UTC (permalink / raw)
  To: arjan; +Cc: rmk+lkml, torvalds, miklos, linux-kernel, linux-arch, akpm

From: Arjan van de Ven <arjan@infradead.org>
Date: Sun, 31 Dec 2006 10:27:22 +0100

> 
> > 
> > However, it's not only FUSE which is suffering - direct-IO also doesn't
> > work. 
> 
> for direct-IO the kernel won't touch the data *at all*... (that's the
> point ;) 
> 
> is it still an issue then?

It can be an issue with virtual caches if the "I/O" is done
using cpu loads and stores, but we should be handling that
with explicit flushing anyways.

The core of the problem is that ARM doesn't look for the user
mappings for anonymous pages when flush_dcache_page() is invoked.
I think as a temporary fix it could walk the RMAP list and
use that to find the user virtual mappings.  Would that work
Russel?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:27                 ` Arjan van de Ven
  2006-12-31  9:47                   ` David Miller
@ 2006-12-31  9:55                   ` Russell King
  1 sibling, 0 replies; 57+ messages in thread
From: Russell King @ 2006-12-31  9:55 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: David Miller, torvalds, miklos, linux-kernel, linux-arch, akpm

On Sun, Dec 31, 2006 at 10:27:22AM +0100, Arjan van de Ven wrote:
> 
> > 
> > However, it's not only FUSE which is suffering - direct-IO also doesn't
> > work. 
> 
> for direct-IO the kernel won't touch the data *at all*... (that's the
> point ;) 

Wrong.  One word: PIO.  We _still_ to this day have no guarantee by
block device drivers that the data they've read into the page cache
will be flushed into RAM rather than sitting in the CPU cache.

> is it still an issue then?

http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2006-November/036906.html

Depends if you think about that patch I guess.  If you're an embedded
person trying to get direct-IO working on ARM I guess that patch is
very attractive, even if it is distasteful to us.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31  9:47                   ` David Miller
@ 2006-12-31 10:00                     ` Russell King
  2006-12-31 10:04                       ` David Miller
  2006-12-31 12:24                       ` Miklos Szeredi
       [not found]                     ` <1167669252.5302.57.camel@mulgrave.il.steeleye.com>
  1 sibling, 2 replies; 57+ messages in thread
From: Russell King @ 2006-12-31 10:00 UTC (permalink / raw)
  To: David Miller; +Cc: arjan, torvalds, miklos, linux-kernel, linux-arch, akpm

On Sun, Dec 31, 2006 at 01:47:56AM -0800, David Miller wrote:
> From: Arjan van de Ven <arjan@infradead.org>
> Date: Sun, 31 Dec 2006 10:27:22 +0100
> > > However, it's not only FUSE which is suffering - direct-IO also doesn't
> > > work. 
> > 
> > for direct-IO the kernel won't touch the data *at all*... (that's the
> > point ;) 
> > 
> > is it still an issue then?
> 
> It can be an issue with virtual caches if the "I/O" is done
> using cpu loads and stores, but we should be handling that
> with explicit flushing anyways.
> 
> The core of the problem is that ARM doesn't look for the user
> mappings for anonymous pages when flush_dcache_page() is invoked.
> I think as a temporary fix it could walk the RMAP list and
> use that to find the user virtual mappings.  Would that work
> Russel?

I'm willing to do that - and I guess this means we can probably do this
instead of walking the list of VMAs for the shared mapping, thereby
hitting both anonymous and shared mappings with the same code?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 10:00                     ` Russell King
@ 2006-12-31 10:04                       ` David Miller
  2006-12-31 12:24                       ` Miklos Szeredi
  1 sibling, 0 replies; 57+ messages in thread
From: David Miller @ 2006-12-31 10:04 UTC (permalink / raw)
  To: rmk+lkml; +Cc: arjan, torvalds, miklos, linux-kernel, linux-arch, akpm

From: Russell King <rmk+lkml@arm.linux.org.uk>
Date: Sun, 31 Dec 2006 10:00:07 +0000

> I'm willing to do that - and I guess this means we can probably do this
> instead of walking the list of VMAs for the shared mapping, thereby
> hitting both anonymous and shared mappings with the same code?

That's pretty much the idea.

BTW, I was in a similar boat as you on sparc 32-bit in the
pre-RMAP days, I was able to walk the VMA's for stuff
with a mapping but couldn't handle anonymous stuff very
well.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 10:00                     ` Russell King
  2006-12-31 10:04                       ` David Miller
@ 2006-12-31 12:24                       ` Miklos Szeredi
  2006-12-31 17:37                         ` Russell King
  2006-12-31 20:40                         ` David Miller
  1 sibling, 2 replies; 57+ messages in thread
From: Miklos Szeredi @ 2006-12-31 12:24 UTC (permalink / raw)
  To: rmk+lkml; +Cc: davem, arjan, torvalds, linux-kernel, linux-arch, akpm

> I'm willing to do that - and I guess this means we can probably do this
> instead of walking the list of VMAs for the shared mapping, thereby
> hitting both anonymous and shared mappings with the same code?

But for the get_user_pages() case there's no point, is there?  The VMA
and the virtual address is already available, so trying to find it
again through RMAP doesn't much make sense.

Users of get_user_pages() don't care about any other mappings (maybe
ptrace does, I don't know) only about one single user mapping and one
kernel mapping.

So using flush_dcache_page() there is an overkill, trying to teach it
about anonymous pages is not the real solution, flush_dcache_page()
was never meant to be used on anything but file mapped pages.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 12:24                       ` Miklos Szeredi
@ 2006-12-31 17:37                         ` Russell King
  2007-01-01 22:15                           ` Miklos Szeredi
  2006-12-31 20:40                         ` David Miller
  1 sibling, 1 reply; 57+ messages in thread
From: Russell King @ 2006-12-31 17:37 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: davem, arjan, torvalds, linux-kernel, linux-arch, akpm

On Sun, Dec 31, 2006 at 01:24:53PM +0100, Miklos Szeredi wrote:
> > I'm willing to do that - and I guess this means we can probably do this
> > instead of walking the list of VMAs for the shared mapping, thereby
> > hitting both anonymous and shared mappings with the same code?
> 
> But for the get_user_pages() case there's no point, is there?  The VMA
> and the virtual address is already available, so trying to find it
> again through RMAP doesn't much make sense.
> 
> Users of get_user_pages() don't care about any other mappings (maybe
> ptrace does, I don't know) only about one single user mapping and one
> kernel mapping.
> 
> So using flush_dcache_page() there is an overkill, trying to teach it
> about anonymous pages is not the real solution, flush_dcache_page()
> was never meant to be used on anything but file mapped pages.

It's not actually.  For flush_anon_page() we currently have to flush the
user mapping and the kernel mapping.  For flush_dcache_page(), it's
exactly the same - we have to flush the kernel mapping and the user
mapping.

So having both calls in get_user_pages() actually means we end up flushing
the kernel mapping _twice_ for every page.  _That_ is overkill.

As Dave points out, flush_dcache_page() _should_ take care of anonymous
pages as well (which if you remember was my argument to James, who
absolutely refused to accept it.)

So, since Dave is in agreement with me, I'll be handling this issue as
per the interface designer's (Dave's) suggestions which tie up directly
with my understanding of what should be done in the first place.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 12:24                       ` Miklos Szeredi
  2006-12-31 17:37                         ` Russell King
@ 2006-12-31 20:40                         ` David Miller
  2006-12-31 20:58                           ` Linus Torvalds
  1 sibling, 1 reply; 57+ messages in thread
From: David Miller @ 2006-12-31 20:40 UTC (permalink / raw)
  To: miklos; +Cc: rmk+lkml, arjan, torvalds, linux-kernel, linux-arch, akpm

From: Miklos Szeredi <miklos@szeredi.hu>
Date: Sun, 31 Dec 2006 13:24:53 +0100

> > I'm willing to do that - and I guess this means we can probably do this
> > instead of walking the list of VMAs for the shared mapping, thereby
> > hitting both anonymous and shared mappings with the same code?
> 
> But for the get_user_pages() case there's no point, is there?  The VMA
> and the virtual address is already available, so trying to find it
> again through RMAP doesn't much make sense.
> 
> Users of get_user_pages() don't care about any other mappings (maybe
> ptrace does, I don't know) only about one single user mapping and one
> kernel mapping.
> 
> So using flush_dcache_page() there is an overkill, trying to teach it
> about anonymous pages is not the real solution, flush_dcache_page()
> was never meant to be used on anything but file mapped pages.

Even in the ptrace() case, you do want to flush all the other VMA's
that might be out there with an aliased cached copy in the cpu cache.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 20:40                         ` David Miller
@ 2006-12-31 20:58                           ` Linus Torvalds
  2006-12-31 21:12                             ` David Miller
  0 siblings, 1 reply; 57+ messages in thread
From: Linus Torvalds @ 2006-12-31 20:58 UTC (permalink / raw)
  To: David Miller; +Cc: miklos, rmk+lkml, arjan, linux-kernel, linux-arch, akpm



On Sun, 31 Dec 2006, David Miller wrote:
>
> Even in the ptrace() case, you do want to flush all the other VMA's
> that might be out there with an aliased cached copy in the cpu cache.

I don't think that's necessarily true.

If the same page is cached differently (and virtually) in multiple 
different places, the end result is random _anyway_. The only thing you 
care about for ptrace is what the process YOU ARE TRACING sees. If 
somebody else has something else in their cachelines, that's utterly 
uninteresting to ptrace, imnsho.

So there really is two different cases here:

 - flush the cache as seen by A PARTICULAR virtual mapping.

   This is ptrace, but it's other things like "unmap page from this VM" 
   too.

 - flush the cache for all possible virtual mappings - simply because we 
   don't even know who has it mapped dirty. 

   And the thing is, the more I think about it, the more I end up 
   wondering:

   I'm not even sure how valid this is. Whatever path needs to do this is 
   likely doing something wrong anyway. If there are multiple possible 
   sources of cache conflicts, the thing is a disaster and the end result 
   depends on our ordering anyway, so I'd argue that it is just about as 
   correct to flush as it is to NOT flush.

So I have this nagging suspicion that "flush_dcache_page()" is always a 
bug when it is about "virtual caches". It should NEVER flush any virtual 
caches, since that whole operations is by necessity something where you 
should be talking about _which_ virtual cache you should flush.

So "flush_dcache_page()" is - I think - more validtly thought about as 
just DMA coherency (in a system where DMA does not participate in 
_physical_ cache coherency). Not about virtual caches at all.

(Or, possibly, we could specify that it does a _particular_ virtual cache 
flush, namely the "kernel mapping" for that page, and nothing else. So if 
you have done a "write()" system call, and actually updated a page cache 
page, _then_ it makes sense to flush the KERNEL virtual mapping to memory: 
you could think of that as the "physical" cache case).

Hmm? What say you?

		Linus

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 20:58                           ` Linus Torvalds
@ 2006-12-31 21:12                             ` David Miller
  2007-01-01 16:44                               ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2006-12-31 21:12 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, rmk+lkml, arjan, linux-kernel, linux-arch, akpm

From: Linus Torvalds <torvalds@osdl.org>
Date: Sun, 31 Dec 2006 12:58:45 -0800 (PST)

> So there really is two different cases here:
> 
>  - flush the cache as seen by A PARTICULAR virtual mapping.
> 
>    This is ptrace, but it's other things like "unmap page from this VM" 
>    too.
> 
>  - flush the cache for all possible virtual mappings - simply because we 
>    don't even know who has it mapped dirty. 
> 
>    And the thing is, the more I think about it, the more I end up 
>    wondering:
> 
>    I'm not even sure how valid this is. Whatever path needs to do this is 
>    likely doing something wrong anyway. If there are multiple possible 
>    sources of cache conflicts, the thing is a disaster and the end result 
>    depends on our ordering anyway, so I'd argue that it is just about as 
>    correct to flush as it is to NOT flush.
> 
> So I have this nagging suspicion that "flush_dcache_page()" is always a 
> bug when it is about "virtual caches". It should NEVER flush any virtual 
> caches, since that whole operations is by necessity something where you 
> should be talking about _which_ virtual cache you should flush.

It's the aliasing between the _1_ valid user mapping and the kernel's
virtual mapping, that's the problem and that's very valid and that's
why we have flush_dcache_page() to begin with.

> So "flush_dcache_page()" is - I think - more validtly thought about as 
> just DMA coherency (in a system where DMA does not participate in 
> _physical_ cache coherency). Not about virtual caches at all.

And I guess that's what you're trying to say here.

I'm beginning to think that Ralf Baechle had the best idea here,
where he recently made it such that platforms could override
kmap() and friends even on non-HIGHMEM configurations.

In theory it's the perfect interface to handle this problem,
you flush exactly where the physical page is made visible to
the kernel for a cpu load/store.  All the locations where that
happens are perfectly annotated already with kmap() calls.

So then there are two ways to touch user mapped pages:

1) Inside of a kmap()/kunmap() region.

2) Via copy_user_page()/clear_user_page()

The only core requirement is that the interfaces know the
virtual address the thing is mapped at, and after Ralf's
changes both #1 and #2 do have this information.

Using kmap() even takes care of the PIO "dma" cases where
the CPU reads/writes to the buffer for the data transfer.

Furthermore, an implementation of #1 and #2 can avoid
cache flushing altogether.  Just like for HIGHMEM you
have a kmap() TLB mapping area that sets up a mapping at
the correct alias, and returns that pointer from kmap().
Since the alias is good, no cache flush is needed to access
the page in kernel space.

In fact I think this is what Ralf's implementation on MIPS is doing.
And, this is the scheme we use on sparc64 for {copy,clear}_user_page().

Now, going in the opposite direction (kernel page made visible to
userspace for the first time, so you have to kick out the kernel side
mapping from the cache) can be handled either at set_pte_at() time
(this is what sparc64 does) or in update_mmu_cache().

This leaves only one (arguably broken) case of, as you mention, the
user using MAP_SHARED at a set of several incompatible aliases.  As
far as I can see, the only sane thing to do in that situation seems to
be to mark the thing non-virtually-cacheable in the user mapping PTEs
if the cpu architecture allows that.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-30 18:26         ` Linus Torvalds
  2006-12-30 22:46           ` Russell King
@ 2007-01-01 14:35           ` James Bottomley
  2007-01-01 16:21             ` Russell King
  1 sibling, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-01 14:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Russell King, Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton

On Sat, 2006-12-30 at 10:26 -0800, Linus Torvalds wrote:
> 
> On Sat, 30 Dec 2006, Russell King wrote:
> > 
> > And here's the flush_anon_page() part.

This looks fine to me (if you need my ack).

> > Add flush_anon_page() for ARM, to avoid data corruption issues when using
> > fuse or other subsystems using get_user_pages().
> 
> Btw, since this doesn't actually change any code for anybody but ARM, just 
> adds a parameter that is obviously unused by everybody else, and if it 
> actually fixes a real bug for ARM, I'll obviously happily take it even 
> before 2.6.20. So go ahead put it in your ARM tree, and we'll get some 
> testing through that. And just ask me to pull at some point.
> 
> I wonder why nobody else seems to have a "flush_anon_page()"? This would 
> seem to be a potential issue for architectures like sparc too.. Although 
> maybe sparc can do a flush by physical index with "flush_dcache_page()".

The sparc handling of anonymous pages is different ... they accitentally
sweep them up in flush_dcache_page().  When I audited the architectures
to try to make fuse work on parisc, parisc and arm were the only ones
that actually needed flush_anon_page().

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-21 16:57   ` Russell King
  2006-12-21 17:51     ` Miklos Szeredi
@ 2007-01-01 15:04     ` James Bottomley
  1 sibling, 0 replies; 57+ messages in thread
From: James Bottomley @ 2007-01-01 15:04 UTC (permalink / raw)
  To: Russell King; +Cc: Miklos Szeredi, linux-kernel, linux-arch

On Thu, 2006-12-21 at 16:57 +0000, Russell King wrote:
> I'm not entirely convinced that it can be replaced.  What if the page
> is in the page cache and is shared with other processes?  That quite
> clearly falls under flush_dcache_page()'s remit.

Actually, it should work.  flush_dcache_page() is the big hammer, it
clears the caches of all the user processes and the kernel for that
page.  On most architectures this is not that expensive, but on parisc
it is.  Most of the paths out of fuse have already called
flush_dcache_page() coming in to make the page coherent for the kernel
to use. Then fuse writes data to the page. There's no point calling it
again since there shouldn't be anything in the user cache (the model has
the kernel owning the page), so flush_kernel_dcache_page() is the
shortcut to simply flush out the kernel cache above the page (knowing
the users still have the page uncached).

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 14:35           ` James Bottomley
@ 2007-01-01 16:21             ` Russell King
  0 siblings, 0 replies; 57+ messages in thread
From: Russell King @ 2007-01-01 16:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Miklos Szeredi, linux-kernel, linux-arch, Andrew Morton

On Mon, Jan 01, 2007 at 09:35:17AM -0500, James Bottomley wrote:
> On Sat, 2006-12-30 at 10:26 -0800, Linus Torvalds wrote:
> > 
> > On Sat, 30 Dec 2006, Russell King wrote:
> > > 
> > > And here's the flush_anon_page() part.
> 
> This looks fine to me (if you need my ack).
> 
> > > Add flush_anon_page() for ARM, to avoid data corruption issues when using
> > > fuse or other subsystems using get_user_pages().
> > 
> > Btw, since this doesn't actually change any code for anybody but ARM, just 
> > adds a parameter that is obviously unused by everybody else, and if it 
> > actually fixes a real bug for ARM, I'll obviously happily take it even 
> > before 2.6.20. So go ahead put it in your ARM tree, and we'll get some 
> > testing through that. And just ask me to pull at some point.
> > 
> > I wonder why nobody else seems to have a "flush_anon_page()"? This would 
> > seem to be a potential issue for architectures like sparc too.. Although 
> > maybe sparc can do a flush by physical index with "flush_dcache_page()".
> 
> The sparc handling of anonymous pages is different ... they accitentally
> sweep them up in flush_dcache_page().  When I audited the architectures
> to try to make fuse work on parisc, parisc and arm were the only ones
> that actually needed flush_anon_page().

However, as David has said, anonymous pages _are_ supposed to be handled
by flush_dcache_page(), so I now have an (untested) implementation for
ARM which does this.  Therefore, I'm revoking the previous two patches.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 21:12                             ` David Miller
@ 2007-01-01 16:44                               ` James Bottomley
  2007-01-01 23:04                                 ` David Miller
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-01 16:44 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, miklos, rmk+lkml, arjan, linux-kernel, linux-arch, akpm

On Sun, 2006-12-31 at 13:12 -0800, David Miller wrote:
> From: Linus Torvalds <torvalds@osdl.org>
> Date: Sun, 31 Dec 2006 12:58:45 -0800 (PST)
> 
> > So there really is two different cases here:
> > 
> >  - flush the cache as seen by A PARTICULAR virtual mapping.
> > 
> >    This is ptrace, but it's other things like "unmap page from this VM" 
> >    too.
> > 
> >  - flush the cache for all possible virtual mappings - simply because we 
> >    don't even know who has it mapped dirty. 
> > 
> >    And the thing is, the more I think about it, the more I end up 
> >    wondering:
> > 
> >    I'm not even sure how valid this is. Whatever path needs to do this is 
> >    likely doing something wrong anyway. If there are multiple possible 
> >    sources of cache conflicts, the thing is a disaster and the end result 
> >    depends on our ordering anyway, so I'd argue that it is just about as 
> >    correct to flush as it is to NOT flush.
> > 
> > So I have this nagging suspicion that "flush_dcache_page()" is always a 
> > bug when it is about "virtual caches". It should NEVER flush any virtual 
> > caches, since that whole operations is by necessity something where you 
> > should be talking about _which_ virtual cache you should flush.
> 
> It's the aliasing between the _1_ valid user mapping and the kernel's
> virtual mapping, that's the problem and that's very valid and that's
> why we have flush_dcache_page() to begin with.
> 
> > So "flush_dcache_page()" is - I think - more validtly thought about as 
> > just DMA coherency (in a system where DMA does not participate in 
> > _physical_ cache coherency). Not about virtual caches at all.
> 
> And I guess that's what you're trying to say here.
> 
> I'm beginning to think that Ralf Baechle had the best idea here,
> where he recently made it such that platforms could override
> kmap() and friends even on non-HIGHMEM configurations.
> 
> In theory it's the perfect interface to handle this problem,
> you flush exactly where the physical page is made visible to
> the kernel for a cpu load/store.  All the locations where that
> happens are perfectly annotated already with kmap() calls.

Actually, this was proposed here:

http://marc.theaimsgroup.com/?t=115409754100003

When I updated the interface to work for the combined VIPT/PIPT cache on
the latest pariscs.  However, there were no takers for the idea.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2006-12-31 17:37                         ` Russell King
@ 2007-01-01 22:15                           ` Miklos Szeredi
  2007-01-01 23:45                             ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: Miklos Szeredi @ 2007-01-01 22:15 UTC (permalink / raw)
  To: rmk+lkml; +Cc: davem, arjan, torvalds, linux-kernel, linux-arch, akpm

> > > I'm willing to do that - and I guess this means we can probably do this
> > > instead of walking the list of VMAs for the shared mapping, thereby
> > > hitting both anonymous and shared mappings with the same code?
> > 
> > But for the get_user_pages() case there's no point, is there?  The VMA
> > and the virtual address is already available, so trying to find it
> > again through RMAP doesn't much make sense.
> > 
> > Users of get_user_pages() don't care about any other mappings (maybe
> > ptrace does, I don't know) only about one single user mapping and one
> > kernel mapping.
> > 
> > So using flush_dcache_page() there is an overkill, trying to teach it
> > about anonymous pages is not the real solution, flush_dcache_page()
> > was never meant to be used on anything but file mapped pages.
> 
> It's not actually.  For flush_anon_page() we currently have to flush the
> user mapping and the kernel mapping.  For flush_dcache_page(), it's
> exactly the same - we have to flush the kernel mapping and the user
> mapping.

I was never advocating flush_anon_page().  I was suggesting a _new_
cache operation:

   flush_kernel_user_page(page, vma, virt_addr)

which flushes the kernel mapping and the given user mapping.  Just
like flush_dcache_page() but without needing to find the user
mapping(s).

However the cache flushing in kmap/kunmap idea might be cleaner and
better.

Miklos

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
       [not found]                     ` <1167669252.5302.57.camel@mulgrave.il.steeleye.com>
@ 2007-01-01 23:01                       ` David Miller
  2007-01-01 23:17                         ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2007-01-01 23:01 UTC (permalink / raw)
  To: James.Bottomley
  Cc: arjan, rmk+lkml, torvalds, miklos, linux-kernel, linux-arch, akpm

From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Mon, 01 Jan 2007 10:34:12 -0600

> Erm, well the whole reason for the flush_anon_pages() was that you told
> me not to do it in flush_dcache_page() ...
> 
> Although this is perhaps part of the confusion over what
> flush_dcache_page() is actually supposed to do.

I completely agree, it's confusing.  I've tried to make it
"just a hook" where architectures do whatever is necessary
at that point to synchronize things.  It's a poor definition
and gives the implementor not much more than a rope with which
to hang themselves :-)

That's why I'm thinking strongly about perhaps encouraging
people to go the kmap() route.  It would avoid all the flushing
in exchange for some specialized TLB accesses.  If the flushes
are really expensive, and the TLB operations to setup/teardown
the kmap()'s can be relatively cheap, it might be the thing to
do on PARISC.

More and more I like Ralf's kmap() approach because you only
do things exactly where the kernel actually touches the page.
And if it would really help in some way, we can even tag the
kmap() calls with "KMAP_READ" or "KMAP_WRITE" type attributes
as appropriate.  Because let's say you don't want to do the
TLB mapping thing, and still want to actually cache flush,
then this hint about the access could guide what kind of flush
you do.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 16:44                               ` James Bottomley
@ 2007-01-01 23:04                                 ` David Miller
  2007-01-01 23:23                                   ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2007-01-01 23:04 UTC (permalink / raw)
  To: James.Bottomley
  Cc: torvalds, miklos, rmk+lkml, arjan, linux-kernel, linux-arch, akpm

From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Mon, 01 Jan 2007 10:44:36 -0600

> Actually, this was proposed here:
> 
> http://marc.theaimsgroup.com/?t=115409754100003
> 
> When I updated the interface to work for the combined VIPT/PIPT cache on
> the latest pariscs.  However, there were no takers for the idea.


I thought this was accepted and Ralf is using it on MIPS?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 23:01                       ` David Miller
@ 2007-01-01 23:17                         ` Russell King
  0 siblings, 0 replies; 57+ messages in thread
From: Russell King @ 2007-01-01 23:17 UTC (permalink / raw)
  To: David Miller
  Cc: James.Bottomley, arjan, torvalds, miklos, linux-kernel, linux-arch, akpm

On Mon, Jan 01, 2007 at 03:01:52PM -0800, David Miller wrote:
> From: James Bottomley <James.Bottomley@SteelEye.com>
> Date: Mon, 01 Jan 2007 10:34:12 -0600
> 
> > Erm, well the whole reason for the flush_anon_pages() was that you told
> > me not to do it in flush_dcache_page() ...
> > 
> > Although this is perhaps part of the confusion over what
> > flush_dcache_page() is actually supposed to do.
> 
> I completely agree, it's confusing.  I've tried to make it
> "just a hook" where architectures do whatever is necessary
> at that point to synchronize things.  It's a poor definition
> and gives the implementor not much more than a rope with which
> to hang themselves :-)
> 
> That's why I'm thinking strongly about perhaps encouraging
> people to go the kmap() route.  It would avoid all the flushing
> in exchange for some specialized TLB accesses.  If the flushes
> are really expensive, and the TLB operations to setup/teardown
> the kmap()'s can be relatively cheap, it might be the thing to
> do on PARISC.

This all sounds wonderful, and also means that if/when ARM starts
implementing highmem, kmap becomes useful.  (When I looked at that
a while back, adding the necessary flushes where required would mean
that we ended up doing a lot of flushing all over the place.)

> More and more I like Ralf's kmap() approach because you only
> do things exactly where the kernel actually touches the page.
> And if it would really help in some way, we can even tag the
> kmap() calls with "KMAP_READ" or "KMAP_WRITE" type attributes
> as appropriate.  Because let's say you don't want to do the
> TLB mapping thing, and still want to actually cache flush,
> then this hint about the access could guide what kind of flush
> you do.

You'd still want to do the flushing on ARM because it's mostly VIVT.
Remapping pages with VIVT would just makes things much worse.

So yes, this sounds like a great idea.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 23:04                                 ` David Miller
@ 2007-01-01 23:23                                   ` James Bottomley
  0 siblings, 0 replies; 57+ messages in thread
From: James Bottomley @ 2007-01-01 23:23 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, miklos, rmk+lkml, arjan, linux-kernel, linux-arch, akpm

On Mon, 2007-01-01 at 15:04 -0800, David Miller wrote:
> I thought this was accepted and Ralf is using it on MIPS?

It partially is ... we're using it on parisc as well, but only as a
supplement to the current linux flushing APIs.  There's still no
guarantee in the standard linux API that 

kmap();  do something; kunmap();

leaves everything correctly coherent.

I have some code for parisc that implements the strong coherency around
kmap/kunmap.  What we do is actually use the page access flags to tell
us if the page was altered by the "do something" code and thus flush
only where necessary (i.e. our prototype was hint free).  If everyone's
happy with this, we can kill off flush_kernel_dcache_page() and a few
other flush_dcache_pages() around kmaps and just let the kmap API handle
it all.  Unfortunately, this still won't solve the anonymous page
problem.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 22:15                           ` Miklos Szeredi
@ 2007-01-01 23:45                             ` Russell King
  2007-01-02 19:40                               ` Dan Williams
  2007-01-02 22:53                               ` James Bottomley
  0 siblings, 2 replies; 57+ messages in thread
From: Russell King @ 2007-01-01 23:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: davem, arjan, torvalds, linux-kernel, linux-arch, akpm

On Mon, Jan 01, 2007 at 11:15:04PM +0100, Miklos Szeredi wrote:
> > > > I'm willing to do that - and I guess this means we can probably do this
> > > > instead of walking the list of VMAs for the shared mapping, thereby
> > > > hitting both anonymous and shared mappings with the same code?
> > > 
> > > But for the get_user_pages() case there's no point, is there?  The VMA
> > > and the virtual address is already available, so trying to find it
> > > again through RMAP doesn't much make sense.
> > > 
> > > Users of get_user_pages() don't care about any other mappings (maybe
> > > ptrace does, I don't know) only about one single user mapping and one
> > > kernel mapping.
> > > 
> > > So using flush_dcache_page() there is an overkill, trying to teach it
> > > about anonymous pages is not the real solution, flush_dcache_page()
> > > was never meant to be used on anything but file mapped pages.
> > 
> > It's not actually.  For flush_anon_page() we currently have to flush the
> > user mapping and the kernel mapping.  For flush_dcache_page(), it's
> > exactly the same - we have to flush the kernel mapping and the user
> > mapping.
> 
> I was never advocating flush_anon_page().  I was suggesting a _new_
> cache operation:
> 
>    flush_kernel_user_page(page, vma, virt_addr)
> 
> which flushes the kernel mapping and the given user mapping.  Just
> like flush_dcache_page() but without needing to find the user
> mapping(s).

There's a problem with defining cache coherency macros to that extent.
You take away flexibility to efficiently implement them on various
platforms which you didn't think about (eg, in the above case it's
perfectly fine for VIVT, but not really VIPT.)

> However the cache flushing in kmap/kunmap idea might be cleaner and
> better.

It has the significant advantage that, unlike the flush* calls, they
can't really be forgotten by folk programming on cache alias-free
hardware.  That's a _very_ persuasive argument for this proposed
interface.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 23:45                             ` Russell King
@ 2007-01-02 19:40                               ` Dan Williams
  2007-01-02 22:53                               ` James Bottomley
  1 sibling, 0 replies; 57+ messages in thread
From: Dan Williams @ 2007-01-02 19:40 UTC (permalink / raw)
  To: Miklos Szeredi, davem, arjan, torvalds, linux-kernel, linux-arch,
	akpm, James.Bottomley

On 1/1/07, Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> On Mon, Jan 01, 2007 at 11:15:04PM +0100, Miklos Szeredi wrote:
> > > > > I'm willing to do that - and I guess this means we can probably do this
> > > > > instead of walking the list of VMAs for the shared mapping, thereby
> > > > > hitting both anonymous and shared mappings with the same code?
> > > >
> > > > But for the get_user_pages() case there's no point, is there?  The VMA
> > > > and the virtual address is already available, so trying to find it
> > > > again through RMAP doesn't much make sense.
> > > >
> > > > Users of get_user_pages() don't care about any other mappings (maybe
> > > > ptrace does, I don't know) only about one single user mapping and one
> > > > kernel mapping.
> > > >
> > > > So using flush_dcache_page() there is an overkill, trying to teach it
> > > > about anonymous pages is not the real solution, flush_dcache_page()
> > > > was never meant to be used on anything but file mapped pages.
> > >
> > > It's not actually.  For flush_anon_page() we currently have to flush the
> > > user mapping and the kernel mapping.  For flush_dcache_page(), it's
> > > exactly the same - we have to flush the kernel mapping and the user
> > > mapping.
> >
> > I was never advocating flush_anon_page().  I was suggesting a _new_
> > cache operation:
> >
> >    flush_kernel_user_page(page, vma, virt_addr)
> >
> > which flushes the kernel mapping and the given user mapping.  Just
> > like flush_dcache_page() but without needing to find the user
> > mapping(s).
>
> There's a problem with defining cache coherency macros to that extent.
> You take away flexibility to efficiently implement them on various
> platforms which you didn't think about (eg, in the above case it's
> perfectly fine for VIVT, but not really VIPT.)
>
> > However the cache flushing in kmap/kunmap idea might be cleaner and
> > better.
>
> It has the significant advantage that, unlike the flush* calls, they
> can't really be forgotten by folk programming on cache alias-free
> hardware.  That's a _very_ persuasive argument for this proposed
> interface.
>
If the interface is going to change it seems like an opportunity to
add some helper routines for the dma to user case?  Russell
illuminated some of the mines on the field in this thread:
http://marc.theaimsgroup.com/?l=linux-arm-kernel&m=116669504732601&w=2,
but I think the concerns can be addressed.

I think it is important to note that I am talking specifically about
the offloaded memcpy (i.e. NET_DMA) case and not the capability to
mmap a general purpose dma region.  Since the dma transfer is kernel
initiated (versus device initiated) the modified dma api could
properly manage cache coherency.

I was at a loss at how to handle the MAP_SHARED case, but if those
mappings are non-virtually cacheable by default, as DavidM suggested,
then that particular problem goes away.

Now it sounds like the data corruption problem I ran into on ARM will
go away with the anonymous memory handling changes to the
get_user_pages path.  However, get_user_pages is overkill for the dma
memcpy case since it can optimize for transfer size and direction.

A rough sketch of the interface would be something like the following
struct dma_addr_user {
   dma_addr_t dma;
   void *user;
};
struct dma_addr_user *dma_map_single_user(struct device *dev, void
*kernel_addr, size_t size, enum dma_data_direction dir, void
*user_addr)

Comments?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-01 23:45                             ` Russell King
  2007-01-02 19:40                               ` Dan Williams
@ 2007-01-02 22:53                               ` James Bottomley
  2007-01-02 23:19                                 ` David Miller
  1 sibling, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-02 22:53 UTC (permalink / raw)
  To: Russell King
  Cc: Miklos Szeredi, davem, arjan, torvalds, linux-kernel, linux-arch, akpm

On Mon, 2007-01-01 at 23:45 +0000, Russell King wrote:
> > However the cache flushing in kmap/kunmap idea might be cleaner and
> > better.
> 
> It has the significant advantage that, unlike the flush* calls, they
> can't really be forgotten by folk programming on cache alias-free
> hardware.  That's a _very_ persuasive argument for this proposed
> interface.

OK, so lets get down to brass tacks and look at the API characteristics.

Some of the issues are:

     1. Should kmap() actually flush all the user spaces? 
     2. Do we need additional hints in to kmap/kunmap?

My initial thought on 1. is no, since by and large we use kmap on pages
that have come to use via an I/O path, so usually they've already had
the user caches made coherent, unless you want do do this via a hint.

For 2. like I said, I coded this on parisc without hints (using the page
table information instead to deduce what type of access the page had
taken) but we could equally well have used hints.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-02 22:53                               ` James Bottomley
@ 2007-01-02 23:19                                 ` David Miller
  2007-01-02 23:34                                   ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2007-01-02 23:19 UTC (permalink / raw)
  To: James.Bottomley
  Cc: rmk+lkml, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Tue, 02 Jan 2007 16:53:23 -0600

> OK, so lets get down to brass tacks and look at the API characteristics.
> 
> Some of the issues are:
> 
>      1. Should kmap() actually flush all the user spaces? 
>      2. Do we need additional hints in to kmap/kunmap?
> 
> My initial thought on 1. is no, since by and large we use kmap on pages
> that have come to use via an I/O path, so usually they've already had
> the user caches made coherent, unless you want do do this via a hint.

Who did this cache flush?  The idea is to make flush_dcache_page()
do nothing, and have all of the work be done via the kmap()/kunmap()
sequence.

The case you're speaking about isn't a fault path, so it's not
like there will be a flush_cache_page() call somewhere either.

> For 2. like I said, I coded this on parisc without hints (using the page
> table information instead to deduce what type of access the page had
> taken) but we could equally well have used hints.

There are two important attributes 1) most permissive page protection
of the user mappings, essentially does anyone have a writable access
to the page and 2) the access the kernel will make, read or write.

If the kernel is going to read, kmap() would need to flush any
writable user mappings, that's all.

If the kernel is going to write, all user mappings have to be
purged completely so that the writes are visible.

I'm going to coneniently remind you it's possible to avoid all of the
cache flushes by using TLB mappings at kmap() time.  I think PARISC
can even do this, to be honest.  What makes that scheme unusable on
PARISC?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-02 23:19                                 ` David Miller
@ 2007-01-02 23:34                                   ` James Bottomley
  2007-01-03  0:20                                     ` David Miller
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-02 23:34 UTC (permalink / raw)
  To: David Miller
  Cc: rmk+lkml, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Tue, 2007-01-02 at 15:19 -0800, David Miller wrote:
> From: James Bottomley <James.Bottomley@SteelEye.com>
> Date: Tue, 02 Jan 2007 16:53:23 -0600
> 
> > OK, so lets get down to brass tacks and look at the API characteristics.
> > 
> > Some of the issues are:
> > 
> >      1. Should kmap() actually flush all the user spaces? 
> >      2. Do we need additional hints in to kmap/kunmap?
> > 
> > My initial thought on 1. is no, since by and large we use kmap on pages
> > that have come to use via an I/O path, so usually they've already had
> > the user caches made coherent, unless you want do do this via a hint.
> 
> Who did this cache flush?  The idea is to make flush_dcache_page()
> do nothing, and have all of the work be done via the kmap()/kunmap()
> sequence.

Erm ... for a device driver, if we're preparing to do I/O on the page
something must have made the user caches coherent ... that can't be
kmap, because the driver might elect to DMA on the page ... unless
another component of this API is going to be to make dma_map_... also
flush the user cache?

> The case you're speaking about isn't a fault path, so it's not
> like there will be a flush_cache_page() call somewhere either.

well ... currently the pages come down to fuse via get_user_pages()
which does have a flush_dcache_page() as part of its operation.

> > For 2. like I said, I coded this on parisc without hints (using the page
> > table information instead to deduce what type of access the page had
> > taken) but we could equally well have used hints.
> 
> There are two important attributes 1) most permissive page protection
> of the user mappings, essentially does anyone have a writable access
> to the page and 2) the access the kernel will make, read or write.
> 
> If the kernel is going to read, kmap() would need to flush any
> writable user mappings, that's all.

Agreed.

> If the kernel is going to write, all user mappings have to be
> purged completely so that the writes are visible.

Agreed.

> I'm going to coneniently remind you it's possible to avoid all of the
> cache flushes by using TLB mappings at kmap() time.  I think PARISC
> can even do this, to be honest.  What makes that scheme unusable on
> PARISC?

You mean to map congruently for the page using something like our
temporary cache alias scheme?  Yes ... it should be possible for the
short lived kmap_atomic() type maps ... it will be a bit harder for the
non-atomic longer lived maps ... but I think we're deprecating those.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-02 23:34                                   ` James Bottomley
@ 2007-01-03  0:20                                     ` David Miller
  2007-01-03 14:16                                       ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: David Miller @ 2007-01-03  0:20 UTC (permalink / raw)
  To: James.Bottomley
  Cc: rmk+lkml, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

From: James Bottomley <James.Bottomley@SteelEye.com>
Date: Tue, 02 Jan 2007 17:34:18 -0600

> Erm ... for a device driver, if we're preparing to do I/O on the page
> something must have made the user caches coherent ... that can't be
> kmap, because the driver might elect to DMA on the page ... unless
> another component of this API is going to be to make dma_map_... also
> flush the user cache?

The DMA map/unmap/sync performs the necessary cache flushes.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-03  0:20                                     ` David Miller
@ 2007-01-03 14:16                                       ` Russell King
  2007-01-03 15:00                                         ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Russell King @ 2007-01-03 14:16 UTC (permalink / raw)
  To: David Miller
  Cc: James.Bottomley, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Tue, Jan 02, 2007 at 04:20:58PM -0800, David Miller wrote:
> From: James Bottomley <James.Bottomley@SteelEye.com>
> Date: Tue, 02 Jan 2007 17:34:18 -0600
> 
> > Erm ... for a device driver, if we're preparing to do I/O on the page
> > something must have made the user caches coherent ... that can't be
> > kmap, because the driver might elect to DMA on the page ... unless
> > another component of this API is going to be to make dma_map_... also
> > flush the user cache?
> 
> The DMA map/unmap/sync performs the necessary cache flushes.

Even on ARM, we doesn't do that - we rely on flush_dcache_page() to catch
those where necessary.

The locking to walk the shared VMA list is already rather disgusting.
Trying to do that in IRQ context is not going to be fun.  The locking
to walk the anon_vma list is not public outside of mm/rmap.c - in my
ARM flushing-anonymous-caches v2 patch I'm duplicating a couple of
functions from mm/rmap.c (page_lock_anon_vma() and vma_address()) to
be able to sanely walk the anonymous vma list.

I'd rather avoid fiddling around in VMA lists from within the DMA API;
the DMA API implementation is already fairly complex on ARM what with
this silly DMA bounce buffer implementation (which exists because the
Linux VM is unable to properly honor dma_masks).  (Incidentally, I
don't like this, but because of that problem it's required to get
various ARM platforms working.)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-03 14:16                                       ` Russell King
@ 2007-01-03 15:00                                         ` James Bottomley
  2007-01-03 15:09                                           ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-03 15:00 UTC (permalink / raw)
  To: Russell King
  Cc: David Miller, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Wed, 2007-01-03 at 14:16 +0000, Russell King wrote:
> On Tue, Jan 02, 2007 at 04:20:58PM -0800, David Miller wrote:
> > From: James Bottomley <James.Bottomley@SteelEye.com>
> > Date: Tue, 02 Jan 2007 17:34:18 -0600
> > 
> > > Erm ... for a device driver, if we're preparing to do I/O on the page
> > > something must have made the user caches coherent ... that can't be
> > > kmap, because the driver might elect to DMA on the page ... unless
> > > another component of this API is going to be to make dma_map_... also
> > > flush the user cache?
> > 
> > The DMA map/unmap/sync performs the necessary cache flushes.
> 
> Even on ARM, we doesn't do that - we rely on flush_dcache_page() to catch
> those where necessary.
> 
> The locking to walk the shared VMA list is already rather disgusting.
> Trying to do that in IRQ context is not going to be fun.  The locking
> to walk the anon_vma list is not public outside of mm/rmap.c - in my
> ARM flushing-anonymous-caches v2 patch I'm duplicating a couple of
> functions from mm/rmap.c (page_lock_anon_vma() and vma_address()) to
> be able to sanely walk the anonymous vma list.
> 
> I'd rather avoid fiddling around in VMA lists from within the DMA API;
> the DMA API implementation is already fairly complex on ARM what with
> this silly DMA bounce buffer implementation (which exists because the
> Linux VM is unable to properly honor dma_masks).  (Incidentally, I
> don't like this, but because of that problem it's required to get
> various ARM platforms working.)

Yes, this mirrors my concern on parisc too.  We had awful problems with
the locking in flush_dcache_page() as well.

However, I was wondering if there might be a different way around this.
We can't really walk all the user mappings because of the locks, but
could we store the user flush hints in the page (or a related
structure)?  On parisc we don't care about the process id (called space
in our architecture) because we've turned off the pieces of the cache
that match on space id.  Thus, all we care about is flushing with the
physical address and virtual address (and only about 10 bits of this are
significant for matching).  We go to great lengths to ensure that every
mapping in user space all has the same 10 bits of virtual address, so if
we just new what they were we could flush the whole of the user spaces
for the page without having to walk any VMA lists.  Could arm do this as
well?

James


James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-03 15:00                                         ` James Bottomley
@ 2007-01-03 15:09                                           ` Russell King
  2007-01-07 16:09                                             ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Russell King @ 2007-01-03 15:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Miller, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Wed, Jan 03, 2007 at 09:00:58AM -0600, James Bottomley wrote:
> However, I was wondering if there might be a different way around this.
> We can't really walk all the user mappings because of the locks, but
> could we store the user flush hints in the page (or a related
> structure)?  On parisc we don't care about the process id (called space
> in our architecture) because we've turned off the pieces of the cache
> that match on space id.  Thus, all we care about is flushing with the
> physical address and virtual address (and only about 10 bits of this are
> significant for matching).  We go to great lengths to ensure that every
> mapping in user space all has the same 10 bits of virtual address, so if
> we just new what they were we could flush the whole of the user spaces
> for the page without having to walk any VMA lists.  Could arm do this as
> well?

I don't think so.  The organisation of the VIVT caches in terms of
how the set index and tag correspond with virtual addresses are hardly
ever documented.  When they are, they don't appear to lend themselves
to such an approach.  For example,  Xscale has:

 tag:       virtual address b31-10
 set index: b9-5

and there's 32 ways per set.  So there's nothing to be gained from
controlling the virtual address which individual mappings end up at
in this case.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-03 15:09                                           ` Russell King
@ 2007-01-07 16:09                                             ` James Bottomley
  2007-01-07 16:30                                               ` Russell King
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2007-01-07 16:09 UTC (permalink / raw)
  To: Russell King
  Cc: David Miller, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Wed, 2007-01-03 at 15:09 +0000, Russell King wrote:
> On Wed, Jan 03, 2007 at 09:00:58AM -0600, James Bottomley wrote:
> > However, I was wondering if there might be a different way around this.
> > We can't really walk all the user mappings because of the locks, but
> > could we store the user flush hints in the page (or a related
> > structure)?  On parisc we don't care about the process id (called space
> > in our architecture) because we've turned off the pieces of the cache
> > that match on space id.  Thus, all we care about is flushing with the
> > physical address and virtual address (and only about 10 bits of this are
> > significant for matching).  We go to great lengths to ensure that every
> > mapping in user space all has the same 10 bits of virtual address, so if
> > we just new what they were we could flush the whole of the user spaces
> > for the page without having to walk any VMA lists.  Could arm do this as
> > well?
> 
> I don't think so.  The organisation of the VIVT caches in terms of
> how the set index and tag correspond with virtual addresses are hardly
> ever documented.  When they are, they don't appear to lend themselves
> to such an approach.  For example,  Xscale has:
> 
>  tag:       virtual address b31-10
>  set index: b9-5
> 
> and there's 32 ways per set.  So there's nothing to be gained from
> controlling the virtual address which individual mappings end up at
> in this case.

OK, so the bottom line we seem to have reached is that we can't manage
the user coherency in the DMA API.  Does this also mean you can't do it
for non-DMA cases (kmap_atomic would seem to be a likely problem)? in
which case the only coherency kmap would control would be kernel
coherency?

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: fuse, get_user_pages, flush_anon_page, aliasing caches and all that again
  2007-01-07 16:09                                             ` James Bottomley
@ 2007-01-07 16:30                                               ` Russell King
  0 siblings, 0 replies; 57+ messages in thread
From: Russell King @ 2007-01-07 16:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Miller, miklos, arjan, torvalds, linux-kernel, linux-arch, akpm

On Sun, Jan 07, 2007 at 10:09:13AM -0600, James Bottomley wrote:
> On Wed, 2007-01-03 at 15:09 +0000, Russell King wrote:
> > On Wed, Jan 03, 2007 at 09:00:58AM -0600, James Bottomley wrote:
> > > However, I was wondering if there might be a different way around this.
> > > We can't really walk all the user mappings because of the locks, but
> > > could we store the user flush hints in the page (or a related
> > > structure)?  On parisc we don't care about the process id (called space
> > > in our architecture) because we've turned off the pieces of the cache
> > > that match on space id.  Thus, all we care about is flushing with the
> > > physical address and virtual address (and only about 10 bits of this are
> > > significant for matching).  We go to great lengths to ensure that every
> > > mapping in user space all has the same 10 bits of virtual address, so if
> > > we just new what they were we could flush the whole of the user spaces
> > > for the page without having to walk any VMA lists.  Could arm do this as
> > > well?
> > 
> > I don't think so.  The organisation of the VIVT caches in terms of
> > how the set index and tag correspond with virtual addresses are hardly
> > ever documented.  When they are, they don't appear to lend themselves
> > to such an approach.  For example,  Xscale has:
> > 
> >  tag:       virtual address b31-10
> >  set index: b9-5
> > 
> > and there's 32 ways per set.  So there's nothing to be gained from
> > controlling the virtual address which individual mappings end up at
> > in this case.
> 
> OK, so the bottom line we seem to have reached is that we can't manage
> the user coherency in the DMA API.  Does this also mean you can't do it
> for non-DMA cases (kmap_atomic would seem to be a likely problem)?

It will only work if we can walk the VMA lists associated with the
page from IRQ context.  By that I mean the address_space vma lists
as well as the anonymous memory list.

> in which case the only coherency kmap would control would be kernel
> coherency?

If that's all that kmap could do, it would solve the issues with PIO,
but not things like fuse and the other users of get_user_pages() with
the current context.  All those would remain potential sources of data
corruption.

My current attempt at solving this (following David's advice for anon
pages in flush_dcache_page()) is as follows (which involves duplicating
bits of mm/rmap.c) but I remain unconvinced that this is safe from all
contexts which flush_dcache_page() may be called from.

Given where we are in the -rc cycle, I feel that my original two patches
are far safer to solve the problems reported so far - especially as they've
been tested.  If someone can come up with a better way, then we can look
to implementing that instead once 2.6.20 has happened.

diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 628348c..e5830f6 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/rmap.h>
 
 #include <asm/cacheflush.h>
 #include <asm/system.h>
@@ -117,6 +118,56 @@ void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 #define flush_pfn_alias(pfn,vaddr)	do { } while (0)
 #endif
 
+/*
+ * Copy of vma_address in mm/rmap.c... would be useful to have that non-static.
+ */
+static inline unsigned long
+arm_vma_address(struct page *page, struct vm_area_struct *vma)
+{
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	unsigned long address;
+
+	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
+		/* page should be within any vma from prio_tree_next */
+		BUG_ON(!PageAnon(page));
+		return -EFAULT;
+	}
+	return address;
+}
+
+static void flush_user_mapped_page(struct vm_area_struct *vma, struct page *page)
+{
+	unsigned long address = arm_vma_address(page, vma);
+
+	if (address != -EFAULT)
+		flush_cache_page(vma, address, page_to_pfn(page));
+}
+
+static void __flush_anon_mapping(struct page *page)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	rcu_read_lock();
+	if (page_mapped(page)) {
+		unsigned long anon_mapping = (unsigned long) page->mapping;
+		struct anon_vma *anon_vma;
+		struct vm_area_struct *vma;
+
+		anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON);
+		spin_lock(&anon_vma->lock);
+		rcu_read_unlock();
+
+		list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+			if (vma->vm_mm == mm)
+				flush_user_mapped_page(vma, page);
+		}
+		spin_unlock(&anon_vma->lock);
+	} else {
+		rcu_read_unlock();
+	}
+}
+
 void __flush_dcache_page(struct address_space *mapping, struct page *page)
 {
 	/*
@@ -141,20 +192,17 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
 	struct prio_tree_iter iter;
-	pgoff_t pgoff;
+	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 
 	/*
 	 * There are possible user space mappings of this page:
 	 * - VIVT cache: we need to also write back and invalidate all user
 	 *   data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
+	 *   (we handle this separately)
 	 */
-	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-
 	flush_dcache_mmap_lock(mapping);
 	vma_prio_tree_foreach(mpnt, &iter, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
-
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
@@ -162,8 +210,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 			continue;
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+		flush_user_mapped_page(mpnt, page);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -199,6 +246,8 @@ void flush_dcache_page(struct page *page)
 		__flush_dcache_page(mapping, page);
 		if (mapping && cache_is_vivt())
 			__flush_dcache_aliases(mapping, page);
+		if (PageAnon(page))
+			__flush_anon_mapping(page);
 	}
 }
 EXPORT_SYMBOL(flush_dcache_page);


-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply related	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2007-01-07 16:31 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-21 15:26 fuse, get_user_pages, flush_anon_page, aliasing caches and all that again Russell King
2006-12-21 15:53 ` Miklos Szeredi
2006-12-21 16:57   ` Russell King
2006-12-21 17:51     ` Miklos Szeredi
2006-12-21 21:04       ` Jan Engelhardt
2006-12-21 21:30         ` Miklos Szeredi
2007-01-01 15:04     ` James Bottomley
2006-12-21 17:17   ` Russell King
2006-12-21 17:55     ` Miklos Szeredi
2006-12-21 18:11       ` Russell King
2006-12-21 18:30         ` Miklos Szeredi
2006-12-21 18:55           ` Russell King
2006-12-21 19:05             ` Miklos Szeredi
2006-12-21 23:51               ` Randolph Chung
2006-12-22  8:43                 ` Russell King
2006-12-22 14:45                   ` Randolph Chung
2006-12-30 16:39     ` Russell King
2006-12-30 16:50       ` Russell King
2006-12-30 18:26         ` Linus Torvalds
2006-12-30 22:46           ` Russell King
2006-12-31  5:23             ` David Miller
2006-12-31  9:10               ` Miklos Szeredi
2006-12-31  9:45                 ` David Miller
2006-12-31  9:23               ` Russell King
2006-12-31  9:27                 ` Arjan van de Ven
2006-12-31  9:47                   ` David Miller
2006-12-31 10:00                     ` Russell King
2006-12-31 10:04                       ` David Miller
2006-12-31 12:24                       ` Miklos Szeredi
2006-12-31 17:37                         ` Russell King
2007-01-01 22:15                           ` Miklos Szeredi
2007-01-01 23:45                             ` Russell King
2007-01-02 19:40                               ` Dan Williams
2007-01-02 22:53                               ` James Bottomley
2007-01-02 23:19                                 ` David Miller
2007-01-02 23:34                                   ` James Bottomley
2007-01-03  0:20                                     ` David Miller
2007-01-03 14:16                                       ` Russell King
2007-01-03 15:00                                         ` James Bottomley
2007-01-03 15:09                                           ` Russell King
2007-01-07 16:09                                             ` James Bottomley
2007-01-07 16:30                                               ` Russell King
2006-12-31 20:40                         ` David Miller
2006-12-31 20:58                           ` Linus Torvalds
2006-12-31 21:12                             ` David Miller
2007-01-01 16:44                               ` James Bottomley
2007-01-01 23:04                                 ` David Miller
2007-01-01 23:23                                   ` James Bottomley
     [not found]                     ` <1167669252.5302.57.camel@mulgrave.il.steeleye.com>
2007-01-01 23:01                       ` David Miller
2007-01-01 23:17                         ` Russell King
2006-12-31  9:55                   ` Russell King
2006-12-31  9:46                 ` David Miller
2007-01-01 14:35           ` James Bottomley
2007-01-01 16:21             ` Russell King
2006-12-30 18:21       ` Linus Torvalds
2006-12-21 16:29 ` Arjan van de Ven
2006-12-21 17:35   ` Russell King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).