* xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
@ 2009-09-08 18:27 James Bottomley
  2009-09-08 19:00 ` Russell King
  2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-08 18:27 UTC (permalink / raw)
  To: Parisc List, Linux Filesystem Mailing List, linux-arch; +Cc: Christoph Hellwig

This bug was observed on parisc, but I would expect it to affect all
architectures with virtually indexed caches.

The inception of this problem is the changes we made to block and SCSI
to eliminate the special case path for kernel buffers.  This change
forced every I/O to go via the full scatter gather processing.  In this
way we thought we'd removed the restrictions about using vmalloc/vmap
areas for I/O from the kernel. XFS actually took advantage of this, hence
the problems.

Actually, if you look at the implementation of blk_rq_map_kern(), it
still won't accept vmalloc pages on most architectures because
virt_to_page() assumes an offset mapped page (x86 actually has a BUG_ON
for the vmalloc case if you enable DEBUG_VIRTUAL).  The only reason
xfs gets away with this is that it builds the vmalloc'd bio manually,
essentially open coding blk_rq_map_kern().

The problem comes because by the time we get to map scatter gather
lists, all we have is the page, we've lost the virtual address.  There's
a macro: sg_virt() which claims to recover the virtual address, but all
it really does is provide the offset map of the page physical address.
This means that sg_virt() returns a different address from the one the
page was actually accessed through if it's in a vmalloc/vmap area (because
we remapped the page within the kernel virtual address space).  This means
that for virtually indexed caches, we end up flushing the wrong page
alias ... and hence corrupting data because we do DMA with a possibly
dirty cache line set above the page.

The generic fix is simple:  flush the potentially dirty page along the
correct cache alias before feeding it into the block routines and losing
the alias address information.

The slight problem is that we don't have an API to handle this ...
flush_kernel_dcache_page() would be the correct one except that it only
takes a page as the argument, not the virtual address.  So, I propose as
part of this change to introduce a new API:  flush_kernel_dcache_addr()
which performs exactly the same as flush_kernel_dcache_page except that
it flushes through the provided virtual address (whether offset mapped
or mapped via vmalloc/vmap).

I'll send out the patch series as a reply to this email.

James

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley
@ 2009-09-08 19:00 ` Russell King
  2009-09-08 19:11   ` James Bottomley
  2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
  1 sibling, 1 reply; 20+ messages in thread
From: Russell King @ 2009-09-08 19:00 UTC (permalink / raw)
  To: James Bottomley
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> This bug was observed on parisc, but I would expect it to affect all
> architectures with virtually indexed caches.

I don't think your proposed solution will work for ARM with speculative
prefetching (iow, the latest ARM CPUs.)  If there is a mapping present,
it can be speculatively prefetched from at any time - the CPU designers
have placed no bounds on the amount of speculative prefetching which
may be present in a design.

What this means is that for DMA, we will need to handle cache coherency
issues both before and after DMA.

If we're going to allow non-direct mapped (offset mapped in your parlance)
block IO, it makes it impossible to handle cache coherency after DMA
completion - although we can translate (via page table walks) from a
virtual address to a physical, and then to a bus address for DMA, going
back the other way is impossible since there could be many right answers.

What has been annoying me for a while about the current DMA API is that
drivers have to carry around all sorts of information for a DMA mapping,
whether the architecture needs it or not - and sometimes that information
is not what the architecture wants.  To this end, I've been thinking that
something more like:

	struct dma_mapping map;

	err = dma2_map_single(&map, buffer, size, direction);
	if (err)
		...

	addr = dma2_addr(&map);
	/* program controller */

	/* completion */
	dma2_unmap_single(&map);

with similar style interfaces for pages and so forth (scatterlists are
already arch-defined.)  Architectures define the contents of
struct dma_mapping - but it must contain at least the dma address.

What's the advantage of this?  It means that if an architecture needs to
handle cache issues after DMA on unmap via a virtual address, it can
ensure that the correct address is passed through all the way to the
unmap function.  This approach also relieves the driver writer from
having to carry around the direction, size and dma address themselves,
which means we don't need the DMA debug infrastructure to check that
drivers are doing these things correctly.

I seriously doubt, though, that we can revise the DMA API...

In your (and my) case, maybe struct scatterlist also needs to contain
the virtual address as well as the struct page, offset and length?


PS, ARM already does not allow anything but direct-mapped RAM addresses
for dma_map_single(), since we need to be able to translate virtual
addresses to physical for non-coherent L2 cache handling - L1 cache
needs handling via the virtual address and L2 via the physical address.


PPS, you're not the only architecture which has problems with XFS.  ARM
has a long standing issue with it too.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 19:00 ` Russell King
@ 2009-09-08 19:11   ` James Bottomley
  2009-09-08 20:16       ` Russell King
  0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2009-09-08 19:11 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote:
> On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> > This bug was observed on parisc, but I would expect it to affect all
> > architectures with virtually indexed caches.
> 
> I don't think your proposed solution will work for ARM with speculative
> prefetching (iow, the latest ARM CPUs.)  If there is a mapping present,
> it can be speculatively prefetched from at any time - the CPU designers
> have placed no bounds on the amount of speculative prefetching which
> may be present in a design.

The architecturally prescribed fix for this on parisc is to purge the
TLB entry as well.  Without a TLB entry, the CPU is forbidden from doing
speculative reads.  This obviously works only as long as the kernel
never touches the page during DMA, of course ...

Isn't this also true for arm?

> What this means is that for DMA, we will need to handle cache coherency
> issues both before and after DMA.
> 
> If we're going to allow non-direct mapped (offset mapped in your parlance)
> block IO, it makes it impossible to handle cache coherency after DMA
> completion - although we can translate (via page table walks) from a
> virtual address to a physical, and then to a bus address for DMA, going
> back the other way is impossible since there could be many right answers.
> 
> What has been annoying me for a while about the current DMA API is that
> drivers have to carry around all sorts of information for a DMA mapping,
> whether the architecture needs it or not - and sometimes that information
> is not what the architecture wants.  To this end, I've been thinking that
> something more like:
> 
> 	struct dma_mapping map;
> 
> 	err = dma2_map_single(&map, buffer, size, direction);
> 	if (err)
> 		...
> 
> 	addr = dma2_addr(&map);
> 	/* program controller */
> 
> 	/* completion */
> 	dma2_unmap_single(&map);
> 
> with similar style interfaces for pages and so forth (scatterlists are
> already arch-defined.)  Architectures define the contents of
> struct dma_mapping - but it must contain at least the dma address.
> 
> What's the advantage of this?  It means that if an architecture needs to
> handle cache issues after DMA on unmap via a virtual address, it can
> ensure that the correct address is passed through all the way to the
> unmap function.  This approach also relieves the driver writer from
> having to carry around the direction, size and dma address themselves,
> which means we don't need the DMA debug infrastructure to check that
> drivers are doing these things correctly.
> 
> I seriously doubt, though, that we can revise the DMA API...

Actually, there's a more fundamental problem.  I did think of doing it
this way initially.  However, most of the dma_map..() cases come down
from block and have already lost all idea of what the virtual address
was and where it came from ... so there's an awful lot more work to do
to make them carry it through to dma_map...()

> In your (and my) case, maybe struct scatterlist also needs to contain
> the virtual address as well as the struct page, offset and length?
> 
> 
> PS, ARM already does not allow anything but direct-mapped RAM addresses
> for dma_map_single(), since we need to be able to translate virtual
> addresses to physical for non-coherent L2 cache handling - L1 cache
> needs handling via the virtual address and L2 via the physical address.
> 
> 
> PPS, you're not the only architecture which has problems with XFS.  ARM
> has a long standing issue with it too.

Well, the good news is that I can fix it to work on parisc.

James


* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 19:11   ` James Bottomley
@ 2009-09-08 20:16       ` Russell King
  0 siblings, 0 replies; 20+ messages in thread
From: Russell King @ 2009-09-08 20:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote:
> On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote:
> > On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> > > This bug was observed on parisc, but I would expect it to affect all
> > > architectures with virtually indexed caches.
> > 
> > I don't think your proposed solution will work for ARM with speculative
> > prefetching (iow, the latest ARM CPUs.)  If there is a mapping present,
> > it can be speculatively prefetched from at any time - the CPU designers
> > have placed no bounds on the amount of speculative prefetching which
> > may be present in a design.
> 
> The architecturally prescribed fix for this on parisc is to purge the
> TLB entry as well.  Without a TLB entry, the CPU is forbidden from doing
> speculative reads.  This obviously works only as long as the kernel
> never touches the page during DMA, of course ...
> 
> Isn't this also true for arm?

There appears to be nothing architected along those lines for ARM.
From the architectural point of view, any "normal memory" mapping is
a candidate for speculative accesses provided access is permitted via
the page permissions.

In other words, if the CPU is permitted to access a memory page, it
is a candidate for speculative accesses.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 20:16       ` Russell King
@ 2009-09-08 20:39       ` James Bottomley
  2009-09-08 21:39         ` Russell King
  -1 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2009-09-08 20:39 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote:
> On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote:
> > On Tue, 2009-09-08 at 20:00 +0100, Russell King wrote:
> > > On Tue, Sep 08, 2009 at 01:27:49PM -0500, James Bottomley wrote:
> > > > This bug was observed on parisc, but I would expect it to affect all
> > > > architectures with virtually indexed caches.
> > > 
> > > I don't think your proposed solution will work for ARM with speculative
> > > prefetching (iow, the latest ARM CPUs.)  If there is a mapping present,
> > > it can be speculatively prefetched from at any time - the CPU designers
> > > have placed no bounds on the amount of speculative prefetching which
> > > may be present in a design.
> > 
> > The architecturally prescribed fix for this on parisc is to purge the
> > TLB entry as well.  Without a TLB entry, the CPU is forbidden from doing
> > speculative reads.  This obviously works only as long as the kernel
> > never touches the page during DMA, of course ...
> > 
> > Isn't this also true for arm?
> 
> There appears to be nothing architected along those lines for ARM.
> From the architectural point of view, any "normal memory" mapping is
> a candidate for speculative accesses provided access is permitted via
> the page permissions.
> 
> In other words, if the CPU is permitted to access a memory page, it
> is a candidate for speculative accesses.

So the parisc architectural feature is simply a statement of fact for VI
cache architectures: if you don't have a TLB entry for a page, you can't
do cache operations for it.  We have a software TLB interrupt and the
CPU can't interrupt for a speculation, so it's restricted to the
existing TLB entries in its cache for speculative move ins.

So now we know what the problem is.  If arm can't operate this way,
what's your suggestion for fixing it?  I take it you have a DMA
coherence index like we do that flushes the cache on DMA ops?

James




* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 20:39       ` James Bottomley
@ 2009-09-08 21:39         ` Russell King
  2009-09-09  3:14           ` James Bottomley
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King @ 2009-09-08 21:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, Sep 08, 2009 at 08:39:12PM +0000, James Bottomley wrote:
> On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote:
> > On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote:
> > > The architecturally prescribed fix for this on parisc is to purge the
> > > TLB entry as well.  Without a TLB entry, the CPU is forbidden from doing
> > > speculative reads.  This obviously works only as long as the kernel
> > > never touches the page during DMA, of course ...
> > > 
> > > Isn't this also true for arm?
> > 
> > There appears to be nothing architected along those lines for ARM.
> > From the architectural point of view, any "normal memory" mapping is
> > a candidate for speculative accesses provided access is permitted via
> > the page permissions.
> > 
> > In other words, if the CPU is permitted to access a memory page, it
> > is a candidate for speculative accesses.
> 
> So the parisc architectural feature is simply a statement of fact for VI
> cache architectures: if you don't have a TLB entry for a page, you can't
> do cache operations for it.

That is also true for ARM - you can't perform cache maintenance on a
page without there being a valid page table entry...

> We have a software TLB interrupt and the
> CPU can't interrupt for a speculation, so it's restricted to the
> existing TLB entries in its cache for speculative move ins.

though this is where we differ, since the hardware walks the page tables
and doesn't require any interrupts to do this.

> So now we know what the problem is, if arm can't operate this way,
> what's your suggestion for fixing this ... I take it you have a DMA
> coherence index like we do that flushes the cache on DMA ops?

DMA on ARM continues to be totally non-coherent with the caches.  There
is no hardware help with this.  So, with these speculative accessing
CPUs, we will need to do software cache maintenance both before _and_
after the DMA in the case of DMA from device.

Maintenance before the DMA is required to ensure that the cache
doesn't write out dirty cache lines to the region which is being
DMA'd to.  Maintenance after the DMA is needed to invalidate any
stale data, whether it be pre-existing or speculatively loaded.

What this means is that we need to have the correct virtual address
in the unmap operation to ensure that subsequent CPU reads access
the newly DMA'd data.

The alternative solution I can see is to ensure that subsystems which
do DMA from (eg) vmalloc'd regions are not selectable for ARM.

It's also worth noting that we do have this restriction already in
place across _all_ ARM CPUs for the DMA APIs which take virtual
addresses - we only accept direct mapped kernel addresses via
those APIs since we use virt_to_phys() for L2 cache maintenance.
Walking page tables, especially with high PTE support (ARM has
joined those architectures supporting highmem), sounds to me very
unfunny.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 21:39         ` Russell King
@ 2009-09-09  3:14           ` James Bottomley
  2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
                               ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:14 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Tue, 2009-09-08 at 22:39 +0100, Russell King wrote:
> On Tue, Sep 08, 2009 at 08:39:12PM +0000, James Bottomley wrote:
> > On Tue, 2009-09-08 at 21:16 +0100, Russell King wrote:
> > > On Tue, Sep 08, 2009 at 07:11:52PM +0000, James Bottomley wrote:
> > > > The architecturally prescribed fix for this on parisc is to purge the
> > > > TLB entry as well.  Without a TLB entry, the CPU is forbidden from doing
> > > > speculative reads.  This obviously works only as long as the kernel
> > > > never touches the page during DMA, of course ...
> > > > 
> > > > Isn't this also true for arm?
> > > 
> > > There appears to be nothing architected along those lines for ARM.
> > > From the architectural point of view, any "normal memory" mapping is
> > > a candidate for speculative accesses provided access is permitted via
> > > the page permissions.
> > > 
> > > In other words, if the CPU is permitted to access a memory page, it
> > > is a candidate for speculative accesses.
> > 
> > So the parisc architectural feature is simply a statement of fact for VI
> > cache architectures: if you don't have a TLB entry for a page, you can't
> > do cache operations for it.
> 
> That is also true for ARM - you can't perform cache maintenance on a
> page without there being a valid page table entry...
> 
> > We have a software TLB interrupt and the
> > CPU can't interrupt for a speculation, so it's restricted to the
> > existing TLB entries in its cache for speculative move ins.
> 
> though this is where we differ, since the hardware walks the page tables
> and doesn't require any interrupts to do this.
> 
> > So now we know what the problem is, if arm can't operate this way,
> > what's your suggestion for fixing this ... I take it you have a DMA
> > coherence index like we do that flushes the cache on DMA ops?
> 
> DMA on ARM continues to be totally non-coherent with the caches.  There
> is no hardware help with this.  So, with these speculative accessing
> CPUs, we will need to do software cache maintenance both before _and_
> after the DMA in the case of DMA from device.
> 
> Maintenance before the DMA is required to ensure that the cache
> doesn't write out dirty cache lines to the region which is being
> DMA'd to.  Maintenance after the DMA is needed to invalidate any
> stale data, whether it be pre-existing or speculatively loaded.
> 
> What this means is that we need to have the correct virtual address
> in the unmap operation to ensure that subsequent CPU reads access
> the newly DMA'd data.
> 
> The alternative solution I can see is to ensure that subsystems which
> do DMA from (eg) vmalloc'd regions are not selectable for ARM.
> 
> It's also worth noting that we do have this restriction already in
> place across _all_ ARM CPUs for the DMA APIs which take virtual
> addresses - we only accept direct mapped kernel addresses via
> those APIs since we use virt_to_phys() for L2 cache maintenance.
> Walking page tables, especially with high PTE support (ARM has
> joined those architectures supporting highmem), sounds to me very
> unfunny.

OK, so we can work with that.  This series of patches introduces both a
flush and an invalidate (one for pre and one for post).  With this, I'm
able to successfully use xfs filesystems on parisc (before, they mostly
refuse to mount because of journal corruption).

James


* [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-09  3:14           ` James Bottomley
@ 2009-09-09  3:17             ` James Bottomley
  2009-09-09  3:23               ` James Bottomley
  2009-09-09  3:18             ` [PATCH 2/5] parisc: add mm " James Bottomley
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:17 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Virtually Indexed architectures (which don't do automatic alias
resolution in their caches), we have to flush via the correct
virtual address to prepare pages for DMA.  On some architectures
(like arm) we cannot prevent the CPU from doing data movein along
the alias (and thus giving stale read data), so we not only have to
introduce a flush API to push dirty cache lines out, but also an invalidate
API to kill inconsistent cache lines that may have moved in before
DMA changed the data.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 include/linux/highmem.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 211ff44..9719952 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
 static inline void flush_kernel_dcache_page(struct page *page)
 {
 }
+static inline void flush_kernel_dcache_addr(void *vaddr)
+{
+}
+static inline void invalidate_kernel_dcache_addr(void *vaddr)
+{
+}
 #endif
 
 #include <asm/kmap_types.h>
-- 
1.6.3.3





* [PATCH 2/5] parisc: add mm API for DMA to vmalloc/vmap areas
  2009-09-09  3:14           ` James Bottomley
  2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
@ 2009-09-09  3:18             ` James Bottomley
  2009-09-09  3:20             ` [PATCH 3/5] arm: " James Bottomley
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:18 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

We already have an API to flush a kernel page along an alias
address, so use it.  The TLB purge prevents the CPU from doing
speculative moveins on the flushed address, so we don't need to
implement an invalidate.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/parisc/include/asm/cacheflush.h |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 7243951..2536a00 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -90,6 +90,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
 {
 	flush_kernel_dcache_page_addr(page_address(page));
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	flush_kernel_dcache_page_addr(addr);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	/* nop .. the flush prevents move in until the page is touched */
+}
 
 #ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void);
-- 
1.6.3.3





* [PATCH 3/5] arm: add mm API for DMA to vmalloc/vmap areas
  2009-09-09  3:14           ` James Bottomley
  2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
  2009-09-09  3:18             ` [PATCH 2/5] parisc: add mm " James Bottomley
@ 2009-09-09  3:20             ` James Bottomley
  2009-09-09  3:21             ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
  2009-09-09  3:21             ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
  4 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:20 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

ARM cannot prevent cache movein, so this patch implements both the
flush and invalidate pieces of the API.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 arch/arm/include/asm/cacheflush.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 1a711ea..1104ee9 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,6 +436,16 @@ static inline void flush_kernel_dcache_page(struct page *page)
 	if ((cache_is_vivt() || cache_is_vipt_aliasing()) && !PageHighMem(page))
 		__cpuc_flush_dcache_page(page_address(page));
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_page(addr);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
+		__cpuc_flush_dcache_page(addr);
+}
 
 #define flush_dcache_mmap_lock(mapping) \
 	spin_lock_irq(&(mapping)->tree_lock)
-- 
1.6.3.3





* [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages
  2009-09-09  3:14           ` James Bottomley
                               ` (2 preceding siblings ...)
  2009-09-09  3:20             ` [PATCH 3/5] arm: " James Bottomley
@ 2009-09-09  3:21             ` James Bottomley
  2009-09-09  3:21             ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
  4 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:21 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

This updates bio_map_kern() to check for pages in the vmalloc address
range and call the new kernel flushing APIs if they are.  This should
allow any kernel user to pass a vmalloc/vmap area to block.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 fs/bio.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 7673800..ea346b4 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1120,6 +1120,13 @@ void bio_unmap_user(struct bio *bio)
 
 static void bio_map_kern_endio(struct bio *bio, int err)
 {
+	void *kaddr = bio->bi_private;
+	if (is_vmalloc_addr(kaddr)) {
+		void *addr;
+		for (addr = kaddr; addr < kaddr + bio->bi_size;
+		     addr += PAGE_SIZE)
+			invalidate_kernel_dcache_addr(addr);
+	}
 	bio_put(bio);
 }
 
@@ -1138,9 +1145,12 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
+	bio->bi_private = data;
+
 	offset = offset_in_page(kaddr);
 	for (i = 0; i < nr_pages; i++) {
 		unsigned int bytes = PAGE_SIZE - offset;
+		struct page *page;
 
 		if (len <= 0)
 			break;
@@ -1148,8 +1158,13 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 		if (bytes > len)
 			bytes = len;
 
-		if (bio_add_pc_page(q, bio, virt_to_page(data), bytes,
-				    offset) < bytes)
+		if (is_vmalloc_addr(data)) {
+			flush_kernel_dcache_addr(data);
+			page = vmalloc_to_page(data);
+		} else
+			page = virt_to_page(data);
+
+		if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes)
 			break;
 
 		data += bytes;
-- 
1.6.3.3





* [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures
  2009-09-09  3:14           ` James Bottomley
                               ` (3 preceding siblings ...)
  2009-09-09  3:21             ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
@ 2009-09-09  3:21             ` James Bottomley
  4 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:21 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

xfs_buf.c includes what is essentially a hand-rolled version of
blk_rq_map_kern().  In order to work properly with the vmalloc buffers
that xfs uses, this hand-rolled routine must also implement the flushing
API for vmap/vmalloc areas.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 fs/xfs/linux-2.6/xfs_buf.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 965df12..62ae977 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1138,6 +1138,10 @@ xfs_buf_bio_end_io(
 	do {
 		struct page	*page = bvec->bv_page;
 
+		if (is_vmalloc_addr(bp->b_addr))
+			invalidate_kernel_dcache_addr(bp->b_addr +
+						      bvec->bv_offset);
+
 		ASSERT(!PagePrivate(page));
 		if (unlikely(bp->b_error)) {
 			if (bp->b_flags & XBF_READ)
@@ -1202,6 +1206,9 @@ _xfs_buf_ioapply(
 		bio->bi_end_io = xfs_buf_bio_end_io;
 		bio->bi_private = bp;
 
+		if (is_vmalloc_addr(bp->b_addr))
+			flush_kernel_dcache_addr(bp->b_addr);
+
 		bio_add_page(bio, bp->b_pages[0], PAGE_CACHE_SIZE, 0);
 		size = 0;
 
@@ -1228,6 +1235,9 @@ next_chunk:
 		if (nbytes > size)
 			nbytes = size;
 
+		if (is_vmalloc_addr(bp->b_addr))
+			flush_kernel_dcache_addr(bp->b_addr + PAGE_SIZE*map_i);
+
 		rbytes = bio_add_page(bio, bp->b_pages[map_i], nbytes, offset);
 		if (rbytes < nbytes)
 			break;
-- 
1.6.3.3


* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
@ 2009-09-09  3:23               ` James Bottomley
  2009-09-09  3:35                 ` Paul Mundt
  0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2009-09-09  3:23 UTC (permalink / raw)
  To: Russell King
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

On Wed, 2009-09-09 at 03:17 +0000, James Bottomley wrote:
> On Virtually Indexed architectures (which don't do automatic alias
> resolution in their caches), we have to flush via the correct
> virtual address to prepare pages for DMA.  On some architectures
> (like arm) we cannot prevent the CPU from doing data movein along
> the alias (and thus giving stale read data), so we not only have to
> introduce a flush API to push dirty cache lines out, but also an invalidate
> API to kill inconsistent cache lines that may have moved in before
> DMA changed the data
> 
> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
> ---
>  include/linux/highmem.h |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 211ff44..9719952 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
>  static inline void flush_kernel_dcache_page(struct page *page)
>  {
>  }
> +static incline void flush_kernel_dcache_addr(void *vaddr)
> +{
> +}
> +static incline void invalidate_kernel_dcache_addr(void *vaddr)
> +{
> +}
>  #endif

OK, so it's been pointed out to me that I didn't compile check this ...
just testing to see everyone is awake ...

James




* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-09  3:23               ` James Bottomley
@ 2009-09-09  3:35                 ` Paul Mundt
  2009-09-09 14:34                   ` James Bottomley
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Mundt @ 2009-09-09  3:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Russell King, Parisc List, Linux Filesystem Mailing List,
	linux-arch, Christoph Hellwig

On Tue, Sep 08, 2009 at 10:23:21PM -0500, James Bottomley wrote:
> On Wed, 2009-09-09 at 03:17 +0000, James Bottomley wrote:
> > On Virtually Indexed architectures (which don't do automatic alias
> > resolution in their caches), we have to flush via the correct
> > virtual address to prepare pages for DMA.  On some architectures
> > (like arm) we cannot prevent the CPU from doing data movein along
> > the alias (and thus giving stale read data), so we not only have to
> > introduce a flush API to push dirty cache lines out, but also an invalidate
> > API to kill inconsistent cache lines that may have moved in before
> > DMA changed the data
> > 
> > Signed-off-by: James Bottomley <James.Bottomley@suse.de>
> > ---
> >  include/linux/highmem.h |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index 211ff44..9719952 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -17,6 +17,12 @@ static inline void flush_anon_page(struct vm_area_struct *vma, struct page *page
> >  static inline void flush_kernel_dcache_page(struct page *page)
> >  {
> >  }
> > +static incline void flush_kernel_dcache_addr(void *vaddr)
> > +{
> > +}
> > +static incline void invalidate_kernel_dcache_addr(void *vaddr)
> > +{
> > +}
> >  #endif
> 
> OK, so it's been pointed out to me that I didn't compile check this ...
> just testing to see everyone is awake ...
> 
And here I thought it was a new gcc construct along the lines of
inline-if-so-inclined.. :-)


* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-09  3:35                 ` Paul Mundt
@ 2009-09-09 14:34                   ` James Bottomley
  2009-09-10  0:24                     ` Paul Mundt
  0 siblings, 1 reply; 20+ messages in thread
From: James Bottomley @ 2009-09-09 14:34 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Russell King, Parisc List, Linux Filesystem Mailing List,
	linux-arch, Christoph Hellwig

On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote:
> And here I thought it was a new gcc construct along the lines of
> inline-if-so-inclined.. :-)

Actually, I missed the fact that sh also sets
ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of
these functions too.  Does this look right?

James

---

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 4c5462d..db06611 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -48,6 +48,14 @@ static inline void flush_kernel_dcache_page(struct page *page)
 {
 	flush_dcache_page(page);
 }
+static inline void flush_kernel_dcache_addr(void *addr)
+{
+	__flush_invalidate_region(addr, PAGE_SIZE);
+}
+static inline void invalidate_kernel_dcache_addr(void *addr)
+{
+	__flush_invalidate_region(addr, PAGE_SIZE);
+}
 
 #if defined(CONFIG_CPU_SH4) && !defined(CONFIG_CACHE_OFF)
 extern void copy_to_user_page(struct vm_area_struct *vma,




* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-09 14:34                   ` James Bottomley
@ 2009-09-10  0:24                     ` Paul Mundt
  2009-09-10  0:30                       ` James Bottomley
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Mundt @ 2009-09-10  0:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: Russell King, Parisc List, Linux Filesystem Mailing List,
	linux-arch, Christoph Hellwig

On Wed, Sep 09, 2009 at 09:34:38AM -0500, James Bottomley wrote:
> On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote:
> > And here I thought it was a new gcc construct along the lines of
> > inline-if-so-inclined.. :-)
> 
> Actually, I missed the fact that sh also sets
> ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of
> these functions too.  Does this look right?
> 
Yes, I was planning on just wiring it up later, but this looks correct.


* Re: [PATCH 1/5] mm: add coherence API for DMA to vmalloc/vmap areas
  2009-09-10  0:24                     ` Paul Mundt
@ 2009-09-10  0:30                       ` James Bottomley
  0 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-09-10  0:30 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Russell King, Parisc List, Linux Filesystem Mailing List,
	linux-arch, Christoph Hellwig

On Thu, 2009-09-10 at 09:24 +0900, Paul Mundt wrote:
> On Wed, Sep 09, 2009 at 09:34:38AM -0500, James Bottomley wrote:
> > On Wed, 2009-09-09 at 12:35 +0900, Paul Mundt wrote:
> > > And here I thought it was a new gcc construct along the lines of
> > > inline-if-so-inclined.. :-)
> > 
> > Actually, I missed the fact that sh also sets
> > ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, so you'll need implementations of
> > these functions too.  Does this look right?
> > 
> Yes, I was planning on just wiring it up later, but this looks correct.

Great, thanks!

Give me an ack and I'll take care of submitting it ... although it would
be nice to test it out with the xfs problem case.

James




* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley
  2009-09-08 19:00 ` Russell King
@ 2009-10-13  1:40 ` Christoph Hellwig
  2009-10-13  4:13   ` James Bottomley
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2009-10-13  1:40 UTC (permalink / raw)
  To: James Bottomley
  Cc: Parisc List, Linux Filesystem Mailing List, linux-arch,
	Christoph Hellwig

So what's going to happen with this patch series?  Can we expect it to
get merged one day?



* Re: xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas
  2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
@ 2009-10-13  4:13   ` James Bottomley
  0 siblings, 0 replies; 20+ messages in thread
From: James Bottomley @ 2009-10-13  4:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Parisc List, Linux Filesystem Mailing List, linux-arch

On Tue, 2009-10-13 at 03:40 +0200, Christoph Hellwig wrote:
> So what's going to happen with this patch series?  Can we expect it to
> get merged one day?

Well, we need it for parisc, so we can just push it through our tree.  I
was sort of hoping for confirmation that the other arch bits worked ...
but they can't be more broken than they are today, so I suppose the risk
is low.

James



Thread overview: 20+ messages
2009-09-08 18:27 xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas James Bottomley
2009-09-08 19:00 ` Russell King
2009-09-08 19:11   ` James Bottomley
2009-09-08 20:16     ` Russell King
2009-09-08 20:16       ` Russell King
2009-09-08 20:39       ` James Bottomley
2009-09-08 21:39         ` Russell King
2009-09-09  3:14           ` James Bottomley
2009-09-09  3:17             ` [PATCH 1/5] mm: add coherence API for DMA " James Bottomley
2009-09-09  3:23               ` James Bottomley
2009-09-09  3:35                 ` Paul Mundt
2009-09-09 14:34                   ` James Bottomley
2009-09-10  0:24                     ` Paul Mundt
2009-09-10  0:30                       ` James Bottomley
2009-09-09  3:18             ` [PATCH 2/5] parisc: add mm " James Bottomley
2009-09-09  3:20             ` [PATCH 3/5] arm: " James Bottomley
2009-09-09  3:21             ` [PATCH 4/5] block: permit I/O to vmalloc/vmap kernel pages James Bottomley
2009-09-09  3:21             ` [PATCH 5/5] xfs: fix xfs to work with Virtually Indexed architectures James Bottomley
2009-10-13  1:40 ` xfs failure on parisc (and presumably other VI cache systems) caused by I/O to vmalloc/vmap areas Christoph Hellwig
2009-10-13  4:13   ` James Bottomley
