linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] swapin flush cache bug
@ 2001-02-12 23:21 Marcelo Tosatti
  2001-02-13  9:50 ` Russell King
  2001-02-13 10:53 ` NIIBE Yutaka
  0 siblings, 2 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2001-02-12 23:21 UTC (permalink / raw)
  To: Linus Torvalds, Alan Cox; +Cc: NIIBE Yutaka, lkml


Hi,

Niibe Yutaka noted (and added an entry on the MM bugzilla system) that
cache flushing on do_swap_page() is buggy. Here: 

---
        struct page *page = lookup_swap_cache(entry);
        pte_t pte;

        if (!page) {
                lock_kernel();
                swapin_readahead(entry);
                page = read_swap_cache(entry);
                unlock_kernel();
                if (!page)
                        return -1;

                flush_page_to_ram(page);
                flush_icache_page(vma, page);
        }

        mm->rss++;
--

If lookup_swap_cache() finds a page in the swap cache, and that page was
in memory because of the swapin readahead, the cache is not flushed.

Here is a patch to fix the problem by always flushing the cache including
for pages in the swap cache:

--- linux.orig/mm/memory.c.orig       Thu Feb  8 10:52:20 2001
+++ linux/mm/memory.c    Thu Feb  8 10:54:07 2001
@@ -1033,12 +1033,12 @@
                unlock_kernel();
                if (!page)
                        return -1;
-
-               flush_page_to_ram(page);
-               flush_icache_page(vma, page);
        }
 
        mm->rss++;
+
+       flush_page_to_ram(page);
+       flush_icache_page(vma, page);
 
        pte = mk_pte(page, vma->vm_page_prot);




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://vger.kernel.org/lkml/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-12 23:21 [PATCH] swapin flush cache bug Marcelo Tosatti
@ 2001-02-13  9:50 ` Russell King
  2001-02-13 10:53 ` NIIBE Yutaka
  1 sibling, 0 replies; 23+ messages in thread
From: Russell King @ 2001-02-13  9:50 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Linus Torvalds, Alan Cox, NIIBE Yutaka, lkml

Marcelo Tosatti writes:
> If lookup_swap_cache() finds a page in the swap cache, and that page was
> in memory because of the swapin readahead, the cache is not flushed.
> 
> Here is a patch to fix the problem by always flushing the cache including
> for pages in the swap cache:

> -
> -               flush_page_to_ram(page);
> -               flush_icache_page(vma, page);
>         }
>  
>         mm->rss++;
> +
> +       flush_page_to_ram(page);
> +       flush_icache_page(vma, page);

Surely if the page is in the swap cache, we don't need the
flush_page_to_ram() because the data is already written to the page.  Yes,
there may be some reminents of it in the cache due to it being written
to disk via PIO.

Thinking about it some more - we have a process.  It used to contain page
P at address V.  We unmapped the page (and did the right thing with the
caches).  Now, something wants to access address V, so we pull the page
from the swap cache, and place page P back at address V.  We therefore
shouldn't need any cache manipulation at this point.

What was the problem?  The old code seems to behave well on a virtual
address indexed virtual address tagged cache.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-12 23:21 [PATCH] swapin flush cache bug Marcelo Tosatti
  2001-02-13  9:50 ` Russell King
@ 2001-02-13 10:53 ` NIIBE Yutaka
  2001-02-13 11:16   ` Russell King
                     ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-02-13 10:53 UTC (permalink / raw)
  To: Russell King; +Cc: Marcelo Tosatti, Linus Torvalds, Alan Cox, lkml

Russell King wrote:
 > What was the problem?  The old code seems to behave well on a virtual
 > address indexed virtual address tagged cache.

My case (SH-4) is: virtual address indexed, physical address tagged cache
(which has alias issue).

Suppose there's I/O to the physical page P asynchronously, and the
page is placed in the swap cache.  It remains cache entry, say,
indexed kernel virtual address K.  Then, process maps P at U.  U and K
(may) indexes differently.  The process will get the data from memory
(not the one in the cashe), if it's not flushed.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-13 10:53 ` NIIBE Yutaka
@ 2001-02-13 11:16   ` Russell King
  2001-02-13 11:26   ` Alan Cox
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Russell King @ 2001-02-13 11:16 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Marcelo Tosatti, Linus Torvalds, Alan Cox, lkml

NIIBE Yutaka writes:
> My case (SH-4) is: virtual address indexed, physical address tagged cache
> (which has alias issue).

vivt caches have the same alias issue.

> Suppose there's I/O to the physical page P asynchronously, and the
> page is placed in the swap cache.

Unless someone else (Rik/DaveM) says otherwise, it is my understanding
that any IO for page P will only ever be a write to disk.  Therefore,
when you get a copy of the page from the swap cache, the physical memory
for that page is the same as it was when the process was using it last.

> It remains cache entry, say, indexed kernel virtual address K.  Then,
> process maps P at U.  U and K (may) indexes differently.  The process
> will get the data from memory (not the one in the cashe), if it's not
> flushed.

The data from memory will still be up to date though.  However, I agree
that you will end up with cache aliases.  I will also end up with cache
aliases.  The question now is, do these aliases really matter?

On my caches, the answer is no because they're not marked dirty, and
therefore will get dropped from the cache without writeback to memory.

If your cache doesn't write back clean cache data to memory, then you
should also behave well.

However, that said, someone more experienced with the Linux MM should
comment.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-13 10:53 ` NIIBE Yutaka
  2001-02-13 11:16   ` Russell King
@ 2001-02-13 11:26   ` Alan Cox
  2001-02-13 23:50   ` NIIBE Yutaka
  2001-02-14  2:08   ` NIIBE Yutaka
  3 siblings, 0 replies; 23+ messages in thread
From: Alan Cox @ 2001-02-13 11:26 UTC (permalink / raw)
  To: NIIBE Yutaka
  Cc: Russell King, Marcelo Tosatti, Linus Torvalds, Alan Cox, lkml

> Suppose there's I/O to the physical page P asynchronously, and the
> page is placed in the swap cache.  It remains cache entry, say,
> indexed kernel virtual address K.  Then, process maps P at U.  U and K
> (may) indexes differently.  The process will get the data from memory
> (not the one in the cashe), if it's not flushed.

Ok we need to handle that case a bit more intelligently so those flushes dont
get into other ports code paths. 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-13 10:53 ` NIIBE Yutaka
  2001-02-13 11:16   ` Russell King
  2001-02-13 11:26   ` Alan Cox
@ 2001-02-13 23:50   ` NIIBE Yutaka
  2001-02-14  2:08   ` NIIBE Yutaka
  3 siblings, 0 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-02-13 23:50 UTC (permalink / raw)
  To: Russell King; +Cc: Marcelo Tosatti, Linus Torvalds, Alan Cox, lkml

Russell King wrote:
 > Unless someone else (Rik/DaveM) says otherwise, it is my understanding
 > that any IO for page P will only ever be a write to disk.  Therefore,
 > when you get a copy of the page from the swap cache, the physical memory
 > for that page is the same as it was when the process was using it last.
[...]
 > The data from memory will still be up to date though.  However, I agree
 > that you will end up with cache aliases.  I will also end up with cache
 > aliases.  The question now is, do these aliases really matter?
 > 
 > On my caches, the answer is no because they're not marked dirty, and
 > therefore will get dropped from the cache without writeback to memory.
 > 
 > If your cache doesn't write back clean cache data to memory, then you
 > should also behave well.

Yes, that's the difference.  It's write back cache, in my case.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-13 10:53 ` NIIBE Yutaka
                     ` (2 preceding siblings ...)
  2001-02-13 23:50   ` NIIBE Yutaka
@ 2001-02-14  2:08   ` NIIBE Yutaka
  2001-02-14 10:12     ` Marcelo Tosatti
  2001-06-27  0:51     ` NIIBE Yutaka
  3 siblings, 2 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-02-14  2:08 UTC (permalink / raw)
  To: Alan Cox; +Cc: Russell King, Marcelo Tosatti, Linus Torvalds, lkml

Alan Cox wrote:
 > Ok we need to handle that case a bit more intelligently so those flushes dont
 > get into other ports code paths. 

Possibly at fs/buffer.c:end_buffer_io_async?

We need to flush the cache when I/O was READ or READA.  Is there any
way for end_buffer_io_async to distinguish which I/O (READ or WRITE)
has been done?

--------------------------------------
Problem with write-back cache.

(1) Page got swapped out

           Swap out
   [ Disk ] <---- P [ Page ]

(2) Page got swapped in asynchronously, possibly by read-ahead

           Swap in
   [ Disk ] ----> P [ Page ]
	          K

   The I/O from disk goes through kernel virtual address K.
   We have cache entries indexed by K.

(3) Page fault occurs at user space U

   [ Disk ]       P [ Page ] <----- U
		  K

   The control goes to do_swap_page, found the page at
   lookup_swap_cache.

   If K and U indexes differently, we have cache alias issues, 
   we need to flush the entries indexed by K.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-14  2:08   ` NIIBE Yutaka
@ 2001-02-14 10:12     ` Marcelo Tosatti
  2001-06-27  0:51     ` NIIBE Yutaka
  1 sibling, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2001-02-14 10:12 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Alan Cox, Russell King, Linus Torvalds, lkml


On Wed, 14 Feb 2001, NIIBE Yutaka wrote:

> Alan Cox wrote:
>  > Ok we need to handle that case a bit more intelligently so those flushes dont
>  > get into other ports code paths. 
> 
> Possibly at fs/buffer.c:end_buffer_io_async?
> 
> We need to flush the cache when I/O was READ or READA.

Yet another thing (1) on end_buffer_io_async() to handle a case which is
only true for a specific user of it. Since the other special case handling
is for swap IO too, I think a separate IO end operation for swap would be
interesting.

(1) The current one is SetPageDecrAfter handling.

> Is there any way for end_buffer_io_async to distinguish which I/O (READ or WRITE)
> has been done?

Yes. If the buffer_head is uptodated (BH_Uptodate) then its a WRITE,
otherwise its a READ (this is only true before mark_buffer_uptodate() call
inside end_buffer_io_async(), of course). 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-02-14  2:08   ` NIIBE Yutaka
  2001-02-14 10:12     ` Marcelo Tosatti
@ 2001-06-27  0:51     ` NIIBE Yutaka
  2001-06-27 10:11       ` Marcelo Tosatti
  2001-06-28  0:07       ` NIIBE Yutaka
  1 sibling, 2 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-06-27  0:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

Hello Marcelo, 

This is follow-up to the mail in February.  You may perhaps forget the
context, it's the bug of MM about cache flushing for swapped-in-pages.
I see this bug on SuperH port (SH-4).

I think that we have this issue on the machine whose flush_dcache_page()
is defined.  In current code, the cache aren't flushed for 
the asynchronously-swapped-in pages which is cached in swap cache.
This is the problem.

Marcelo Tosatti writes:
 > Yet another thing (1) on end_buffer_io_async() to handle a case which is
 > only true for a specific user of it. Since the other special case handling
 > is for swap IO too, I think a separate IO end operation for swap would be
 > interesting.
 > 
 > (1) The current one is SetPageDecrAfter handling.

How about this?  I've updated MM bugzilla already.

2001-06-26  NIIBE Yutaka  <gniibe@m17n.org>

	* include/linux/mm.h (PG_flush_after, PageFlushAfter,
	SetPageFlushAfter, PageTestandClearFlushAfter): New bit.
	* mm/page_io.c (rw_swap_page_base): Set flush-after bit.
	* fs/buffer.c (end_buffer_io_async): Implement flush-ing
	with PG_flush_after.

	* mm/memory.c (do_swap_page): Remove flush-ing the page.

diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/fs/buffer.c kernel/fs/buffer.c
--- v2.4.6-pre5/fs/buffer.c	Mon Jun 25 18:48:07 2001
+++ kernel/fs/buffer.c	Tue Jun 26 15:11:17 2001
@@ -831,6 +831,9 @@ static void end_buffer_io_async(struct b
 	if (PageTestandClearDecrAfter(page))
 		atomic_dec(&nr_async_pages);
 
+	if (PageTestandClearFlushAfter(page))
+		flush_dcache_page(page);
+
 	UnlockPage(page);
 
 	return;
diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/include/linux/mm.h kernel/include/linux/mm.h
--- v2.4.6-pre5/include/linux/mm.h	Mon Jun 25 18:48:09 2001
+++ kernel/include/linux/mm.h	Tue Jun 26 14:58:56 2001
@@ -282,6 +282,7 @@ typedef struct page {
 #define PG_inactive_clean	11
 #define PG_highmem		12
 #define PG_checked		13	/* kill me in 2.5.<early>. */
+#define PG_flush_after		14
 				/* bits 21-29 unused */
 #define PG_arch_1		30
 #define PG_reserved		31
@@ -364,6 +365,10 @@ static inline void set_page_dirty(struct
 
 #define SetPageReserved(page)		set_bit(PG_reserved, &(page)->flags)
 #define ClearPageReserved(page)		clear_bit(PG_reserved, &(page)->flags)
+
+#define PageFlushAfter(page)	test_bit(PG_flush_after, &(page)->flags)
+#define SetPageFlushAfter(page)	set_bit(PG_flush_after, &(page)->flags)
+#define PageTestandClearFlushAfter(page)	test_and_clear_bit(PG_flush_after, &(page)->flags)
 
 /*
  * Error return values for the *_nopage functions
diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/mm/memory.c kernel/mm/memory.c
--- v2.4.6-pre5/mm/memory.c	Mon Jun 25 18:48:10 2001
+++ kernel/mm/memory.c	Tue Jun 26 14:48:15 2001
@@ -1109,8 +1109,6 @@ static int do_swap_page(struct mm_struct
 			return -1;
 		}
 		wait_on_page(page);
-		flush_page_to_ram(page);
-		flush_icache_page(vma, page);
 	}
 
 	/*
diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/mm/page_io.c kernel/mm/page_io.c
--- v2.4.6-pre5/mm/page_io.c	Mon Apr 30 16:15:32 2001
+++ kernel/mm/page_io.c	Tue Jun 26 15:01:00 2001
@@ -50,6 +50,7 @@ static int rw_swap_page_base(int rw, swp
 
 	if (rw == READ) {
 		ClearPageUptodate(page);
+		SetPageFlushAfter(page);
 		kstat.pswpin++;
 	} else
 		kstat.pswpout++;
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-27  0:51     ` NIIBE Yutaka
@ 2001-06-27 10:11       ` Marcelo Tosatti
  2001-06-28  0:42         ` David S. Miller
  2001-06-28  0:07       ` NIIBE Yutaka
  1 sibling, 1 reply; 23+ messages in thread
From: Marcelo Tosatti @ 2001-06-27 10:11 UTC (permalink / raw)
  To: Stephen C. Tweedie, NIIBE Yutaka; +Cc: lkml



Hi, 

I think Stephen C. Tweedie has some considerations about the cache
flushing calls on do_swap_page().

Stephen? 


On Wed, 27 Jun 2001, NIIBE Yutaka wrote:

> Hello Marcelo, 
> 
> This is follow-up to the mail in February.  You may perhaps forget the
> context, it's the bug of MM about cache flushing for swapped-in-pages.
> I see this bug on SuperH port (SH-4).
> 
> I think that we have this issue on the machine whose flush_dcache_page()
> is defined.  In current code, the cache aren't flushed for 
> the asynchronously-swapped-in pages which is cached in swap cache.
> This is the problem.
> 
> Marcelo Tosatti writes:
>  > Yet another thing (1) on end_buffer_io_async() to handle a case which is
>  > only true for a specific user of it. Since the other special case handling
>  > is for swap IO too, I think a separate IO end operation for swap would be
>  > interesting.
>  > 
>  > (1) The current one is SetPageDecrAfter handling.
> 
> How about this?  I've updated MM bugzilla already.
> 
> 2001-06-26  NIIBE Yutaka  <gniibe@m17n.org>
> 
> 	* include/linux/mm.h (PG_flush_after, PageFlushAfter,
> 	SetPageFlushAfter, PageTestandClearFlushAfter): New bit.
> 	* mm/page_io.c (rw_swap_page_base): Set flush-after bit.
> 	* fs/buffer.c (end_buffer_io_async): Implement flush-ing
> 	with PG_flush_after.
> 
> 	* mm/memory.c (do_swap_page): Remove flush-ing the page.
> 
> diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/fs/buffer.c kernel/fs/buffer.c
> --- v2.4.6-pre5/fs/buffer.c	Mon Jun 25 18:48:07 2001
> +++ kernel/fs/buffer.c	Tue Jun 26 15:11:17 2001
> @@ -831,6 +831,9 @@ static void end_buffer_io_async(struct b
>  	if (PageTestandClearDecrAfter(page))
>  		atomic_dec(&nr_async_pages);
>  
> +	if (PageTestandClearFlushAfter(page))
> +		flush_dcache_page(page);
> +
>  	UnlockPage(page);
>  
>  	return;
> diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/include/linux/mm.h kernel/include/linux/mm.h
> --- v2.4.6-pre5/include/linux/mm.h	Mon Jun 25 18:48:09 2001
> +++ kernel/include/linux/mm.h	Tue Jun 26 14:58:56 2001
> @@ -282,6 +282,7 @@ typedef struct page {
>  #define PG_inactive_clean	11
>  #define PG_highmem		12
>  #define PG_checked		13	/* kill me in 2.5.<early>. */
> +#define PG_flush_after		14
>  				/* bits 21-29 unused */
>  #define PG_arch_1		30
>  #define PG_reserved		31
> @@ -364,6 +365,10 @@ static inline void set_page_dirty(struct
>  
>  #define SetPageReserved(page)		set_bit(PG_reserved, &(page)->flags)
>  #define ClearPageReserved(page)		clear_bit(PG_reserved, &(page)->flags)
> +
> +#define PageFlushAfter(page)	test_bit(PG_flush_after, &(page)->flags)
> +#define SetPageFlushAfter(page)	set_bit(PG_flush_after, &(page)->flags)
> +#define PageTestandClearFlushAfter(page)	test_and_clear_bit(PG_flush_after, &(page)->flags)
>  
>  /*
>   * Error return values for the *_nopage functions
> diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/mm/memory.c kernel/mm/memory.c
> --- v2.4.6-pre5/mm/memory.c	Mon Jun 25 18:48:10 2001
> +++ kernel/mm/memory.c	Tue Jun 26 14:48:15 2001
> @@ -1109,8 +1109,6 @@ static int do_swap_page(struct mm_struct
>  			return -1;
>  		}
>  		wait_on_page(page);
> -		flush_page_to_ram(page);
> -		flush_icache_page(vma, page);
>  	}
>  
>  	/*
> diff -ruNp --exclude=CVS --exclude=.cvsignore v2.4.6-pre5/mm/page_io.c kernel/mm/page_io.c
> --- v2.4.6-pre5/mm/page_io.c	Mon Apr 30 16:15:32 2001
> +++ kernel/mm/page_io.c	Tue Jun 26 15:01:00 2001
> @@ -50,6 +50,7 @@ static int rw_swap_page_base(int rw, swp
>  
>  	if (rw == READ) {
>  		ClearPageUptodate(page);
> +		SetPageFlushAfter(page);
>  		kstat.pswpin++;
>  	} else
>  		kstat.pswpout++;
> -- 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:07       ` NIIBE Yutaka
@ 2001-06-27 22:41         ` Marcelo Tosatti
  2001-06-28  0:23         ` Stephen C. Tweedie
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Marcelo Tosatti @ 2001-06-27 22:41 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Stephen C. Tweedie, lkml



On Thu, 28 Jun 2001, NIIBE Yutaka wrote:

> Marcelo Tosatti wrote:
>  > I think Stephen C. Tweedie has some considerations about the cache
>  > flushing calls on do_swap_page().
> 
> Yup.  IIRC, he said that flushing cache at do_swap_page() (which I've
> tried at first) is not good, because it's the hot path and it causes
> another performance problem in apache or sendmail, where many
> processes share the pages in swap cache.

I guess he has some other comments about it... 

Again, Stephen ? 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-27  0:51     ` NIIBE Yutaka
  2001-06-27 10:11       ` Marcelo Tosatti
@ 2001-06-28  0:07       ` NIIBE Yutaka
  2001-06-27 22:41         ` Marcelo Tosatti
                           ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-06-28  0:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stephen C. Tweedie, lkml

Marcelo Tosatti wrote:
 > I think Stephen C. Tweedie has some considerations about the cache
 > flushing calls on do_swap_page().

Yup.  IIRC, he said that flushing cache at do_swap_page() (which I've
tried at first) is not good, because it's the hot path and it causes
another performance problem in apache or sendmail, where many
processes share the pages in swap cache.

Then, the fix I sent yesterday is second try.  The flush is moved to
I/O routine, instead of do_swap_page().

Actually, I really wonder why current code causes no problem in other
architectures.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:07       ` NIIBE Yutaka
  2001-06-27 22:41         ` Marcelo Tosatti
@ 2001-06-28  0:23         ` Stephen C. Tweedie
  2001-06-28  0:47           ` David S. Miller
  2001-06-28  0:41         ` [PATCH] swapin flush cache bug NIIBE Yutaka
  2001-06-28  0:46         ` [PATCH] swapin flush cache bug David S. Miller
  3 siblings, 1 reply; 23+ messages in thread
From: Stephen C. Tweedie @ 2001-06-28  0:23 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Marcelo Tosatti, Stephen C. Tweedie, lkml

Hi,

On Thu, Jun 28, 2001 at 09:07:52AM +0900, NIIBE Yutaka wrote:
> Marcelo Tosatti wrote:
>  > I think Stephen C. Tweedie has some considerations about the cache
>  > flushing calls on do_swap_page().
> 
> Yup.  IIRC, he said that flushing cache at do_swap_page() (which I've
> tried at first) is not good, because it's the hot path and it causes
> another performance problem in apache or sendmail, where many
> processes share the pages in swap cache.
> 
> Then, the fix I sent yesterday is second try.  The flush is moved to
> I/O routine, instead of do_swap_page().

I really want somebody who has worked on weird caching architectures
to look at it too, but I don't see that the new code works well.
First, don't we want to do a flush_page_to_ram() *before* starting the
swap IO?  There's no point dma'ing the swap page to ram if old, dirty
cache data is then going to be written back on top of that.

Secondly, the flushing of icache/dcache only needs to be done by the
time we come to use the page, so can be performed any time, before or
after the IO.  We're not going to be accessing the page during IO so
if we flush first we won't risk having stale cache data by the time
the IO completes.

So, why not just do the flushing before the IO?  Adding an async trap
to flush the page after swap IO seems the wrong approach: this should
be possible to fix synchronously.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:07       ` NIIBE Yutaka
  2001-06-27 22:41         ` Marcelo Tosatti
  2001-06-28  0:23         ` Stephen C. Tweedie
@ 2001-06-28  0:41         ` NIIBE Yutaka
  2001-06-28  1:04           ` NIIBE Yutaka
  2001-06-28  0:46         ` [PATCH] swapin flush cache bug David S. Miller
  3 siblings, 1 reply; 23+ messages in thread
From: NIIBE Yutaka @ 2001-06-28  0:41 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Marcelo Tosatti, lkml

Hello Stephen, 

Stephen C. Tweedie wrote:
 > First, don't we want to do a flush_page_to_ram() *before* starting the
 > swap IO?

Well, let me explain the issue.  It is the thing we need to do
flushing *after* I/O.

--------------------------
Problem with virtually indexed physically tagged write-back cache.

(1) Page got swapped out

           Swap out
   [ Page ] ----> [ Disk ]

(2) Page got swapped in asynchronously, possibly by read-ahead

           Swap in
   [ Page ] <---- [ Disk ]
	   K

   The I/O from disk goes through kernel virtual address K.
   We have cache entries indexed by K.

(3) Page fault occurs at user space U

   U ----> [ Page ] -----> [ Disk ]
		   K

   The control goes to do_swap_page, found the page at
   lookup_swap_cache.

   If K and U indexes differently, we have cache alias issues, we need
   to flush the entries indexed by K and let them go to memory.  Or else, 
   user space will see bogus data in Page.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-27 10:11       ` Marcelo Tosatti
@ 2001-06-28  0:42         ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2001-06-28  0:42 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Marcelo Tosatti, Stephen C. Tweedie, lkml


NIIBE Yutaka writes:
 > Actually, I really wonder why current code causes no problem in other
 > architectures.

Probably because DMA mapping interfaces take care of all flushing
details after I/O is complete.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:07       ` NIIBE Yutaka
                           ` (2 preceding siblings ...)
  2001-06-28  0:41         ` [PATCH] swapin flush cache bug NIIBE Yutaka
@ 2001-06-28  0:46         ` David S. Miller
  3 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2001-06-28  0:46 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: NIIBE Yutaka, Marcelo Tosatti, lkml


Stephen C. Tweedie writes:
 > I really want somebody who has worked on weird caching architectures
 > to look at it too, but I don't see that the new code works well.
 > First, don't we want to do a flush_page_to_ram() *before* starting the
 > swap IO?  There's no point dma'ing the swap page to ram if old, dirty
 > cache data is then going to be written back on top of that.
 > 
 > Secondly, the flushing of icache/dcache only needs to be done by the
 > time we come to use the page, so can be performed any time, before or
 > after the IO.  We're not going to be accessing the page during IO so
 > if we flush first we won't risk having stale cache data by the time
 > the IO completes.
 > 
 > So, why not just do the flushing before the IO?  Adding an async trap
 > to flush the page after swap IO seems the wrong approach: this should
 > be possible to fix synchronously.

People are totally confusing two completely seperate issues here.

Flushing for I/O <--> CPU coherency.

Flushing for CPU <--> CPU coherency.

flush_page_to_ram and flush_dcache_page are _ONLY_ for the CPU<-->CPU
coherency issues, not for I/O<-->CPU issues.

The I/O<-->CPU issues need to be handled elsewhere, for example in the
PCI dma mapping interface implementation.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:23         ` Stephen C. Tweedie
@ 2001-06-28  0:47           ` David S. Miller
  2001-06-28  1:10             ` David S. Miller
  2001-06-29 14:18             ` NIIBE Yutaka
  0 siblings, 2 replies; 23+ messages in thread
From: David S. Miller @ 2001-06-28  0:47 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Stephen C. Tweedie, Marcelo Tosatti, lkml


NIIBE Yutaka writes:
 > (2) Page got swapped in asynchronously, possibly by read-ahead
 > 
 >            Swap in
 >    [ Page ] <---- [ Disk ]
 > 	   K
 > 
 >    The I/O from disk goes through kernel virtual address K.
 >    We have cache entries indexed by K.

The I/O completion must flush the cache, not the VM subsystem.

You must implement cache flushing at the DMA tranfer end point
to fix the problem you are describing.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:41         ` [PATCH] swapin flush cache bug NIIBE Yutaka
@ 2001-06-28  1:04           ` NIIBE Yutaka
  2001-07-02 11:23             ` Cache issues NIIBE Yutaka
  0 siblings, 1 reply; 23+ messages in thread
From: NIIBE Yutaka @ 2001-06-28  1:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: Stephen C. Tweedie, Marcelo Tosatti, lkml

David S. Miller wrote:
 > The I/O completion must flush the cache, not the VM subsystem.
 > 
 > You must implement cache flushing at the DMA tranfer end point
 > to fix the problem you are describing.

Thanks a lot.  I understand now.  

Aha, that's the reason why we have __flush_dcache_range() in ide_insw
for Sparc64 implementation, isn't it?  I'll follow it for SuperH.

I'll close the entry for MM bugzilla, it's not MM bug.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:47           ` David S. Miller
@ 2001-06-28  1:10             ` David S. Miller
  2001-06-29 14:18             ` NIIBE Yutaka
  1 sibling, 0 replies; 23+ messages in thread
From: David S. Miller @ 2001-06-28  1:10 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: Stephen C. Tweedie, Marcelo Tosatti, lkml


NIIBE Yutaka writes:
 > Aha, that's the reason why we have __flush_dcache_range() in ide_insw
 > for Sparc64 implementation, isn't it?  I'll follow it for SuperH.

This is precisely correct.

 > I'll close the entry for MM bugzilla, it's not MM bug.

Great.

Later,
David S. Miller
davem@redhat.com



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] swapin flush cache bug
  2001-06-28  0:47           ` David S. Miller
  2001-06-28  1:10             ` David S. Miller
@ 2001-06-29 14:18             ` NIIBE Yutaka
  2001-07-02 22:47               ` Cache issues David S. Miller
  1 sibling, 1 reply; 23+ messages in thread
From: NIIBE Yutaka @ 2001-06-29 14:18 UTC (permalink / raw)
  To: David S. Miller, Stephen C. Tweedie, Marcelo Tosatti; +Cc: lkml

NIIBE Yutaka wrote:
 > I'll close the entry for MM bugzilla, it's not MM bug.

I've closed the entry.  It is SuperH cache flushing issue at I/O.

Then, thinking again, I think that my argument for do_swap_page() is
still valid.

	When the page is swapped in, the cache for the page is flushed
	__only if__ it's not found in swap cache.

I don't see any reason why we need to flush the cache here.

--- v2.4.6-pre5/mm/memory.c	Mon Jun 25 18:48:10 2001
+++ kernel/mm/memory.c	Tue Jun 26 14:48:15 2001
@@ -1109,8 +1109,6 @@ static int do_swap_page(struct mm_struct
 			return -1;
 		}
 		wait_on_page(page);
-		flush_page_to_ram(page);
-		flush_icache_page(vma, page);
 	}
 
 	/*
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Cache issues
  2001-06-28  1:04           ` NIIBE Yutaka
@ 2001-07-02 11:23             ` NIIBE Yutaka
  2001-07-03  0:04               ` NIIBE Yutaka
  0 siblings, 1 reply; 23+ messages in thread
From: NIIBE Yutaka @ 2001-07-02 11:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml

NIIBE Yutaka wrote:
 > I don't see any reason why we need to flush the cache here.
 > 
 > --- v2.4.6-pre5/mm/memory.c	Mon Jun 25 18:48:10 2001
 > +++ kernel/mm/memory.c	Tue Jun 26 14:48:15 2001
 > @@ -1109,8 +1109,6 @@ static int do_swap_page(struct mm_struct
 >  			return -1;
 >  		}
 >  		wait_on_page(page);
 > -		flush_page_to_ram(page);
 > -		flush_icache_page(vma, page);
 >  	}
 >  
 >  	/*

I've looked thorough all flush_page_to_ram and flush_icache_page calls.
If the architecture support follows the rule of Documentation/cachetlb.txt, 
I think that all the occurrences of flush_page_to_ram and
flush_icache_page are (almost) bogus now.

We have two issues yet:

(1) include/linux/highmem.h:memclear_highpage_flush
    We need to call flush_dcache_page here to remove flush_page_to_ram

(2) kernel/ptrace.c
    We need to call flush_dcache_page here too.
    Special care would be needed here.  I think that we cannot defer
    the flushing here.  There's the case where page->mapping ==
    &swapper_space, thus mapping->i_mmap == NULL 
    && mapping->i_mmap_shared == NULL.

Besides, flush_cache_page in mm/memory.c:{break_cow,do_wp_page}	are
redundant for SH-4.  SH-4's cache is direct mapped, virtually indexed
phisically tagged, so we don't need to flush anything.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Cache issues
  2001-06-29 14:18             ` NIIBE Yutaka
@ 2001-07-02 22:47               ` David S. Miller
  0 siblings, 0 replies; 23+ messages in thread
From: David S. Miller @ 2001-07-02 22:47 UTC (permalink / raw)
  To: NIIBE Yutaka; +Cc: lkml


NIIBE Yutaka writes:
 > I've looked thorough all flush_page_to_ram and flush_icache_page calls.
 > If the architecture support follows the rule of Documentation/cachetlb.txt, 
 > I think that all the occurrences of flush_page_to_ram and
 > flush_icache_page are (almost) bogus now.

Unfortunately several ports still use the old calls and do not make
use of flush_dcache_page etc.

So these older intefaces may not be arbitrarily removed in 2.4.x

The goal is to eventually remove them, this is true.  But it must
be done in 2.5.x at the earliest.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Cache issues
  2001-07-02 11:23             ` Cache issues NIIBE Yutaka
@ 2001-07-03  0:04               ` NIIBE Yutaka
  0 siblings, 0 replies; 23+ messages in thread
From: NIIBE Yutaka @ 2001-07-03  0:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: lkml

David S. Miller wrote:
 > So these older intefaces may not be arbitrarily removed in 2.4.x
 > 
 > The goal is to eventually remove them, this is true.  But it must
 > be done in 2.5.x at the earliest.

Yes, agreed.

I mean, it's almost ready for port maintainers to migrate new
interface (only two issues remain:
include/linux/highmem.h:memclear_highpage_flush and ptrace), as soon
as 2.5.x starts. 

For the time being, I let SH-4 port have null definitions of
flush_page_to_ram and flush_icache_page, and see how things are going.
-- 

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2001-07-03  0:05 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-02-12 23:21 [PATCH] swapin flush cache bug Marcelo Tosatti
2001-02-13  9:50 ` Russell King
2001-02-13 10:53 ` NIIBE Yutaka
2001-02-13 11:16   ` Russell King
2001-02-13 11:26   ` Alan Cox
2001-02-13 23:50   ` NIIBE Yutaka
2001-02-14  2:08   ` NIIBE Yutaka
2001-02-14 10:12     ` Marcelo Tosatti
2001-06-27  0:51     ` NIIBE Yutaka
2001-06-27 10:11       ` Marcelo Tosatti
2001-06-28  0:42         ` David S. Miller
2001-06-28  0:07       ` NIIBE Yutaka
2001-06-27 22:41         ` Marcelo Tosatti
2001-06-28  0:23         ` Stephen C. Tweedie
2001-06-28  0:47           ` David S. Miller
2001-06-28  1:10             ` David S. Miller
2001-06-29 14:18             ` NIIBE Yutaka
2001-07-02 22:47               ` Cache issues David S. Miller
2001-06-28  0:41         ` [PATCH] swapin flush cache bug NIIBE Yutaka
2001-06-28  1:04           ` NIIBE Yutaka
2001-07-02 11:23             ` Cache issues NIIBE Yutaka
2001-07-03  0:04               ` NIIBE Yutaka
2001-06-28  0:46         ` [PATCH] swapin flush cache bug David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).