All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] aio: fix kernel bug when page is  temporally busy
@ 2007-02-09  4:29 Ananiev, Leonid I
  2007-02-09  4:35 ` Andrew Morton
  2007-02-09  9:54 ` Jiri Kosina
  0 siblings, 2 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-09  4:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-aio, Andrew Morton, Zach Brown, suparna, Chris Mason,
	Badari Pulavarty

>From Leonid Ananiev

Fix "Kernel BUG at fs/aio.c:509". Return EIOCBRETRY but not  EIO if page
is busy.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>

"Kernel BUG at fs/aio.c:509"
http://marc.theaimsgroup.com/?l=linux-kernel&m=117031052517746&w=2 
is present in 2.6.20 as well as 2.6.19.
The investigation shows that the bug's panic occurs when aio_complete()
is called with argument ret=EIO which is returned from
invalidate_inode_pages2_range() because the page is temporally busy.
Call Trace:
 [<ffffffff802573c3>] invalidate_inode_pages2_range+0x236/0x26b
 [<ffffffff802af77b>] ext3_direct_IO+0x16c/0x19e
 [<ffffffff802adc1d>] ext3_get_block+0x0/0xe2
 [<ffffffff802518c0>] generic_file_direct_IO+0xb9/0xcf
 [<ffffffff8025193d>] generic_file_direct_write+0x67/0x10e
 [<ffffffff80252308>] __generic_file_aio_write_nolock+0x2d6/0x3fe
 [<ffffffff802082c0>] __switch_to+0x26f/0x27e
 [<ffffffff8047b8e9>] thread_return+0x0/0xd8
 [<ffffffff80252497>] generic_file_aio_write+0x67/0xc7
 [<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
 [<ffffffff802ab2e2>] ext3_file_write+0x16/0x8f
 [<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
 [<ffffffff80283f61>] aio_rw_vect_retry+0x72/0x14f
 [<ffffffff80284ad9>] aio_run_iocb+0xe6/0x187
 [<ffffffff802853e2>] io_submit_one+0x296/0x2e3
 [<ffffffff80285979>] sys_io_submit+0x9b/0x108
 [<ffffffff80229b49>] default_wake_function+0x0/0xe
 [<ffffffff8020983e>] system_call+0x7e/0x83

As a result aio_complete() is called while it should not be.
When IO is finished  aio_complete() is called once more
and iocb->ki_users becomes negative.

Taking into account aio.c lines in aio_run_iocb()
        if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
                aio_complete(iocb, ret, 0);

proposed patch makes function invalidate_inode_pages2_range()
to return EIOCBRETRY but not  EIO if page is busy.
The patch is tested on 2.6.20.

--- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
                                  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
                        }
                        ret = do_launder_page(mapping, page);
                        if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
-                               ret = -EIO;
+                               ret = -EIOCBRETRY;
                        unlock_page(page);
                }
                pagevec_release(&pvec);
@@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  4:29 [PATCH] aio: fix kernel bug when page is temporally busy Ananiev, Leonid I
@ 2007-02-09  4:35 ` Andrew Morton
  2007-02-09  5:41   ` Ananiev, Leonid I
  2007-02-09  9:54 ` Jiri Kosina
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-02-09  4:35 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: linux-kernel, linux-aio, Zach Brown, suparna, Chris Mason,
	Badari Pulavarty

On Fri, 9 Feb 2007 07:29:31 +0300 "Ananiev, Leonid I" <leonid.i.ananiev@intel.com> wrote:

> >From Leonid Ananiev
> 
> Fix "Kernel BUG at fs/aio.c:509". Return EIOCBRETRY but not  EIO if page
> is busy.
> 
> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
> 
> "Kernel BUG at fs/aio.c:509"
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117031052517746&w=2 
> is present in 2.6.20 as well as 2.6.19.
> The investigation shows that the bug's panic occurs when aio_complete()
> is called with argument ret=EIO which is returned from
> invalidate_inode_pages2_range() because the page is temporally busy.
> Call Trace:
>  [<ffffffff802573c3>] invalidate_inode_pages2_range+0x236/0x26b
>  [<ffffffff802af77b>] ext3_direct_IO+0x16c/0x19e
>  [<ffffffff802adc1d>] ext3_get_block+0x0/0xe2
>  [<ffffffff802518c0>] generic_file_direct_IO+0xb9/0xcf
>  [<ffffffff8025193d>] generic_file_direct_write+0x67/0x10e
>  [<ffffffff80252308>] __generic_file_aio_write_nolock+0x2d6/0x3fe
>  [<ffffffff802082c0>] __switch_to+0x26f/0x27e
>  [<ffffffff8047b8e9>] thread_return+0x0/0xd8
>  [<ffffffff80252497>] generic_file_aio_write+0x67/0xc7
>  [<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
>  [<ffffffff802ab2e2>] ext3_file_write+0x16/0x8f
>  [<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
>  [<ffffffff80283f61>] aio_rw_vect_retry+0x72/0x14f
>  [<ffffffff80284ad9>] aio_run_iocb+0xe6/0x187
>  [<ffffffff802853e2>] io_submit_one+0x296/0x2e3
>  [<ffffffff80285979>] sys_io_submit+0x9b/0x108
>  [<ffffffff80229b49>] default_wake_function+0x0/0xe
>  [<ffffffff8020983e>] system_call+0x7e/0x83
> 
> As a result aio_complete() is called while it should not be.
> When IO is finished  aio_complete() is called once more
> and iocb->ki_users becomes negative.
> 
> Taking into account aio.c lines in aio_run_iocb()
>         if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
>                 aio_complete(iocb, ret, 0);
> 
> proposed patch makes function invalidate_inode_pages2_range()
> to return EIOCBRETRY but not  EIO if page is busy.
> The patch is tested on 2.6.20.
> 
> --- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
> +++ linux-2.6.20p/mm/truncate.c 2007-02-08 11:38:11.000000000 -0800
> @@ -366,7 +366,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>                                   pgoff_t start, pgoff_t end)
> @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
>                         }
>                         ret = do_launder_page(mapping, page);
>                         if (ret == 0 &&
> !invalidate_complete_page2(mapping, page))
> -                               ret = -EIO;
> +                               ret = -EIOCBRETRY;
>                         unlock_page(page);
>                 }
>                 pagevec_release(&pvec);
> @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
>   * Any pages which are found to be mapped into pagetables are unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2(struct address_space *mapping)
>  {

The patch is wildly wordwrapped and has had it tabs replaced with spaces.

invalidate_inode_pages2() has other callers.  I suspect with this change
we'll end up leaking EIOCBRETRY back to userspace.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  4:35 ` Andrew Morton
@ 2007-02-09  5:41   ` Ananiev, Leonid I
  2007-02-09  5:52     ` Andrew Morton
  2007-02-09  7:16     ` Suparna Bhattacharya
  0 siblings, 2 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-09  5:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-aio, Zach Brown, suparna, Chris Mason,
	Badari Pulavarty


> invalidate_inode_pages2() has other callers.  I suspect with this
change
> we'll end up leaking EIOCBRETRY back to userspace.

EIOCBRETRY is used and caught already in do_sync_read() and
do_sync_readv_writev().

Below fixed patch against kernel 2.6.20.

>From Leonid Ananiev

Fix kernel bug when IO page is temporally busy:
invalidate_inode_pages2() returns EIOCBRETRY but not  EIO..

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c	2007-02-08 11:38:11.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
 				  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
 			}
 			ret = do_launder_page(mapping, page);
 			if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
-				ret = -EIO;
+				ret = -EIOCBRETRY;
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
@@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  5:41   ` Ananiev, Leonid I
@ 2007-02-09  5:52     ` Andrew Morton
  2007-02-12 22:52       ` Ananiev, Leonid I
  2007-02-12 23:21       ` Ananiev, Leonid I
  2007-02-09  7:16     ` Suparna Bhattacharya
  1 sibling, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2007-02-09  5:52 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: linux-kernel, linux-aio, Zach Brown, suparna, Chris Mason,
	Badari Pulavarty

On Fri, 9 Feb 2007 08:41:41 +0300 "Ananiev, Leonid I" <leonid.i.ananiev@intel.com> wrote:

> 
> > invalidate_inode_pages2() has other callers.  I suspect with this
> change
> > we'll end up leaking EIOCBRETRY back to userspace.
> 
> EIOCBRETRY is used and caught already in do_sync_read() and
> do_sync_readv_writev().

To pick one example:

	nfs_follow_link
	->nfs_revalidate_mapping_nolock
	  ->nfs_invalidate_mapping_nolock
	    ->invalidate_inode_pages2

so that, I assume, affects open(), unlink(), etc.


> Below fixed patch against kernel 2.6.20.

The tab->spaces issue is fixed, but it's still all wordwrapped.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  5:41   ` Ananiev, Leonid I
  2007-02-09  5:52     ` Andrew Morton
@ 2007-02-09  7:16     ` Suparna Bhattacharya
  2007-02-09  9:52       ` Ananiev, Leonid I
  1 sibling, 1 reply; 31+ messages in thread
From: Suparna Bhattacharya @ 2007-02-09  7:16 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: Andrew Morton, linux-kernel, linux-aio, Zach Brown, Chris Mason,
	Badari Pulavarty

On Fri, Feb 09, 2007 at 08:41:41AM +0300, Ananiev, Leonid I wrote:
> > invalidate_inode_pages2() has other callers.  I suspect with this
> change
> > we'll end up leaking EIOCBRETRY back to userspace.
>  
> EIOCBRETRY is used and caught already in do_sync_read() and
> do_sync_readv_writev().
>  
> Below fixed patch against kernel 2.6.20.

I agree with Andrew, this doesn't look like the right solution.

EIOCBRETRY isn't the same as EAGAIN or "try again later". EIOCBRETRY means
"I have queued up an action to notify (wakeup) the callback to retry the
operation once data is ready".

Regards
Suparna

>  
> >From Leonid Ananiev
>  
> Fix kernel bug when IO page is temporally busy:
> invalidate_inode_pages2() returns EIOCBRETRY but not  EIO..
>  
> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
> ---
> --- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000 -0800
> +++ linux-2.6.20p/mm/truncate.c	2007-02-08 11:38:11.000000000 -0800
> @@ -366,7 +366,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>  				  pgoff_t start, pgoff_t end)
> @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
>  			}
>  			ret = do_launder_page(mapping, page);
>  			if (ret == 0 &&
> !invalidate_complete_page2(mapping, page))
> -				ret = -EIO;
> +				ret = -EIOCBRETRY;
>  			unlock_page(page);
>  		}
>  		pagevec_release(&pvec);
> @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
>   * Any pages which are found to be mapped into pagetables are unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2(struct address_space *mapping)
>  {
>  
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  7:16     ` Suparna Bhattacharya
@ 2007-02-09  9:52       ` Ananiev, Leonid I
  2007-02-09 10:11         ` Jiri Kosina
  2007-02-10 18:05         ` Ken Chen
  0 siblings, 2 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-09  9:52 UTC (permalink / raw)
  To: suparna
  Cc: Andrew Morton, linux-kernel, linux-aio, Zach Brown, Chris Mason,
	Badari Pulavarty

I have used EIOCBRETRY in the patch to minimize source code modification
only.
It is notable that EIOCBRETRY is never set in kernel, but tested only.

EAGAIN - means that you may want to recall the same function with the
same argument. But user have not to recall it just now. He may want to
free or process some other his resources before recall for success.
EIOCBRETRY - means that you may not recall the same function with the
same argument because an argument has no sense already in cuurnt
context. The result is in processing already. You may want to recall
much higher level or external function to get the common result. It is
means that inernal caller should ignore or let through errno to its
caller but do not recall the function which has returned this errno.

A lot of errno's have different meaning in different functions or
contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant
set.

Leonid

-----Original Message-----
From: Suparna Bhattacharya [mailto:suparna@in.ibm.com] 
Sent: Friday, February 09, 2007 10:16 AM
To: Ananiev, Leonid I
Cc: Andrew Morton; linux-kernel@vger.kernel.org; linux-aio; Zach Brown;
Chris Mason; Badari Pulavarty
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

On Fri, Feb 09, 2007 at 08:41:41AM +0300, Ananiev, Leonid I wrote:
> > invalidate_inode_pages2() has other callers.  I suspect with this
> change
> > we'll end up leaking EIOCBRETRY back to userspace.
>  
> EIOCBRETRY is used and caught already in do_sync_read() and
> do_sync_readv_writev().
>  
> Below fixed patch against kernel 2.6.20.

I agree with Andrew, this doesn't look like the right solution.

EIOCBRETRY isn't the same as EAGAIN or "try again later". EIOCBRETRY
means
"I have queued up an action to notify (wakeup) the callback to retry the
operation once data is ready".

Regards
Suparna

>  
> >From Leonid Ananiev
>  
> Fix kernel bug when IO page is temporally busy:
> invalidate_inode_pages2() returns EIOCBRETRY but not  EIO..
>  
> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
> ---
> --- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000
-0800
> +++ linux-2.6.20p/mm/truncate.c	2007-02-08 11:38:11.000000000
-0800
> @@ -366,7 +366,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are
unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>  				  pgoff_t start, pgoff_t end)
> @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
>  			}
>  			ret = do_launder_page(mapping, page);
>  			if (ret == 0 &&
> !invalidate_complete_page2(mapping, page))
> -				ret = -EIO;
> +				ret = -EIOCBRETRY;
>  			unlock_page(page);
>  		}
>  		pagevec_release(&pvec);
> @@ -440,7 +440,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
>   * Any pages which are found to be mapped into pagetables are
unmapped
> prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2(struct address_space *mapping)
>  {
>  
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  4:29 [PATCH] aio: fix kernel bug when page is temporally busy Ananiev, Leonid I
  2007-02-09  4:35 ` Andrew Morton
@ 2007-02-09  9:54 ` Jiri Kosina
  2007-02-09 10:14   ` Andrew Morton
  1 sibling, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2007-02-09  9:54 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: linux-kernel, linux-aio, Andrew Morton, Zach Brown, suparna,
	Chris Mason, Badari Pulavarty, Jan Kara

On Fri, 9 Feb 2007, Ananiev, Leonid I wrote:

> Fix "Kernel BUG at fs/aio.c:509". Return EIOCBRETRY but not  EIO if page
> is busy.

I am currently also hunting in this area, and I think there is something 
strange in generic_file_aio_read(). This seems strange:

	for (seg = 0; seg < nr_segs; seg++) {
		read_descriptor_t desc;

		desc.written = 0;
		desc.arg.buf = iov[seg].iov_base;
		desc.count = iov[seg].iov_len;
		if (desc.count == 0)
			continue;
		desc.error = 0;
                        
		do_generic_file_read(filp,ppos,&desc,file_read_actor);
		retval += desc.written;
		if (desc.error) {
			retval = retval ?: desc.error;
			break;
		}
	}

So this code propagates desc.error back to caller _only if_ all of the 
previous segments had desc.written == 0. Which is bogus - we need to 
propagate -EIOCBRETRY as soon as any of the do_generic_file_read() 
returned it, so that the caller is aware of requests being queued.

I think something like this would be appropriate

[PATCH] AIO: handle return value from do_generic_file_read() properly

generic_file_aio_read() doesn't always propagate error return value from 
do_generic_file_read(). Namely handling of -EIOCBRETRY is important, because
aio_run_iocb() relies on this return value being always properly propagated
to it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

--- 

 mm/filemap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8332c77..5ce76ce 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			do_generic_file_read(filp,ppos,&desc,file_read_actor);
 			retval += desc.written;
 			if (desc.error) {
-				retval = retval ?: desc.error;
+				retval = desc.error;
 				break;
 			}
 		}

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  9:52       ` Ananiev, Leonid I
@ 2007-02-09 10:11         ` Jiri Kosina
  2007-02-10 18:05         ` Ken Chen
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Kosina @ 2007-02-09 10:11 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown,
	Chris Mason, Badari Pulavarty, Jan Kara

On Fri, 9 Feb 2007, Ananiev, Leonid I wrote:

> I have used EIOCBRETRY in the patch to minimize source code modification 
> only. It is notable that EIOCBRETRY is never set in kernel, but tested 
> only.

There is indeed something strange in aio in 2.6.20 (and maybe older) - for 
example do_generic_mapping_read() gets stuck inside lock_page() in the 
situation when the page is not up-to-date, instead of returning properl 
error to aio_run_cb().

Or does it work in some other way and I get it wrong?

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  9:54 ` Jiri Kosina
@ 2007-02-09 10:14   ` Andrew Morton
  2007-02-09 10:40     ` Jiri Kosina
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-02-09 10:14 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ananiev, Leonid I, linux-kernel, linux-aio, Zach Brown, suparna,
	Chris Mason, Badari Pulavarty, Jan Kara

On Fri, 9 Feb 2007 10:54:36 +0100 (CET) Jiri Kosina <jikos@jikos.cz> wrote:

> @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
>  			do_generic_file_read(filp,ppos,&desc,file_read_actor);
>  			retval += desc.written;
>  			if (desc.error) {
> -				retval = retval ?: desc.error;
> +				retval = desc.error;
>  				break;
>  			}

Nope.  On error the read() syscall must return the number of bytes which
were successfully read.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09 10:14   ` Andrew Morton
@ 2007-02-09 10:40     ` Jiri Kosina
  2007-02-09 11:05       ` Suparna Bhattacharya
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2007-02-09 10:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ananiev, Leonid I, linux-kernel, linux-aio, Zach Brown, suparna,
	Chris Mason, Badari Pulavarty, Jan Kara

On Fri, 9 Feb 2007, Andrew Morton wrote:

> > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
> >  			do_generic_file_read(filp,ppos,&desc,file_read_actor);
> >  			retval += desc.written;
> >  			if (desc.error) {
> > -				retval = retval ?: desc.error;
> > +				retval = desc.error;
> >  				break;
> >  			}
> Nope.  On error the read() syscall must return the number of bytes which
> were successfully read.

You are right. 

In current mainline this even is not a problem, because noone seems to be 
setting the error values to EIOCBRETRY. But it still stinks a bit, as 
there are tests for EIOCBRETRY.

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09 10:40     ` Jiri Kosina
@ 2007-02-09 11:05       ` Suparna Bhattacharya
  2007-02-09 11:18         ` Ananiev, Leonid I
  2007-02-09 17:02         ` Zach Brown
  0 siblings, 2 replies; 31+ messages in thread
From: Suparna Bhattacharya @ 2007-02-09 11:05 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andrew Morton, Ananiev, Leonid I, linux-kernel, linux-aio,
	Zach Brown, Chris Mason, Badari Pulavarty, Jan Kara

On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote:
> On Fri, 9 Feb 2007, Andrew Morton wrote:
>  
> > > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
> > >  			do_generic_file_read(filp,ppos,&desc,file_read_actor);
> > >  			retval += desc.written;
> > >  			if (desc.error) {
> > > -				retval = retval ?: desc.error;
> > > +				retval = desc.error;
> > >  				break;
> > >  			}
> > Nope.  On error the read() syscall must return the number of bytes which
> > were successfully read.
>  
> You are right. 
>  
> In current mainline this even is not a problem, because noone seems to be 
> setting the error values to EIOCBRETRY. But it still stinks a bit, as 
> there are tests for EIOCBRETRY.

We'd want retries to act only on the remaining bytes which haven't been
transferred as yet, so even in the EIOCBRETRY case the right thing to do
is to return the number of bytes that were successfully read without
blocking. The high level AIO code (see aio_rw_vect_rety) has the ability
to handle this.

So this is still correct.

Is there a real bug that you are seeing here ?

Regards
Suparna

>  
> -- 
> Jiri Kosina
>  
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09 11:05       ` Suparna Bhattacharya
@ 2007-02-09 11:18         ` Ananiev, Leonid I
  2007-02-09 17:02         ` Zach Brown
  1 sibling, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-09 11:18 UTC (permalink / raw)
  To: suparna, Jiri Kosina
  Cc: Andrew Morton, linux-kernel, linux-aio, Zach Brown, Chris Mason,
	Badari Pulavarty, Jan Kara

>>noone seems to be setting the error values to EIOCBRETRY
> We'd want retries to act only on the remaining bytes which haven't
been
> transferred as yet, so even in the EIOCBRETRY...

OK. But any variable is not set to EIOCBRETRY  in 2.6.14, 2.6.19,
2.6.20.
Is it correct?

Leonid 

-----Original Message-----
From: Suparna Bhattacharya [mailto:suparna@in.ibm.com] 
Sent: Friday, February 09, 2007 2:06 PM
To: Jiri Kosina
Cc: Andrew Morton; Ananiev, Leonid I; linux-kernel@vger.kernel.org;
linux-aio; Zach Brown; Chris Mason; Badari Pulavarty; Jan Kara
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote:
> On Fri, 9 Feb 2007, Andrew Morton wrote:
>  
> > > @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb,
const struct iovec *iov,
> > >
do_generic_file_read(filp,ppos,&desc,file_read_actor);
> > >  			retval += desc.written;
> > >  			if (desc.error) {
> > > -				retval = retval ?: desc.error;
> > > +				retval = desc.error;
> > >  				break;
> > >  			}
> > Nope.  On error the read() syscall must return the number of bytes
which
> > were successfully read.
>  
> You are right. 
>  
> In current mainline this even is not a problem, because noone seems to
be 
> setting the error values to EIOCBRETRY. But it still stinks a bit, as 
> there are tests for EIOCBRETRY.

We'd want retries to act only on the remaining bytes which haven't been
transferred as yet, so even in the EIOCBRETRY case the right thing to do
is to return the number of bytes that were successfully read without
blocking. The high level AIO code (see aio_rw_vect_rety) has the ability
to handle this.

So this is still correct.

Is there a real bug that you are seeing here ?

Regards
Suparna

>  
> -- 
> Jiri Kosina
>  
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09 11:05       ` Suparna Bhattacharya
  2007-02-09 11:18         ` Ananiev, Leonid I
@ 2007-02-09 17:02         ` Zach Brown
  1 sibling, 0 replies; 31+ messages in thread
From: Zach Brown @ 2007-02-09 17:02 UTC (permalink / raw)
  To: suparna
  Cc: Jiri Kosina, Andrew Morton, Ananiev, Leonid I, linux-kernel,
	linux-aio, Chris Mason, Badari Pulavarty, Jan Kara


On Feb 9, 2007, at 6:05 AM, Suparna Bhattacharya wrote:

> On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote:
>> On Fri, 9 Feb 2007, Andrew Morton wrote:
>>
>>>> @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb,  
>>>> const struct iovec *iov,
>>>>  			do_generic_file_read(filp,ppos,&desc,file_read_actor);
>>>>  			retval += desc.written;
>>>>  			if (desc.error) {
>>>> -				retval = retval ?: desc.error;
>>>> +				retval = desc.error;

I was worried about this too.

> blocking. The high level AIO code (see aio_rw_vect_rety) has the  
> ability
> to handle this.

I had missed this, and yeah, that's some level of comfort.

But I'm not convinced we can guarantee that's safe.  The positive  
return code that aio_rw_vect_retry() sees is telling it that some IO  
has completed and, arguably, that no more IO is in flight.  If we  
return partial progress from generic_file_aio_read() while we have an  
iocb in a wait queue then we are adding yet another invariant.  That  
while an iocb is pending from a previous call down the call chain, we  
can't return a non-aio negative error.  Doing so would cause fs/aio.c  
to complete while there is still an iocb on a wait queue from a  
previous retry attempt.  Right?

I also noticed something just now while poking around these paths to  
see if I could get the start of generic_file_aio_read() to fail when  
it had previously succeeded.  What's to stop another task from racing  
to set O_DIRECT between retries?

That sounds like a pretty hilarious way to get a read retry to fail  
due to buffer misalignment while a previously buffered instance of it  
is still in flight.  Hi-larious.

In thinking about this a discussing it with Chris a bit, I wonder if  
the right fix isn't to refuse changing O_DIRECT via setfl() once any  
IO paths have started on the filp.  Something like:

	filp->frozen_flags |= O_DIRECT

at the start of paths and check it in setfl()?

Are we similarly worried about O_APPEND?

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-09  9:52       ` Ananiev, Leonid I
  2007-02-09 10:11         ` Jiri Kosina
@ 2007-02-10 18:05         ` Ken Chen
  2007-02-10 18:17           ` Ananiev, Leonid I
                             ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Ken Chen @ 2007-02-10 18:05 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown,
	Chris Mason, Badari Pulavarty

On 2/9/07, Ananiev, Leonid I <leonid.i.ananiev@intel.com> wrote:
> I have used EIOCBRETRY in the patch to minimize source code modification
> only.
> [...]
> A lot of errno's have different meaning in different functions or
> contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant
> set.

I also think the original patch is wrong.  It might shut up kernel
panic by eliminate double calls to aio_complete(), but it will
silently introduce data corruption.

If invalidate_inode_pages2_range() says it can not invalidate pages,
while dio to the same file offset range is in flight, something is
really wrong there.  In generic_file_direct_IO, the function
explicitly flushes all dirty pages and wait on them before submits
DIO.

So any error value returned from invalidate_inode_pages2_range() has
to be taken seriously in the direct IO submit path instead of dropping
it to the floor.

- Ken

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-10 18:05         ` Ken Chen
@ 2007-02-10 18:17           ` Ananiev, Leonid I
  2007-02-10 18:27           ` Ananiev, Leonid I
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-10 18:17 UTC (permalink / raw)
  To: Ken Chen
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown,
	Chris Mason, Badari Pulavarty

But if page is busy as invalidate_inode_pages2_range() says because of
bh_count>0 then aio_complet() is not called from aio_run_iocb() and next
retry() will get the iocb result.

Leonid

-----Original Message-----
From: Ken Chen [mailto:kenchen@google.com] 
Sent: Saturday, February 10, 2007 9:05 PM
To: Ananiev, Leonid I
Cc: suparna@in.ibm.com; Andrew Morton; linux-kernel@vger.kernel.org;
linux-aio; Zach Brown; Chris Mason; Badari Pulavarty
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

On 2/9/07, Ananiev, Leonid I <leonid.i.ananiev@intel.com> wrote:
> I have used EIOCBRETRY in the patch to minimize source code
modification
> only.
> [...]
> A lot of errno's have different meaning in different functions or
> contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant
> set.

I also think the original patch is wrong.  It might shut up kernel
panic by eliminate double calls to aio_complete(), but it will
silently introduce data corruption.

If invalidate_inode_pages2_range() says it can not invalidate pages,
while dio to the same file offset range is in flight, something is
really wrong there.  In generic_file_direct_IO, the function
explicitly flushes all dirty pages and wait on them before submits
DIO.

So any error value returned from invalidate_inode_pages2_range() has
to be taken seriously in the direct IO submit path instead of dropping
it to the floor.

- Ken

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-10 18:05         ` Ken Chen
  2007-02-10 18:17           ` Ananiev, Leonid I
@ 2007-02-10 18:27           ` Ananiev, Leonid I
  2007-02-10 21:57           ` Ananiev, Leonid I
  2007-02-15  9:16           ` Ananiev, Leonid I
  3 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-10 18:27 UTC (permalink / raw)
  To: Ken Chen
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown,
	Chris Mason, Badari Pulavarty

> If invalidate_inode_pages2_range() says it can not invalidate pages,
> while dio to the same file offset range is in flight, something is
> really wrong there.

If invalidate_inode_pages2_range() says it can not invalidate pages
It means that soft_irq does completing now on other cpu.
Next retry() will see a IO well completed.
Leonid

-----Original Message-----
From: Ken Chen [mailto:kenchen@google.com] 
Sent: Saturday, February 10, 2007 9:05 PM
To: Ananiev, Leonid I
Cc: suparna@in.ibm.com; Andrew Morton; linux-kernel@vger.kernel.org;
linux-aio; Zach Brown; Chris Mason; Badari Pulavarty
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

On 2/9/07, Ananiev, Leonid I <leonid.i.ananiev@intel.com> wrote:
> I have used EIOCBRETRY in the patch to minimize source code
modification
> only.
> [...]
> A lot of errno's have different meaning in different functions or
> contexts. EAGAIN could be used instated of EIOCBRETRY for irredundant
> set.

I also think the original patch is wrong.  It might shut up kernel
panic by eliminate double calls to aio_complete(), but it will
silently introduce data corruption.

If invalidate_inode_pages2_range() says it can not invalidate pages,
while dio to the same file offset range is in flight, something is
really wrong there.  In generic_file_direct_IO, the function
explicitly flushes all dirty pages and wait on them before submits
DIO.

So any error value returned from invalidate_inode_pages2_range() has
to be taken seriously in the direct IO submit path instead of dropping
it to the floor.

- Ken

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-10 18:05         ` Ken Chen
  2007-02-10 18:17           ` Ananiev, Leonid I
  2007-02-10 18:27           ` Ananiev, Leonid I
@ 2007-02-10 21:57           ` Ananiev, Leonid I
  2007-02-15  9:16           ` Ananiev, Leonid I
  3 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-10 21:57 UTC (permalink / raw)
  To: Ken Chen
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown,
	Chris Mason, Badari Pulavarty

Ken Chen writes:
> So any error value returned from invalidate_inode_pages2_range() has
> to be taken seriously in the direct IO submit path instead of dropping
> it to the floor.

If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch 
"aio: fix kernel bug when page is temporally busy"
serts then do_sync_read/write() will not drop IO submit but will retry
it:
        for (;;) {
                ret = filp->f_op->aio_read(&kiocb, &iov, 1,
kiocb.ki_pos);
                if (ret != -EIOCBRETRY)
                        break;
                wait_on_retry_sync_kiocb(&kiocb);
        }

And do_sync_read/write() will not return EIO if page is busy as it does
now, befor patching.
Curretly do_sync_read/write() tests for EIOCBRETRY. But EIOCBRETRY is
not set ever.

Leonid

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  5:52     ` Andrew Morton
@ 2007-02-12 22:52       ` Ananiev, Leonid I
  2007-02-12 23:21       ` Ananiev, Leonid I
  1 sibling, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-12 22:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-aio

Andrew,
You wrote on Friday, February 09, 2007 8:53 AM
> invalidate_inode_pages2() has other callers.  I suspect with this
change
> we'll end up leaking EIOCBRETRY back to userspace.

The path is modified so that invalidate_inode_pages2() returns EIO as
earlier.
could you consider modified patch
The patch against 2.6.20.

Long story: The kernel panic is happening after hours of AIO benchmark
running in mcp.
First of all it was found that the kernel panic happens if IO error is
reported.
But later it was found that the actual reason is not in real IO error
but in a busy page.
While the current CPU tests if IO is completed it happens that another
CPU
at the same time processes IO completion in soft_irq.
The considered buffer page is busy now by second CPU and
invalidate_inode_pages2_range() returns EIO in this case.
First CPU reports EIO to caller ; completes IO and frees control block
in aio_complete().
Second CPU frees the same control block once more. 
The patch makes invalidate_inode_pages2_range() to return EIOCBRETRY
which is tested just in aio_run_iocb(). It retries IO competition check
if EIOCBRETRY is got.
EIOCBRETRY is tested in do_sync_read/write() functions as well.
And direct IO competition will be retested "instead of dropping it to
the floor".

>From Leonid Ananiev

Fix kernel bug when IO page is temporally busy:
invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
invalidate_inode_pages2() returns EIO as earlier.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c	2007-02-08 22:56:52.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
 				  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
 			}
 			ret = do_launder_page(mapping, page);
 			if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
-				ret = -EIO;
+				ret = -EIOCBRETRY;
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
@@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {
-	return invalidate_inode_pages2_range(mapping, 0, -1);
+	int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
+	return (ret < 0)?-EIO:ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is  temporally busy
  2007-02-09  5:52     ` Andrew Morton
  2007-02-12 22:52       ` Ananiev, Leonid I
@ 2007-02-12 23:21       ` Ananiev, Leonid I
  1 sibling, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-12 23:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-aio

Andrew,
You wrote on Friday, February 09, 2007 8:53 AM
> invalidate_inode_pages2() has other callers.  I suspect with this
change
> we'll end up leaking EIOCBRETRY back to userspace.

The path is modified so that invalidate_inode_pages2() returns EIO as
earlier.
could you consider modified patch
The patch against 2.6.20.

Long story: The kernel panic is happening after hours of AIO benchmark
running in mcp.
First of all it was found that the kernel panic happens if IO error is
reported.
But later it was found that the actual reason is not in real IO error
but in a busy page.
While the current CPU tests if IO is completed it happens that another
CPU
at the same time processes IO completion in soft_irq.
The considered buffer page is busy now by second CPU and
invalidate_inode_pages2_range() returns EIO in this case.
First CPU reports EIO to caller ; completes IO and frees control block
in aio_complete().
Second CPU frees the same control block once more. 
The patch makes invalidate_inode_pages2_range() to return EIOCBRETRY
which is tested just in aio_run_iocb(). It retries IO competition check
if EIOCBRETRY is got.
EIOCBRETRY is tested in do_sync_read/write() functions as well.
And direct IO competition will be retested "instead of dropping it to
the floor".

>From Leonid Ananiev

Fix kernel bug when IO page is temporally busy:
invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
invalidate_inode_pages2() returns EIO as earlier.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c	2007-02-08 22:56:52.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
 				  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
 			}
 			ret = do_launder_page(mapping, page);
 			if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
-				ret = -EIO;
+				ret = -EIOCBRETRY;
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
@@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {
-	return invalidate_inode_pages2_range(mapping, 0, -1);
+	int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
+	return (ret < 0)?-EIO:ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-10 18:05         ` Ken Chen
                             ` (2 preceding siblings ...)
  2007-02-10 21:57           ` Ananiev, Leonid I
@ 2007-02-15  9:16           ` Ananiev, Leonid I
  2007-02-15 18:25             ` Zach Brown
  3 siblings, 1 reply; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-15  9:16 UTC (permalink / raw)
  To: Ken Chen
  Cc: suparna, Andrew Morton, linux-kernel, linux-aio, Zach Brown, Chris Mason

Ken Chen wrote
> It might shut up kernel
> panic by eliminate double calls to aio_complete(), but it will
> silently introduce data corruption.

I had got kernel panic after an hour of aiostress running.
After patching I have not got aiostress massage 
"verify error, file %s offset %Lu contents (offset:bad:good):\n"
during 5 hour aiostress running with 'verify' option.
Looking closely into aiostress.c 
ftp://ftp.suse.com/pub/people/mason/utils/aio-stress.c
we can see that this program may write in random mode THE SAME
contents to the same file offset asynchronously from different
buffers and do not cure about it. Does Ken consider that kernel
panic is the best way to prevent data corruption in such kind
of programs?

> So any error value returned from invalidate_inode_pages2_range() has
> to be taken seriously in the direct IO submit path instead of dropping
> it to the floor.
If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch 
"aio: fix kernel bug when page is temporally busy"
sets then do_sync_read/write() will not drop IO submit but will retry
it:
for (;;) {
	ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
	if (ret != -EIOCBRETRY)
		break;
	wait_on_retry_sync_kiocb(&kiocb);
	}
And do_sync_read/write() will not return EIO if page is busy
as it does now, before patching.

Ken Chen wrote:
> I also think the original patch is wrong.
What do you mean saying 'also'?

Leonid


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15  9:16           ` Ananiev, Leonid I
@ 2007-02-15 18:25             ` Zach Brown
  2007-02-15 19:11               ` Ananiev, Leonid I
  0 siblings, 1 reply; 31+ messages in thread
From: Zach Brown @ 2007-02-15 18:25 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason

> If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch
> "aio: fix kernel bug when page is temporally busy"

Sorry Leonid, this patch is not safe.

It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be  
called.  This can lead to operations hanging, both AIO and calls that  
come through do_sync_{read,write}.

It overwrites -EIOCBQUEUED, leading to an aio_complete() while a  
retry is happening.  This can lead to reference count confusion.   
Double-frees, referencing freed memory, that kind of thing.  This  
isn't a new problem.  The current code that overwrites with -EIO has  
this problem.  But moving to -EIOCBRETRY does introduce new behaviour  
of aio_complete() and the retry path racing.

I'll have a candidate patch to address the problem of EIO being  
raised on the way back up from a path which has returned -EIOCBQUEUED.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15 18:25             ` Zach Brown
@ 2007-02-15 19:11               ` Ananiev, Leonid I
  2007-02-15 19:22                 ` Zach Brown
  0 siblings, 1 reply; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-15 19:11 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason

> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be  
> called.  This can lead to operations hanging

If EIOCBRETRY then generic_file_aio_write() will be recalled for the
same iocb.

> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a  
> retry is happening.

EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call:
        if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
                aio_complete(iocb, ret, 0);

> This can lead to reference count confusion.
But just reference count confusion was deleted by patch. Isn't it?

Leonid

-----Original Message-----
From: Zach Brown [mailto:zach.brown@oracle.com] 
Sent: Thursday, February 15, 2007 9:25 PM
To: Ananiev, Leonid I
Cc: Ken Chen; suparna@in.ibm.com; Andrew Morton;
linux-kernel@vger.kernel.org; linux-aio; Chris Mason
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

> If invalidate_inode_pages2_range() will return EIOCBRETRY as the patch
> "aio: fix kernel bug when page is temporally busy"

Sorry Leonid, this patch is not safe.

It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be  
called.  This can lead to operations hanging, both AIO and calls that  
come through do_sync_{read,write}.

It overwrites -EIOCBQUEUED, leading to an aio_complete() while a  
retry is happening.  This can lead to reference count confusion.   
Double-frees, referencing freed memory, that kind of thing.  This  
isn't a new problem.  The current code that overwrites with -EIO has  
this problem.  But moving to -EIOCBRETRY does introduce new behaviour  
of aio_complete() and the retry path racing.

I'll have a candidate patch to address the problem of EIO being  
raised on the way back up from a path which has returned -EIOCBQUEUED.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15 19:11               ` Ananiev, Leonid I
@ 2007-02-15 19:22                 ` Zach Brown
  2007-02-15 21:06                   ` Ananiev, Leonid I
  2007-02-15 23:32                   ` Ananiev, Leonid I
  0 siblings, 2 replies; 31+ messages in thread
From: Zach Brown @ 2007-02-15 19:22 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason


On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote:

>> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be
>> called.  This can lead to operations hanging
>
> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
> same iocb.

Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was  
the only thing to return -EIOCBRETRY.

>
>> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a
>> retry is happening.
>
> EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call:

Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*.   
Later.  After IO has completed.  see fs/direct-io.c:dio_bio_end_aio().

This is what -EIOCBQUEUED means!  It's a promise to call aio_complete 
() in the future.

Have you read the giant comment over the definition of struct kiocb  
in include/linux/aio.h?

>> This can lead to reference count confusion.
> But just reference count confusion was deleted by patch. Isn't it?

Sorry, I don't understand what you're trying to ask here.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15 19:22                 ` Zach Brown
@ 2007-02-15 21:06                   ` Ananiev, Leonid I
  2007-02-15 23:32                   ` Ananiev, Leonid I
  1 sibling, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-15 21:06 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason



-----Original Message-----
From: Zach Brown [mailto:zach.brown@oracle.com] 
Sent: Thursday, February 15, 2007 10:23 PM
To: Ananiev, Leonid I
Cc: Ken Chen; suparna@in.ibm.com; Andrew Morton;
linux-kernel@vger.kernel.org; linux-aio; Chris Mason
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy


On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote:

>> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be
>> called.  This can lead to operations hanging
>
> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
> same iocb.

Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was  
the only thing to return -EIOCBRETRY.

>
>> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a
>> retry is happening.
>
> EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call:

Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*.   
Later.  After IO has completed.  see fs/direct-io.c:dio_bio_end_aio().

This is what -EIOCBQUEUED means!  It's a promise to call aio_complete 
() in the future.

Have you read the giant comment over the definition of struct kiocb  
in include/linux/aio.h?

>> This can lead to reference count confusion.
> But just reference count confusion was deleted by patch. Isn't it?

Sorry, I don't understand what you're trying to ask here.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15 19:22                 ` Zach Brown
  2007-02-15 21:06                   ` Ananiev, Leonid I
@ 2007-02-15 23:32                   ` Ananiev, Leonid I
  2007-02-16  0:01                     ` Zach Brown
  1 sibling, 1 reply; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-15 23:32 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason

>> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
>> same iocb.
> Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was  
> the only thing to return -EIOCBRETRY.
It is not need to call kick_iocb()
for generic_file_aio_write() calling.
It is recalled without any wakeup waiting:
        for (;;) {
                ret = filp->f_op->aio_write(&kiocb, &iov, 1,
kiocb.ki_pos);
                if (ret != -EIOCBRETRY)
                        break;
                wait_on_retry_sync_kiocb(&kiocb);
        }
Note: wait_on_retry_sync_kiocb() does not wait.
That is for dio. For aio iocb generic_file_aio_write() call
is required from ki_run_list while next io_submit or
read_events() is called.
So when an IO hang may happen?

>> It overwrites -EIOCBQUEUED
Do you mean that there is one more kernel bug which
overwrites -EIOCBQUEUED by any errno or number of bytes and this
new value is returned to caller as an IO result
while IO is not finished yet.

The proposed patch does not crate this bug if any.
It actually fixes a kernel panic bag when iocb.users count becomes
incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because
aio_run_iocb() have not a chance to differ real EIO and
EIO which is actually means EAGAYN or EIOCBRETRY.
I'me sure the patch changes source code in correct direction:
to differentiate that two kinds of EIOs.

> Have you read the giant comment over the definition of struct kiocb  
> in include/linux/aio.h?
I have read. But compiler has not: it did not create an object code for
* If ki_retry returns -EIOCBRETRY ...

>>> This can lead to reference count confusion.
>> But just reference count confusion was deleted by patch. Isn't it?
>Sorry, I don't understand what you're trying to ask here.
One of reference count iocb.users confusion is deleted by the patch.
I'm not sure that there is other bag.

At least I have not see IO hang while testing.
It is interesting that I've not seen any EIOCBQUEUED returned
to aio_run_iocb() during 5 hours aiostress running.
Does it mean that EIOCBQUEUED is always reset and never returned?

Leonid

-----Original Message-----
From: Zach Brown [mailto:zach.brown@oracle.com] 
Sent: Thursday, February 15, 2007 10:23 PM
To: Ananiev, Leonid I
Cc: Ken Chen; suparna@in.ibm.com; Andrew Morton;
linux-kernel@vger.kernel.org; linux-aio; Chris Mason
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy


On Feb 15, 2007, at 11:11 AM, Ananiev, Leonid I wrote:

>> It returns -EIOCBRETRY without guaranteeing that kick_iocb() will be
>> called.  This can lead to operations hanging
>
> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
> same iocb.

Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was  
the only thing to return -EIOCBRETRY.

>
>> It overwrites -EIOCBQUEUED, leading to an aio_complete() while a
>> retry is happening.
>
> EIOCBQUEUED or EIOCBRETRY does not lead to aio_complete() call:

Not by fs/aio.c, but *by the place that originated -EIOCBQUEUED*.   
Later.  After IO has completed.  see fs/direct-io.c:dio_bio_end_aio().

This is what -EIOCBQUEUED means!  It's a promise to call aio_complete 
() in the future.

Have you read the giant comment over the definition of struct kiocb  
in include/linux/aio.h?

>> This can lead to reference count confusion.
> But just reference count confusion was deleted by patch. Isn't it?

Sorry, I don't understand what you're trying to ask here.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15 23:32                   ` Ananiev, Leonid I
@ 2007-02-16  0:01                     ` Zach Brown
  2007-02-16 12:18                       ` Ananiev, Leonid I
  0 siblings, 1 reply; 31+ messages in thread
From: Zach Brown @ 2007-02-16  0:01 UTC (permalink / raw)
  To: Ananiev, Leonid I
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason


On Feb 15, 2007, at 3:32 PM, Ananiev, Leonid I wrote:

>>> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
>>> same iocb.
>> Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was
>> the only thing to return -EIOCBRETRY.
> It is not need to call kick_iocb()
> for generic_file_aio_write() calling.
> It is recalled without any wakeup waiting:
>         for (;;) {
>                 ret = filp->f_op->aio_write(&kiocb, &iov, 1,
> kiocb.ki_pos);
>                 if (ret != -EIOCBRETRY)
>                         break;
>                 wait_on_retry_sync_kiocb(&kiocb);
>         }
> Note: wait_on_retry_sync_kiocb() does not wait.

Yes it does.  It will not return until kiocbSetKicked() is called,  
and that is only called from kick_iocb().

>>> It overwrites -EIOCBQUEUED
> Do you mean that there is one more kernel bug which
> overwrites -EIOCBQUEUED by any errno or number of bytes and this
> new value is returned to caller as an IO result
> while IO is not finished yet.
>
> The proposed patch does not crate this bug if any.

Right, and I said that in the mail you're quoting.

> It actually fixes a kernel panic bag when iocb.users count becomes
> incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because
> aio_run_iocb() have not a chance to differ real EIO and
> EIO which is actually means EAGAYN or EIOCBRETRY.

Yes, I understand the bug you're trying to fix.  You're introducing  
other bugs with the patch.  It will not be merged.

> It is interesting that I've not seen any EIOCBQUEUED returned
> to aio_run_iocb() during 5 hours aiostress running.

What arguments are you running aio-stress with?  -EIOCBQUEUED is only  
used for O_DIRECT, and then only in certain circumstances.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-16  0:01                     ` Zach Brown
@ 2007-02-16 12:18                       ` Ananiev, Leonid I
  0 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-16 12:18 UTC (permalink / raw)
  To: Zach Brown
  Cc: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio, Chris Mason

Zach Brown wrote:
> It will not return until kiocbSetKicked() is called,  
> and that is only called from kick_iocb().
There is no test or wait of Kicked in considered
for (;;) aio_write() loop.

Zach Brown wrote:
>> The proposed patch does not crate this bug if any.
> Right, and I said that in the mail you're quoting.
...
> You're introducing other bugs with the patch.
Could you list other introduced bugs?

>> It is interesting that I've not seen any EIOCBQUEUED returned
>> to aio_run_iocb() during 5 hours aiostress running.
> What arguments are you running aio-stress with?  -EIOCBQUEUED is only

> used for O_DIRECT
I wrote in the vary first mail that the panic is appearing in
random write O_DIRECT aio-stress running. Other aio-stress modes
where tested after patching as well.

> and then only in certain circumstances.
Looking closely into sources we can see that
EIOCBQUEUED never may be returned to aio_run_iocb().

include/linux/aio.h says
* If ki_retry returns -EIOCBRETRY ...
Could you point source line which "returns -EIOCBRETRY"?

Leonid
-----Original Message-----
From: Zach Brown [mailto:zach.brown@oracle.com] 
Sent: Friday, February 16, 2007 3:01 AM
To: Ananiev, Leonid I
Cc: Ken Chen; suparna@in.ibm.com; Andrew Morton;
linux-kernel@vger.kernel.org; linux-aio; Chris Mason
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy


On Feb 15, 2007, at 3:32 PM, Ananiev, Leonid I wrote:

>>> If EIOCBRETRY then generic_file_aio_write() will be recalled for the
>>> same iocb.
>> Only if kick_iocb() is called.  It won't be called if i_i_p2_r() was
>> the only thing to return -EIOCBRETRY.
> It is not need to call kick_iocb()
> for generic_file_aio_write() calling.
> It is recalled without any wakeup waiting:
>         for (;;) {
>                 ret = filp->f_op->aio_write(&kiocb, &iov, 1,
> kiocb.ki_pos);
>                 if (ret != -EIOCBRETRY)
>                         break;
>                 wait_on_retry_sync_kiocb(&kiocb);
>         }
> Note: wait_on_retry_sync_kiocb() does not wait.

Yes it does.  It will not return until kiocbSetKicked() is called,  
and that is only called from kick_iocb().

>>> It overwrites -EIOCBQUEUED
> Do you mean that there is one more kernel bug which
> overwrites -EIOCBQUEUED by any errno or number of bytes and this
> new value is returned to caller as an IO result
> while IO is not finished yet.
>
> The proposed patch does not crate this bug if any.

Right, and I said that in the mail you're quoting.

> It actually fixes a kernel panic bag when iocb.users count becomes
> incorrect. The bag " Kernel BUG at fs/aio.c:509" is there because
> aio_run_iocb() have not a chance to differ real EIO and
> EIO which is actually means EAGAYN or EIOCBRETRY.

Yes, I understand the bug you're trying to fix.  You're introducing  
other bugs with the patch.  It will not be merged.

> It is interesting that I've not seen any EIOCBQUEUED returned
> to aio_run_iocb() during 5 hours aiostress running.

What arguments are you running aio-stress with?  -EIOCBQUEUED is only  
used for O_DIRECT, and then only in certain circumstances.

- z

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-15  3:30 ` Andrew Morton
@ 2007-02-15  5:26   ` Ananiev, Leonid I
  0 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-15  5:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> Please, tell us what problem this is fixing so that we can look into
> alternative solutions.
This patch fixes a kernel panic "Kernel BUG at fs/aio.c:509"
http://marc.theaimsgroup.com/?l=linux-kernel&m=117031052517746&w=2
First of all it was found that the kernel panic happens after
IO error reporting. But later it was found that the actual
reason is not in real IO error but in a busy page.
EIO is returned if page has unstable state while IO completion
result is processing.
If aio_run_iocb() as well as in do_sync_read/write() get EIO result
it is considered as IO is finished and IO control block
could be free; result is reported to caller.
I suppose that aio_run_iocb() as well as in do_sync_read/write()
should have a chance to differ real EIO and EIO which is actually
means EAGAYN. That functions test for EIOCBRETRY meaning EAGAYN purpose.
do_sync_read/write() retries IO as well as aio:
        for (;;) {
                ret = filp->f_op->aio_read(&kiocb, &iov, 1,
kiocb.ki_pos);
                if (ret != -EIOCBRETRY)
                        break;
                wait_on_retry_sync_kiocb(&kiocb);
        }
By the way EIOCBRETRY is never actually set in the kernel. So what is
the
purpose of previous lines?

More details about EIO returning:
aio_complete: res=-5 iocb=ffff81002b38f080 ki_users=2 
[<ffffffff802573c3>] invalidate_inode_pages2_range+0x236/0x26b
[<ffffffff802af77b>] ext3_direct_IO+0x16c/0x19e
[<ffffffff802adc1d>] ext3_get_block+0x0/0xe2
[<ffffffff802518c0>] generic_file_direct_IO+0xb9/0xcf
[<ffffffff8025193d>] generic_file_direct_write+0x67/0x10e
[<ffffffff80252308>] __generic_file_aio_write_nolock+0x2d6/0x3fe
[<ffffffff802082c0>] __switch_to+0x26f/0x27e
[<ffffffff8047b8e9>] thread_return+0x0/0xd8
[<ffffffff80252497>] generic_file_aio_write+0x67/0xc7
[<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
[<ffffffff802ab2e2>] ext3_file_write+0x16/0x8f
[<ffffffff802ab2cc>] ext3_file_write+0x0/0x8f
[<ffffffff80283f61>] aio_rw_vect_retry+0x72/0x14f
[<ffffffff80284ad9>] aio_run_iocb+0xe6/0x187
[<ffffffff802853e2>] io_submit_one+0x296/0x2e3
[<ffffffff80285979>] sys_io_submit+0x9b/0x108
[<ffffffff80229b49>] default_wake_function+0x0/0xe
[<ffffffff8020983e>] system_call+0x7e/0x83
Call Trace:
<IRQ> [<ffffffff802848c3>] aio_complete+0x5c/0x18c
[<ffffffff80290f1a>] finished_one_bio+0xac/0xf3
[<ffffffff80290ff6>] dio_bio_complete+0x95/0xaa
[<ffffffff80291131>] dio_bio_end_aio+0x20/0x25
[<ffffffff80306fb7>] __end_that_request_first+0x10e/0x3fd
[<ffffffff803b0798>] scsi_end_request+0x27/0xc9
[<ffffffff803b092a>] scsi_io_completion+0xf0/0x2c5
[<ffffffff803d0407>] ahd_done+0x537/0x580
[<ffffffff803d40f2>] sd_rw_intr+0x182/0x1ad
[<ffffffff80308db7>] blk_done_softirq+0x5c/0x6a
[<ffffffff80234d56>] __do_softirq+0x55/0xc3
[<ffffffff80219ebc>] ack_apic_level+0x37/0x4b
[<ffffffff8020a96c>] call_softirq+0x1c/0x28
[<ffffffff8020c5bf>] do_softirq+0x2c/0x7d
[<ffffffff8020c6b8>] do_IRQ+0xa8/0xc8
[<ffffffff80208031>] mwait_idle+0x0/0x20
[<ffffffff80209d61>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff80208030>] mwait_idle_with_hints+0x44/0x45
[<ffffffff8020803d>] mwait_idle+0xc/0x20
[<ffffffff80208bc3>] cpu_idle+0x8b/0xb0
[<ffffffff802178e3>] start_secondary+0x462/0x471


-----Original Message-----
From: Andrew Morton [mailto:akpm@linux-foundation.org] 
Sent: Thursday, February 15, 2007 6:31 AM
To: Ananiev, Leonid I
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] aio: fix kernel bug when page is temporally busy

On Wed, 14 Feb 2007 20:51:33 +0300 "Ananiev, Leonid I"
<leonid.i.ananiev@intel.com> wrote:

> Fix kernel bug when IO page is temporally busy:
> invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
> invalidate_inode_pages2() returns EIO as earlier.
> 
> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
> ---
> --- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
> +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.000000000 -0800
> @@ -366,7 +366,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are
unmapped prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>                                   pgoff_t start, pgoff_t end)
> @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
>                         }
>                         ret = do_launder_page(mapping, page);
>                         if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
> -                               ret = -EIO;
> +                               ret = -EIOCBRETRY;
>                         unlock_page(page);
>                 }
>                 pagevec_release(&pvec);
> @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
>   */
>  int invalidate_inode_pages2(struct address_space *mapping)
>  {
> -       return invalidate_inode_pages2_range(mapping, 0, -1);
> +       int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
> +       return (ret < 0)?-EIO:ret;
>  }
>  EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

If someone later uses invalidate_inode_pages2_range() elsewhere, they're
going to need to know to convert -EIOCBRETRY into -EIO, if they weren't
called by aio.  Or something.

Please, tell us what problem this is fixing so that we can look into
alternative solutions.

For example, one acceptable-but-ugly solution might be to do:

static inline int invalidate_inode_pages2_range(struct address_space
*mapping,
				pgoff_t start, pgoff_t end)
{
	return __invalidate_inode_pages2_range(mapping, start, end, 0);
}

static inline int invalidate_inode_pages2_range_for_aio(struct
address_space *mapping,
				pgoff_t start, pgoff_t end)
{
	return __invalidate_inode_pages2_range(mapping, start, end, 1);
}

and to then use invalidate_inode_pages2_range_for_aio() from the
appropriate callsite.

But without a complete description of the bug which this is fixing, it's
hard to say how practical such an approach would be.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] aio: fix kernel bug when page is temporally busy
  2007-02-14 17:51 Ananiev, Leonid I
@ 2007-02-15  3:30 ` Andrew Morton
  2007-02-15  5:26   ` Ananiev, Leonid I
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-02-15  3:30 UTC (permalink / raw)
  To: Ananiev, Leonid I; +Cc: linux-kernel

On Wed, 14 Feb 2007 20:51:33 +0300 "Ananiev, Leonid I" <leonid.i.ananiev@intel.com> wrote:

> Fix kernel bug when IO page is temporally busy:
> invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
> invalidate_inode_pages2() returns EIO as earlier.
> 
> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
> ---
> --- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
> +++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.000000000 -0800
> @@ -366,7 +366,7 @@ static int do_launder_page(struct addres
>   * Any pages which are found to be mapped into pagetables are unmapped prior to
>   * invalidation.
>   *
> - * Returns -EIO if any pages could not be invalidated.
> + * Returns -EIOCBRETRY if any pages could not be invalidated.
>   */
>  int invalidate_inode_pages2_range(struct address_space *mapping,
>                                   pgoff_t start, pgoff_t end)
> @@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
>                         }
>                         ret = do_launder_page(mapping, page);
>                         if (ret == 0 && !invalidate_complete_page2(mapping, page))
> -                               ret = -EIO;
> +                               ret = -EIOCBRETRY;
>                         unlock_page(page);
>                 }
>                 pagevec_release(&pvec);
> @@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
>   */
>  int invalidate_inode_pages2(struct address_space *mapping)
>  {
> -       return invalidate_inode_pages2_range(mapping, 0, -1);
> +       int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
> +       return (ret < 0)?-EIO:ret;
>  }
>  EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

If someone later uses invalidate_inode_pages2_range() elsewhere, they're
going to need to know to convert -EIOCBRETRY into -EIO, if they weren't
called by aio.  Or something.

Please, tell us what problem this is fixing so that we can look into
alternative solutions.

For example, one acceptable-but-ugly solution might be to do:

static inline int invalidate_inode_pages2_range(struct address_space *mapping,
				pgoff_t start, pgoff_t end)
{
	return __invalidate_inode_pages2_range(mapping, start, end, 0);
}

static inline int invalidate_inode_pages2_range_for_aio(struct address_space *mapping,
				pgoff_t start, pgoff_t end)
{
	return __invalidate_inode_pages2_range(mapping, start, end, 1);
}

and to then use invalidate_inode_pages2_range_for_aio() from the
appropriate callsite.

But without a complete description of the bug which this is fixing, it's
hard to say how practical such an approach would be.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] aio: fix kernel bug when page is temporally busy
@ 2007-02-14 17:51 Ananiev, Leonid I
  2007-02-15  3:30 ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-14 17:51 UTC (permalink / raw)
  To: akpm, linux-kernel

-wrap lines are fixed. Sorry.

>From Leonid Ananiev

Fix kernel bug if IO page is temporally busy:
invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
invalidate_inode_pages2() returns EIO as earlier.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped prior to
  * invalidation.
  *This is path is modified so that invalidate_inode_pages2()
returns EIO as earlier.
Wrap lines are fixed

- From Leonid Ananiev

Fix kernel bug when IO page is temporally busy:
invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.
invalidate_inode_pages2() returns EIO as earlier.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c  2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c 2007-02-08 22:56:52.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
                                  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
                        }
                        ret = do_launder_page(mapping, page);
                        if (ret == 0 && !invalidate_complete_page2(mapping, page))
-                               ret = -EIO;
+                               ret = -EIOCBRETRY;
                        unlock_page(page);
                }
                pagevec_release(&pvec);
@@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {
-       return invalidate_inode_pages2_range(mapping, 0, -1);
+       int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
+       return (ret < 0)?-EIO:ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
                                  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
                        }
                        ret = do_launder_page(mapping, page);
                        if (ret == 0 && !invalidate_complete_page2(mapping, page))
-                               ret = -EIO;
+                               ret = -EIOCBRETRY;
                        unlock_page(page);
                }
                pagevec_release(&pvec);
@@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {
-       return invalidate_inode_pages2_range(mapping, 0, -1);
+       int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
+       return (ret < 0)?-EIO:ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);




^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH] aio: fix kernel bug when page is temporally busy
@ 2007-02-10 19:36 Ananiev, Leonid I
  0 siblings, 0 replies; 31+ messages in thread
From: Ananiev, Leonid I @ 2007-02-10 19:36 UTC (permalink / raw)
  To: Ken Chen, suparna, Andrew Morton, linux-kernel, linux-aio,
	Zach Brown, Chris Mason, Badari Pulavarty

> If invalidate_inode_pages2_range() says it can not invalidate pages,
> while dio to the same file offset range is in flight, something is
> really wrong there.

If invalidate_inode_pages2_range() says it can not invalidate pages
It means that soft_irq does completing IO now on other cpu.
Next retry() call in aio_run_iocb() will see the IO well completed.

The patch is updated: invalidate_inode_pages2() returns EIO as earlier
for
nfs and other.
But  invalidate_inode_pages2_range() returns EIOCBRETRY for aio and dio.

The patch against 2.6.20.

>From Leonid Ananiev

Fix kernel bug when IO page is temporally busy:
 invalidate_inode_pages2_range() returns EIOCBRETRY but not  EIO.

Signed-off-by: Leonid Ananiev <leonid.i.ananiev@intel.com>
---
--- linux-2.6.20/mm/truncate.c	2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.20p/mm/truncate.c	2007-02-08 22:56:52.000000000 -0800
@@ -366,7 +366,7 @@ static int do_launder_page(struct addres
  * Any pages which are found to be mapped into pagetables are unmapped
prior to
  * invalidation.
  *
- * Returns -EIO if any pages could not be invalidated.
+ * Returns -EIOCBRETRY if any pages could not be invalidated.
  */
 int invalidate_inode_pages2_range(struct address_space *mapping,
 				  pgoff_t start, pgoff_t end)
@@ -423,7 +423,7 @@ int invalidate_inode_pages2_range(struct
 			}
 			ret = do_launder_page(mapping, page);
 			if (ret == 0 &&
!invalidate_complete_page2(mapping, page))
-				ret = -EIO;
+				ret = -EIOCBRETRY;
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
@@ -444,6 +444,7 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  */
 int invalidate_inode_pages2(struct address_space *mapping)
 {
-	return invalidate_inode_pages2_range(mapping, 0, -1);
+	int ret =  invalidate_inode_pages2_range(mapping, 0, -1);
+	return (ret < 0)?-EIO:ret;
 }
 EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2007-02-16 12:20 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-09  4:29 [PATCH] aio: fix kernel bug when page is temporally busy Ananiev, Leonid I
2007-02-09  4:35 ` Andrew Morton
2007-02-09  5:41   ` Ananiev, Leonid I
2007-02-09  5:52     ` Andrew Morton
2007-02-12 22:52       ` Ananiev, Leonid I
2007-02-12 23:21       ` Ananiev, Leonid I
2007-02-09  7:16     ` Suparna Bhattacharya
2007-02-09  9:52       ` Ananiev, Leonid I
2007-02-09 10:11         ` Jiri Kosina
2007-02-10 18:05         ` Ken Chen
2007-02-10 18:17           ` Ananiev, Leonid I
2007-02-10 18:27           ` Ananiev, Leonid I
2007-02-10 21:57           ` Ananiev, Leonid I
2007-02-15  9:16           ` Ananiev, Leonid I
2007-02-15 18:25             ` Zach Brown
2007-02-15 19:11               ` Ananiev, Leonid I
2007-02-15 19:22                 ` Zach Brown
2007-02-15 21:06                   ` Ananiev, Leonid I
2007-02-15 23:32                   ` Ananiev, Leonid I
2007-02-16  0:01                     ` Zach Brown
2007-02-16 12:18                       ` Ananiev, Leonid I
2007-02-09  9:54 ` Jiri Kosina
2007-02-09 10:14   ` Andrew Morton
2007-02-09 10:40     ` Jiri Kosina
2007-02-09 11:05       ` Suparna Bhattacharya
2007-02-09 11:18         ` Ananiev, Leonid I
2007-02-09 17:02         ` Zach Brown
2007-02-10 19:36 Ananiev, Leonid I
2007-02-14 17:51 Ananiev, Leonid I
2007-02-15  3:30 ` Andrew Morton
2007-02-15  5:26   ` Ananiev, Leonid I

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.