* [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Eryu Guan @ 2017-02-10  3:53 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, mhocko

Hi,

I was testing the 4.10-rc7 kernel and noticed that xfs_repair reported XFS
corruption after the fstests xfs/297 test. This didn't happen with the
4.10-rc6 kernel, and git bisect identified the first bad commit as

commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
Author: Michal Hocko <mhocko@suse.com>
Date:   Fri Feb 3 13:13:26 2017 -0800

    fs: break out of iomap_file_buffered_write on fatal signals

    Tetsuo has noticed that an OOM stress test which performs large write
    requests can cause the full memory reserves depletion.  He has tracked
    this down to the following path
....

It's the sb_fdblocks field that reports the inconsistency:
...
Phase 2 - using internal log   
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 3367765, counted 3367863
        - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
        - found root inode chunk
...

And it can be reproduced almost 100% of the time with all XFS test
configurations (e.g. xfs_4k, xfs_2k_reflink), on all test hosts I tried
(so I didn't bother pasting my detailed test and host configs; if more
info is needed please let me know).

Thanks,
Eryu

P.S. full xfs_repair -n output

*** xfs_repair -n output ***   
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log   
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 3367765, counted 3367863
        - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - 11:37:41: scanning agi unlinked lists - 16 of 16 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 15
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 11:37:42: process known inodes and inode discovery - 13760 of 13760 inodes done
        - process newly discovered inodes...
        - 11:37:42: process newly discovered inodes - 16 of 16 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 11:37:42: setting up duplicate extent list - 16 of 16 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - 11:37:42: check for inodes claiming duplicate blocks - 13760 of 13760 inodes done
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
        - 11:37:43: verify and correct link counts - 16 of 16 allocation groups done
No modify flag set, skipping filesystem flush and exiting.


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Eryu Guan @ 2017-02-10  4:20 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, mhocko

On Fri, Feb 10, 2017 at 11:53:48AM +0800, Eryu Guan wrote:
> Hi,
> 
> I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6

It seems generic/269 reproduces it easily too. I also saw xfs/167 fail
on a ppc64 host because of this issue, but only rarely.

Thanks,
Eryu


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-10  7:14 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linux-xfs, linux-fsdevel

On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> Hi,
> 
> I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> kernel, and git bisect pointed the first bad commit to
> 
> commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Fri Feb 3 13:13:26 2017 -0800
> 
>     fs: break out of iomap_file_buffered_write on fatal signals
> 
>     Tetsuo has noticed that an OOM stress test which performs large write
>     requests can cause the full memory reserves depletion.  He has tracked
>     this down to the following path
> ....
> 
> It's the sb_fdblocks field reports inconsistency:
> ...
> Phase 2 - using internal log   
>         - zero log...
>         - scan filesystem freespace and inode maps...
> sb_fdblocks 3367765, counted 3367863
>         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
>         - found root inode chunk
> ...
> 
> And it can be reproduced almost 100% with all XFS test configurations
> (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> bother pasting my detailed test and host configs, if more info is needed
> please let me know).

The patch can lead to short writes when the task is killed. Was the OOM
killer triggered during the test? If not, who is killing the task? I
will try to reproduce later today.
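
To illustrate what I mean by a short write: with the fatal-signal check
in place, the buffered write path bails out of the copy loop but still
reports whatever it managed to write before the signal arrived. Roughly
(a simplified paraphrase of the 4.10-era iomap write actor, not the
literal kernel code):

/*
 * Simplified sketch of the post-d1908f52557b behaviour -- paraphrased,
 * not the actual kernel source.
 */
static loff_t
iomap_write_actor_sketch(struct inode *inode, loff_t pos, loff_t length,
                         void *data, struct iomap *iomap)
{
        struct iov_iter *i = data;
        loff_t written = 0;
        int status = 0;

        while (iov_iter_count(i) && length) {
                /*
                 * The commit puts this check in iomap_write_begin();
                 * shown inline here: stop once SIGKILL is pending.
                 */
                if (fatal_signal_pending(current)) {
                        status = -EINTR;
                        break;
                }
                /* ... copy loop elided; 'written' advances per chunk ... */
        }

        /* whatever was copied before the signal is returned: a short write */
        return written ? written : status;
}
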
-- 
Michal Hocko
SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-10  8:02 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linux-xfs, linux-fsdevel

On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > Hi,
> > 
> > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > kernel, and git bisect pointed the first bad commit to
> > 
> > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > Author: Michal Hocko <mhocko@suse.com>
> > Date:   Fri Feb 3 13:13:26 2017 -0800
> > 
> >     fs: break out of iomap_file_buffered_write on fatal signals
> > 
> >     Tetsuo has noticed that an OOM stress test which performs large write
> >     requests can cause the full memory reserves depletion.  He has tracked
> >     this down to the following path
> > ....
> > 
> > It's the sb_fdblocks field reports inconsistency:
> > ...
> > Phase 2 - using internal log   
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> > sb_fdblocks 3367765, counted 3367863
> >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> >         - found root inode chunk
> > ...
> > 
> > And it can be reproduced almost 100% with all XFS test configurations
> > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > bother pasting my detailed test and host configs, if more info is needed
> > please let me know).
> 
> The patch can lead to short writes when the task is killed. Was there
> any OOM killer triggered during the test? If not who is killing the
> task? I will try to reproduce later today.

I have checked both tests and they do kill the test processes, but
neither of them seems to use SIGKILL. The patch should only make a
difference for fatal signals (i.e. SIGKILL). Is there anything other
than the OOM killer that can send SIGKILL?
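
For reference, "fatal" here really does mean SIGKILL and nothing else.
fatal_signal_pending() boils down to roughly this (paraphrased from the
scheduler headers of that era):

/* paraphrased: a fatal signal means SIGKILL is queued for the task */
static inline int fatal_signal_pending_sketch(struct task_struct *p)
{
        return signal_pending(p) &&
               sigismember(&p->pending.signal, SIGKILL);
}

So no ordinary signal sent by the test harness should trigger the new
bail-out.
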
-- 
Michal Hocko
SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-10  9:31 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linux-xfs, linux-fsdevel, Christoph Hellwig

[CC Christoph]

On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > Hi,
> > > 
> > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > kernel, and git bisect pointed the first bad commit to
> > > 
> > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > Author: Michal Hocko <mhocko@suse.com>
> > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > 
> > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > 
> > >     Tetsuo has noticed that an OOM stress test which performs large write
> > >     requests can cause the full memory reserves depletion.  He has tracked
> > >     this down to the following path
> > > ....
> > > 
> > > It's the sb_fdblocks field reports inconsistency:
> > > ...
> > > Phase 2 - using internal log   
> > >         - zero log...
> > >         - scan filesystem freespace and inode maps...
> > > sb_fdblocks 3367765, counted 3367863
> > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > >         - found root inode chunk
> > > ...
> > > 
> > > And it can be reproduced almost 100% with all XFS test configurations
> > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > bother pasting my detailed test and host configs, if more info is needed
> > > please let me know).
> > 
> > The patch can lead to short writes when the task is killed. Was there
> > any OOM killer triggered during the test? If not who is killing the
> > task? I will try to reproduce later today.
> 
> I have checked both tests and they are killing the test but none of them
> seems to be using SIGKILL. The patch should make a difference only for
> fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> except for the OOM killer?

I have done the following:
diff --git a/fs/dax.c b/fs/dax.c
index c45598b912e1..4de00ceaf73d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1032,6 +1032,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		ssize_t map_len;
 
 		if (fatal_signal_pending(current)) {
+			WARN(1, "dax_iomap_actor\n");
 			ret = -EINTR;
 			break;
 		}
diff --git a/fs/iomap.c b/fs/iomap.c
index a51cb4c07d4d..00019e2cdad3 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -114,8 +114,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
 
-	if (fatal_signal_pending(current))
+	if (fatal_signal_pending(current)) {
+		WARN(1, "iomap_write_begin\n");
 		return -EINTR;
+	}
 
 	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
 	if (!page)

and it triggered
[  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.087297] WARNING: CPU: 1 PID: 4899 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.150542] WARNING: CPU: 2 PID: 4857 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.153653] WARNING: CPU: 3 PID: 4889 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.154413] WARNING: CPU: 0 PID: 4929 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.154734] WARNING: CPU: 3 PID: 4934 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.162743] WARNING: CPU: 0 PID: 4813 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.163282] WARNING: CPU: 1 PID: 4891 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.163820] WARNING: CPU: 0 PID: 4932 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.169112] WARNING: CPU: 0 PID: 4923 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.182816] WARNING: CPU: 0 PID: 4892 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.321991] WARNING: CPU: 3 PID: 4872 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c

always from this call trace:
[  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
[  135.057092] iomap_write_begin
[  135.057556] Modules linked in: xfs libcrc32c
[  135.058191] CPU: 2 PID: 4894 Comm: fsstress Tainted: G        W       4.10.0-rc6-test1-00149-gd1908f52557b-dirty #1064
[  135.060349] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[  135.060865] Call Trace:
[  135.060865]  dump_stack+0x68/0x92
[  135.060865]  __warn+0xc2/0xdd
[  135.060865]  warn_slowpath_fmt+0x4b/0x53
[  135.060865]  iomap_write_begin+0x72/0x12c
[  135.060865]  iomap_write_actor+0x99/0x161
[  135.060865]  iomap_apply+0x9e/0xec
[  135.060865]  ? iomap_write_end+0x62/0x62
[  135.060865]  iomap_file_buffered_write+0x5b/0x7f
[  135.060865]  ? iomap_write_end+0x62/0x62
[  135.060865]  xfs_file_buffered_aio_write+0x187/0x2b4 [xfs]
[  135.060865]  xfs_file_write_iter+0x93/0x11c [xfs]
[  135.060865]  __vfs_write+0xcc/0xf5
[  135.060865]  vfs_write+0x100/0x1a9
[  135.060865]  SyS_write+0x51/0x8e

So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
whether this is really a regression, because xfs_file_buffered_aio_write
used to call generic_perform_write, which does the same thing.
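
For comparison, the pre-iomap path looked roughly like this (a
simplified paraphrase of generic_perform_write(), not the exact code):

/* simplified paraphrase of the old buffered write loop */
static ssize_t generic_perform_write_sketch(struct file *file,
                                            struct iov_iter *i, loff_t pos)
{
        ssize_t written = 0;
        long status = 0;

        do {
                /* a fatal-signal bail-out has been here for a while already */
                if (fatal_signal_pending(current)) {
                        status = -EINTR;
                        break;
                }
                /* ... ->write_begin(), copy from user, ->write_end() ... */
        } while (iov_iter_count(i));

        return written ? written : status;
}
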
-- 
Michal Hocko
SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Eryu Guan @ 2017-02-11  6:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-xfs, linux-fsdevel, Christoph Hellwig

On Fri, Feb 10, 2017 at 10:31:31AM +0100, Michal Hocko wrote:
> [CC Christoph]
> 
> On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> > On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > > Hi,
> > > > 
> > > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > > kernel, and git bisect pointed the first bad commit to
> > > > 
> > > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > > Author: Michal Hocko <mhocko@suse.com>
> > > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > > 
> > > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > > 
> > > >     Tetsuo has noticed that an OOM stress test which performs large write
> > > >     requests can cause the full memory reserves depletion.  He has tracked
> > > >     this down to the following path
> > > > ....
> > > > 
> > > > It's the sb_fdblocks field reports inconsistency:
> > > > ...
> > > > Phase 2 - using internal log   
> > > >         - zero log...
> > > >         - scan filesystem freespace and inode maps...
> > > > sb_fdblocks 3367765, counted 3367863
> > > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > > >         - found root inode chunk
> > > > ...
> > > > 
> > > > And it can be reproduced almost 100% with all XFS test configurations
> > > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > > bother pasting my detailed test and host configs, if more info is needed
> > > > please let me know).
> > > 
> > > The patch can lead to short writes when the task is killed. Was there
> > > any OOM killer triggered during the test? If not who is killing the
> > > task? I will try to reproduce later today.
> > 
> > I have checked both tests and they are killing the test but none of them
> > seems to be using SIGKILL. The patch should make a difference only for
> > fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> > except for the OOM killer?

No, I'm not aware of any other part of the fstests harness that could
send SIGKILL.

I also noticed xfs/274 and xfs/275 failing due to the same sb_fdblocks
inconsistency (xfs_1k_rmap test config), and those tests involve no
kill/killall at all.

BTW, I'm using the latest xfsprogs for-next branch in all my testing,
in case it matters.

> 
> I have done

Thanks for looking into this! This issue seems to cause random fstests
failures on xfs.

> diff --git a/fs/dax.c b/fs/dax.c
> index c45598b912e1..4de00ceaf73d 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1032,6 +1032,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		ssize_t map_len;
>  
>  		if (fatal_signal_pending(current)) {
> +			WARN(1, "dax_iomap_actor\n");
>  			ret = -EINTR;
>  			break;
>  		}
> diff --git a/fs/iomap.c b/fs/iomap.c
> index a51cb4c07d4d..00019e2cdad3 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -114,8 +114,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
>  
> -	if (fatal_signal_pending(current))
> +	if (fatal_signal_pending(current)) {
> +		WARN(1, "iomap_write_begin\n");
>  		return -EINTR;
> +	}
>  
>  	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
>  	if (!page)
> 
> and it triggered
> [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.087297] WARNING: CPU: 1 PID: 4899 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.150542] WARNING: CPU: 2 PID: 4857 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.153653] WARNING: CPU: 3 PID: 4889 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.154413] WARNING: CPU: 0 PID: 4929 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.154734] WARNING: CPU: 3 PID: 4934 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.162743] WARNING: CPU: 0 PID: 4813 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.163282] WARNING: CPU: 1 PID: 4891 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.163820] WARNING: CPU: 0 PID: 4932 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.169112] WARNING: CPU: 0 PID: 4923 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.182816] WARNING: CPU: 0 PID: 4892 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.321991] WARNING: CPU: 3 PID: 4872 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> 
> always from
> [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.057092] iomap_write_begin
> [  135.057556] Modules linked in: xfs libcrc32c
> [  135.058191] CPU: 2 PID: 4894 Comm: fsstress Tainted: G        W       4.10.0-rc6-test1-00149-gd1908f52557b-dirty #1064
> [  135.060349] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
> [  135.060865] Call Trace:
> [  135.060865]  dump_stack+0x68/0x92
> [  135.060865]  __warn+0xc2/0xdd
> [  135.060865]  warn_slowpath_fmt+0x4b/0x53
> [  135.060865]  iomap_write_begin+0x72/0x12c
> [  135.060865]  iomap_write_actor+0x99/0x161
> [  135.060865]  iomap_apply+0x9e/0xec
> [  135.060865]  ? iomap_write_end+0x62/0x62
> [  135.060865]  iomap_file_buffered_write+0x5b/0x7f
> [  135.060865]  ? iomap_write_end+0x62/0x62
> [  135.060865]  xfs_file_buffered_aio_write+0x187/0x2b4 [xfs]
> [  135.060865]  xfs_file_write_iter+0x93/0x11c [xfs]
> [  135.060865]  __vfs_write+0xcc/0xf5
> [  135.060865]  vfs_write+0x100/0x1a9
> [  135.060865]  SyS_write+0x51/0x8e
> 
> So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
> whether this is really a regression. xfs_file_buffered_aio_write used to
> call generic_perform_write which does the same thing.

Maybe it just uncovered some existing bug? Anyway, a reliably
reproduced filesystem metadata inconsistency does smell like a bug.

Thanks,
Eryu


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-11  6:33 UTC (permalink / raw)
  To: Eryu Guan; +Cc: linux-xfs, linux-fsdevel, Christoph Hellwig

On Sat 11-02-17 14:02:04, Eryu Guan wrote:
> On Fri, Feb 10, 2017 at 10:31:31AM +0100, Michal Hocko wrote:
> > [CC Christoph]
> > 
> > On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> > > On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > > > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > > > Hi,
> > > > > 
> > > > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > > > kernel, and git bisect pointed the first bad commit to
> > > > > 
> > > > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > > > Author: Michal Hocko <mhocko@suse.com>
> > > > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > > > 
> > > > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > > > 
> > > > >     Tetsuo has noticed that an OOM stress test which performs large write
> > > > >     requests can cause the full memory reserves depletion.  He has tracked
> > > > >     this down to the following path
> > > > > ....
> > > > > 
> > > > > It's the sb_fdblocks field reports inconsistency:
> > > > > ...
> > > > > Phase 2 - using internal log   
> > > > >         - zero log...
> > > > >         - scan filesystem freespace and inode maps...
> > > > > sb_fdblocks 3367765, counted 3367863
> > > > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > > > >         - found root inode chunk
> > > > > ...
> > > > > 
> > > > > And it can be reproduced almost 100% with all XFS test configurations
> > > > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > > > bother pasting my detailed test and host configs, if more info is needed
> > > > > please let me know).
> > > > 
> > > > The patch can lead to short writes when the task is killed. Was there
> > > > any OOM killer triggered during the test? If not who is killing the
> > > > task? I will try to reproduce later today.
> > > 
> > > I have checked both tests and they are killing the test but none of them
> > > seems to be using SIGKILL. The patch should make a difference only for
> > > fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> > > except for the OOM killer?
> 
> No, I'm not aware of any other part in fstests harness could send
> SIGKILL.

Hmm, maybe this is a result of the group exit, which sends SIGKILL to
all the other threads of the process (zap_other_threads).
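
That is, roughly the following (simplified from kernel/signal.c, from
memory -- the real zap_other_threads() walks the thread group a bit
differently):

/* sketch: on group exit every other thread of the group gets SIGKILL */
static void zap_other_threads_sketch(struct task_struct *p)
{
        struct task_struct *t;

        for_each_thread(p, t) {
                if (t == p || t->exit_state)
                        continue;               /* skip self / dead threads */
                sigaddset(&t->pending.signal, SIGKILL);
                signal_wake_up(t, 1);   /* makes fatal_signal_pending() true */
        }
}

That would explain fsstress threads seeing a pending SIGKILL even though
nothing in fstests ever sends one explicitly.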
 
[...]
> > So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
> > whether this is really a regression. xfs_file_buffered_aio_write used to
> > call generic_perform_write which does the same thing.
> 
> Maybe it just uncovered some existing bug?

maybe

> Anyway, a reliable reproduced filesystem metadata inconsistency does
> smell like a bug.

Definitely! Unfortunately I am going to disappear for a week; I will be
back on the 20th. Anyway, I believe iomap_file_buffered_write and its
callers _should_ be able to handle short writes. EINTR is not the only
way this can happen; ENOMEM would be another.
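
In other words, a caller has to treat a partial return value as success
for the bytes that did make it. A hypothetical caller (illustration
only, not XFS code; some_iomap_ops is a placeholder) would do something
like:

static struct iomap_ops some_iomap_ops;         /* placeholder fs ops */

/* hypothetical caller: a result smaller than the request is still valid */
static ssize_t buffered_write_caller_sketch(struct kiocb *iocb,
                                            struct iov_iter *from)
{
        ssize_t ret = iomap_file_buffered_write(iocb, from, &some_iomap_ops);

        if (ret > 0)
                iocb->ki_pos += ret;    /* advance only by what was written */

        /* -EINTR/-ENOMEM only come back when nothing was written at all */
        return ret;
}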

-- 
Michal Hocko
SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-20 14:25 UTC (permalink / raw)
  To: Eryu Guan
  Cc: linux-xfs, linux-fsdevel, Christoph Hellwig, Jan Kara,
	Darrick J. Wong, Brian Foster

I have discussed this briefly with Jack, and he suspects that the
XFS-specific part of the iomap callbacks doesn't cancel reservations.
Let's CC more XFS people. The thread starts at
http://lkml.kernel.org/r/20170210035348.GA7075@eguan.usersys.redhat.com
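
For the non-XFS folks on CC, the callback pair in question is roughly
this (paraphrasing the 4.10-era include/linux/iomap.h):

/* each chunk of a buffered write is bracketed by these two hooks */
struct iomap_ops {
        /* set up the mapping, typically reserving delalloc blocks */
        int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
                           unsigned flags, struct iomap *iomap);
        /*
         * 'written' may be shorter than 'length' on a short write; any
         * unused reservation would have to be given back here.
         */
        int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
                         ssize_t written, unsigned flags, struct iomap *iomap);
};

The suspicion is that the XFS ->iomap_end side doesn't undo the unused
part of the delalloc reservation when the copy stops early, which would
leave the in-core free-block counter (and hence sb_fdblocks) lower than
the space that is actually free -- which looks like exactly what
xfs_repair is complaining about.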

On Fri 10-02-17 10:31:31, Michal Hocko wrote:
> [CC Christoph]
> 
> On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> > On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > > Hi,
> > > > 
> > > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > > kernel, and git bisect pointed the first bad commit to
> > > > 
> > > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > > Author: Michal Hocko <mhocko@suse.com>
> > > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > > 
> > > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > > 
> > > >     Tetsuo has noticed that an OOM stress test which performs large write
> > > >     requests can cause the full memory reserves depletion.  He has tracked
> > > >     this down to the following path
> > > > ....
> > > > 
> > > > It's the sb_fdblocks field reports inconsistency:
> > > > ...
> > > > Phase 2 - using internal log   
> > > >         - zero log...
> > > >         - scan filesystem freespace and inode maps...
> > > > sb_fdblocks 3367765, counted 3367863
> > > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > > >         - found root inode chunk
> > > > ...
> > > > 
> > > > And it can be reproduced almost 100% with all XFS test configurations
> > > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > > bother pasting my detailed test and host configs, if more info is needed
> > > > please let me know).
> > > 
> > > The patch can lead to short writes when the task is killed. Was there
> > > any OOM killer triggered during the test? If not who is killing the
> > > task? I will try to reproduce later today.
> > 
> > I have checked both tests and they are killing the test but none of them
> > seems to be using SIGKILL. The patch should make a difference only for
> > fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> > except for the OOM killer?
> 
> I have done
> diff --git a/fs/dax.c b/fs/dax.c
> index c45598b912e1..4de00ceaf73d 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1032,6 +1032,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		ssize_t map_len;
>  
>  		if (fatal_signal_pending(current)) {
> +			WARN(1, "dax_iomap_actor\n");
>  			ret = -EINTR;
>  			break;
>  		}
> diff --git a/fs/iomap.c b/fs/iomap.c
> index a51cb4c07d4d..00019e2cdad3 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -114,8 +114,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
>  
> -	if (fatal_signal_pending(current))
> +	if (fatal_signal_pending(current)) {
> +		WARN(1, "iomap_write_begin\n");
>  		return -EINTR;
> +	}
>  
>  	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
>  	if (!page)
> 
> and it triggered
> [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.087297] WARNING: CPU: 1 PID: 4899 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.150542] WARNING: CPU: 2 PID: 4857 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.153653] WARNING: CPU: 3 PID: 4889 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.154413] WARNING: CPU: 0 PID: 4929 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.154734] WARNING: CPU: 3 PID: 4934 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.162743] WARNING: CPU: 0 PID: 4813 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.163282] WARNING: CPU: 1 PID: 4891 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.163820] WARNING: CPU: 0 PID: 4932 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.169112] WARNING: CPU: 0 PID: 4923 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.182816] WARNING: CPU: 0 PID: 4892 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.321991] WARNING: CPU: 3 PID: 4872 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> 
> always from
> [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> [  135.057092] iomap_write_begin
> [  135.057556] Modules linked in: xfs libcrc32c
> [  135.058191] CPU: 2 PID: 4894 Comm: fsstress Tainted: G        W       4.10.0-rc6-test1-00149-gd1908f52557b-dirty #1064
> [  135.060349] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
> [  135.060865] Call Trace:
> [  135.060865]  dump_stack+0x68/0x92
> [  135.060865]  __warn+0xc2/0xdd
> [  135.060865]  warn_slowpath_fmt+0x4b/0x53
> [  135.060865]  iomap_write_begin+0x72/0x12c
> [  135.060865]  iomap_write_actor+0x99/0x161
> [  135.060865]  iomap_apply+0x9e/0xec
> [  135.060865]  ? iomap_write_end+0x62/0x62
> [  135.060865]  iomap_file_buffered_write+0x5b/0x7f
> [  135.060865]  ? iomap_write_end+0x62/0x62
> [  135.060865]  xfs_file_buffered_aio_write+0x187/0x2b4 [xfs]
> [  135.060865]  xfs_file_write_iter+0x93/0x11c [xfs]
> [  135.060865]  __vfs_write+0xcc/0xf5
> [  135.060865]  vfs_write+0x100/0x1a9
> [  135.060865]  SyS_write+0x51/0x8e
> 
> So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
> whether this is really a regression. xfs_file_buffered_aio_write used to
> call generic_perform_write which does the same thing.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Brian Foster @ 2017-02-20 14:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Eryu Guan, linux-xfs, linux-fsdevel, Christoph Hellwig, Jan Kara,
	Darrick J. Wong

On Mon, Feb 20, 2017 at 03:25:21PM +0100, Michal Hocko wrote:
> I have discussed that shortly with Jack and he suspects that xfs
> specific part of the iomap callbacks doesn't cancel reservations.
> Let's CC more xfs people. The thread starts
> http://lkml.kernel.org/r/20170210035348.GA7075@eguan.usersys.redhat.com
> 

Have you tried commit fa7f138 ("xfs: clear delalloc and cache on buffered
write failure"), currently in for-next?

Brian

> On Fri 10-02-17 10:31:31, Michal Hocko wrote:
> > [CC Christoph]
> > 
> > On Fri 10-02-17 09:02:10, Michal Hocko wrote:
> > > On Fri 10-02-17 08:14:18, Michal Hocko wrote:
> > > > On Fri 10-02-17 11:53:48, Eryu Guan wrote:
> > > > > Hi,
> > > > > 
> > > > > I was testing 4.10-rc7 kernel and noticed that xfs_repair reported XFS
> > > > > corruption after fstests xfs/297 test. This didn't happen with 4.10-rc6
> > > > > kernel, and git bisect pointed the first bad commit to
> > > > > 
> > > > > commit d1908f52557b3230fbd63c0429f3b4b748bf2b6d
> > > > > Author: Michal Hocko <mhocko@suse.com>
> > > > > Date:   Fri Feb 3 13:13:26 2017 -0800
> > > > > 
> > > > >     fs: break out of iomap_file_buffered_write on fatal signals
> > > > > 
> > > > >     Tetsuo has noticed that an OOM stress test which performs large write
> > > > >     requests can cause the full memory reserves depletion.  He has tracked
> > > > >     this down to the following path
> > > > > ....
> > > > > 
> > > > > It's the sb_fdblocks field reports inconsistency:
> > > > > ...
> > > > > Phase 2 - using internal log   
> > > > >         - zero log...
> > > > >         - scan filesystem freespace and inode maps...
> > > > > sb_fdblocks 3367765, counted 3367863
> > > > >         - 11:37:41: scanning filesystem freespace - 16 of 16 allocation groups done
> > > > >         - found root inode chunk
> > > > > ...
> > > > > 
> > > > > And it can be reproduced almost 100% with all XFS test configurations
> > > > > (e.g. xfs_4k xfs_2k_reflink), on all test hosts I tried (so I didn't
> > > > > bother pasting my detailed test and host configs, if more info is needed
> > > > > please let me know).
> > > > 
> > > > The patch can lead to short writes when the task is killed. Was there
> > > > any OOM killer triggered during the test? If not who is killing the
> > > > task? I will try to reproduce later today.
> > > 
> > > I have checked both tests and they are killing the test but none of them
> > > seems to be using SIGKILL. The patch should make a difference only for
> > > fatal signal (aka SIGKILL). Is there any other part that can do SIGKILL
> > > except for the OOM killer?
> > 
> > I have done
> > diff --git a/fs/dax.c b/fs/dax.c
> > index c45598b912e1..4de00ceaf73d 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1032,6 +1032,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		ssize_t map_len;
> >  
> >  		if (fatal_signal_pending(current)) {
> > +			WARN(1, "dax_iomap_actor\n");
> >  			ret = -EINTR;
> >  			break;
> >  		}
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index a51cb4c07d4d..00019e2cdad3 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -114,8 +114,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >  
> >  	BUG_ON(pos + len > iomap->offset + iomap->length);
> >  
> > -	if (fatal_signal_pending(current))
> > +	if (fatal_signal_pending(current)) {
> > +		WARN(1, "iomap_write_begin\n");
> >  		return -EINTR;
> > +	}
> >  
> >  	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> >  	if (!page)
> > 
> > and it triggered
> > [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.087297] WARNING: CPU: 1 PID: 4899 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.150542] WARNING: CPU: 2 PID: 4857 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.153653] WARNING: CPU: 3 PID: 4889 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.154413] WARNING: CPU: 0 PID: 4929 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.154734] WARNING: CPU: 3 PID: 4934 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.162743] WARNING: CPU: 0 PID: 4813 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.163282] WARNING: CPU: 1 PID: 4891 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.163820] WARNING: CPU: 0 PID: 4932 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.169112] WARNING: CPU: 0 PID: 4923 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.182816] WARNING: CPU: 0 PID: 4892 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.321991] WARNING: CPU: 3 PID: 4872 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > 
> > always from
> > [  135.055682] WARNING: CPU: 2 PID: 4894 at fs/iomap.c:118 iomap_write_begin+0x72/0x12c
> > [  135.057092] iomap_write_begin
> > [  135.057556] Modules linked in: xfs libcrc32c
> > [  135.058191] CPU: 2 PID: 4894 Comm: fsstress Tainted: G        W       4.10.0-rc6-test1-00149-gd1908f52557b-dirty #1064
> > [  135.060349] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
> > [  135.060865] Call Trace:
> > [  135.060865]  dump_stack+0x68/0x92
> > [  135.060865]  __warn+0xc2/0xdd
> > [  135.060865]  warn_slowpath_fmt+0x4b/0x53
> > [  135.060865]  iomap_write_begin+0x72/0x12c
> > [  135.060865]  iomap_write_actor+0x99/0x161
> > [  135.060865]  iomap_apply+0x9e/0xec
> > [  135.060865]  ? iomap_write_end+0x62/0x62
> > [  135.060865]  iomap_file_buffered_write+0x5b/0x7f
> > [  135.060865]  ? iomap_write_end+0x62/0x62
> > [  135.060865]  xfs_file_buffered_aio_write+0x187/0x2b4 [xfs]
> > [  135.060865]  xfs_file_write_iter+0x93/0x11c [xfs]
> > [  135.060865]  __vfs_write+0xcc/0xf5
> > [  135.060865]  vfs_write+0x100/0x1a9
> > [  135.060865]  SyS_write+0x51/0x8e
> > 
> > So somebody had to send SIGKILL to fsstress. Anyway, I am wondering
> > whether this is really a regression. xfs_file_buffered_aio_write used to
> > call generic_perform_write which does the same thing.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> -- 
> Michal Hocko
> SUSE Labs


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Eryu Guan @ 2017-02-21  4:14 UTC (permalink / raw)
  To: Brian Foster
  Cc: Michal Hocko, linux-xfs, linux-fsdevel, Christoph Hellwig,
	Jan Kara, Darrick J. Wong

On Mon, Feb 20, 2017 at 09:58:19AM -0500, Brian Foster wrote:
> On Mon, Feb 20, 2017 at 03:25:21PM +0100, Michal Hocko wrote:
> > I have discussed that shortly with Jack and he suspects that xfs
> > specific part of the iomap callbacks doesn't cancel reservations.
> > Let's CC more xfs people. The thread starts
> > http://lkml.kernel.org/r/20170210035348.GA7075@eguan.usersys.redhat.com
> > 
> 
> Have you tried commit fa7f138 ("xfs: clear delalloc and cache on buffered
> write failure"), currently in for-next?

I cherry-picked this commit onto a 4.10-rc8 tree and xfs/297 and
xfs/305 both passed for me without fs corruption. They used to fail
reliably for me with the stock 4.10-rc8 kernel. Thanks all for looking
at this!

Thanks,
Eryu


* Re: [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test
From: Michal Hocko @ 2017-02-21  8:13 UTC (permalink / raw)
  To: Brian Foster
  Cc: Eryu Guan, linux-xfs, linux-fsdevel, Christoph Hellwig, Jan Kara,
	Darrick J. Wong

On Tue 21-02-17 12:14:30, Eryu Guan wrote:
> On Mon, Feb 20, 2017 at 09:58:19AM -0500, Brian Foster wrote:
> > On Mon, Feb 20, 2017 at 03:25:21PM +0100, Michal Hocko wrote:
> > > I have discussed that shortly with Jack and he suspects that xfs
> > > specific part of the iomap callbacks doesn't cancel reservations.
> > > Let's CC more xfs people. The thread starts
> > > http://lkml.kernel.org/r/20170210035348.GA7075@eguan.usersys.redhat.com
> > > 
> > 
> > Have you tried commit fa7f138 ("xfs: clear delalloc and cache on buffered
> > write failure"), currently in for-next?
> 
> I cherry-picked this commit to 4.10-rc8 tree and xfs/297 xfs/305 all
> passed for me without fs corruption. They used to fail for me reliably
> with stock 4.10-rc8 kernel. Thanks all for looking at this!

Brian, could you make sure that your commit gets into stable for (at
least) 4.8+, because that is where d1908f52557b ("fs: break out of
iomap_file_buffered_write on fatal signals") will be backported to?

-- 
Michal Hocko
SUSE Labs


Thread overview (11 messages):
2017-02-10  3:53 [BUG 4.10-rc7] sb_fdblocks inconsistency in xfs/297 test Eryu Guan
2017-02-10  4:20 ` Eryu Guan
2017-02-10  7:14 ` Michal Hocko
2017-02-10  8:02   ` Michal Hocko
2017-02-10  9:31     ` Michal Hocko
2017-02-11  6:02       ` Eryu Guan
2017-02-11  6:33         ` Michal Hocko
2017-02-20 14:25       ` Michal Hocko
2017-02-20 14:58         ` Brian Foster
2017-02-21  4:14           ` Eryu Guan
2017-02-21  8:13             ` Michal Hocko
