* Livelock when running xfstests generic/127 on ext4 with 3.15
@ 2014-06-20 17:53 Matthew Wilcox
  2014-06-25 13:12 ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2014-06-20 17:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: linux-fsdevel


I didn't see this with 3.14, but I'm not sure what's changed.

When running generic/127, fsx ends up taking 30% CPU time with a kthread
taking 70% CPU time for hours.  It might be making forward progress,
but if it is, it's incredibly slow.

I can usually catch fsx waiting for the kthread:

# ./check generic/127
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 walter 3.15.0
MKFS_OPTIONS  -- /dev/ram1
MOUNT_OPTIONS -- -o acl,user_xattr /dev/ram1 /mnt/ram1

generic/127 19s ...

$ sudo cat /proc/4795/stack 
[<ffffffff8120bee9>] writeback_inodes_sb_nr+0xa9/0xe0
[<ffffffff8120bfae>] try_to_writeback_inodes_sb_nr+0x5e/0x80
[<ffffffff8120bff5>] try_to_writeback_inodes_sb+0x25/0x30
[<ffffffffa01bae2a>] ext4_nonda_switch+0x8a/0x90 [ext4]
[<ffffffffa01c49a5>] ext4_page_mkwrite+0x265/0x440 [ext4]
[<ffffffff811936ed>] do_page_mkwrite+0x3d/0x70
[<ffffffff81195887>] do_wp_page+0x627/0x770
[<ffffffff811981a1>] handle_mm_fault+0x781/0xf00
[<ffffffff815a8996>] __do_page_fault+0x186/0x570
[<ffffffff815a8da2>] do_page_fault+0x22/0x30
[<ffffffff815a5038>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff


My setup is a 1GB ram disk:

modprobe brd rd_size=1048576 rd_nr=2

local.config:

TEST_DEV=/dev/ram0
TEST_DIR=/mnt/ram0
SCRATCH_DEV=/dev/ram1
SCRATCH_MNT=/mnt/ram1


Hardware is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4GB RAM,
in case it matters.  But I think what matters is that I'm running it on
a "tiny" 1GB filesystem, since this code is only invoked whenever the
number of dirty clusters is large relative to the number of free clusters.

df shows:
/dev/ram1         999320     1284    929224   1% /mnt/ram1
/dev/ram0         999320   646088    284420  70% /mnt/ram0

So it's not *unreasonably* full.


* Re: Livelock when running xfstests generic/127 on ext4 with 3.15
  2014-06-20 17:53 Livelock when running xfstests generic/127 on ext4 with 3.15 Matthew Wilcox
@ 2014-06-25 13:12 ` Jan Kara
  2014-06-25 14:17   ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2014-06-25 13:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-ext4, linux-fsdevel

On Fri 20-06-14 13:53:22, Matthew Wilcox wrote:
> 
> I didn't see this with 3.14, but I'm not sure what's changed.
> 
> When running generic/127, fsx ends up taking 30% CPU time with a kthread
> taking 70% CPU time for hours.  It might be making forward progress,
> but if it is, it's incredibly slow.
> 
> I can usually catch fsx waiting for the kthread:
> 
> # ./check generic/127
> FSTYP         -- ext4
> PLATFORM      -- Linux/x86_64 walter 3.15.0
> MKFS_OPTIONS  -- /dev/ram1
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/ram1 /mnt/ram1
> 
> generic/127 19s ...
> 
> $ sudo cat /proc/4795/stack 
> [<ffffffff8120bee9>] writeback_inodes_sb_nr+0xa9/0xe0
> [<ffffffff8120bfae>] try_to_writeback_inodes_sb_nr+0x5e/0x80
> [<ffffffff8120bff5>] try_to_writeback_inodes_sb+0x25/0x30
> [<ffffffffa01bae2a>] ext4_nonda_switch+0x8a/0x90 [ext4]
> [<ffffffffa01c49a5>] ext4_page_mkwrite+0x265/0x440 [ext4]
  Hum, apparently you are running out of space on the test partition. And
that is known to make ext4 extraordinarily slow...

								Honza

> [<ffffffff811936ed>] do_page_mkwrite+0x3d/0x70
> [<ffffffff81195887>] do_wp_page+0x627/0x770
> [<ffffffff811981a1>] handle_mm_fault+0x781/0xf00
> [<ffffffff815a8996>] __do_page_fault+0x186/0x570
> [<ffffffff815a8da2>] do_page_fault+0x22/0x30
> [<ffffffff815a5038>] page_fault+0x28/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> 
> My setup is a 1GB ram disk:
> 
> modprobe brd rd_size=1048576 rd_nr=2
> 
> local.config:
> 
> TEST_DEV=/dev/ram0
> TEST_DIR=/mnt/ram0
> SCRATCH_DEV=/dev/ram1
> SCRATCH_MNT=/mnt/ram1
> 
> 
> Hardware is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4GB RAM,
> in case it matters.  But I think what matters is that I'm running it on
> a "tiny" 1GB filesystem, since this code is only invoked whenever the
> number of dirty clusters is large relative to the number of free clusters.
> 
> df shows:
> /dev/ram1         999320     1284    929224   1% /mnt/ram1
> /dev/ram0         999320   646088    284420  70% /mnt/ram0
> 
> So it's not *unreasonably* full.
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Livelock when running xfstests generic/127 on ext4 with 3.15
  2014-06-25 13:12 ` Jan Kara
@ 2014-06-25 14:17   ` Matthew Wilcox
  2014-06-25 14:44     ` Jan Kara
  2014-06-25 14:45     ` Theodore Ts'o
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox @ 2014-06-25 14:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, linux-fsdevel

On Wed, Jun 25, 2014 at 03:12:24PM +0200, Jan Kara wrote:
> > $ sudo cat /proc/4795/stack 
> > [<ffffffff8120bee9>] writeback_inodes_sb_nr+0xa9/0xe0
> > [<ffffffff8120bfae>] try_to_writeback_inodes_sb_nr+0x5e/0x80
> > [<ffffffff8120bff5>] try_to_writeback_inodes_sb+0x25/0x30
> > [<ffffffffa01bae2a>] ext4_nonda_switch+0x8a/0x90 [ext4]
> > [<ffffffffa01c49a5>] ext4_page_mkwrite+0x265/0x440 [ext4]
>   Hum, apparently you are running out of space on the test partition. And
> that is known to make ext4 extraordinarily slow...

Okay ... but why is it so much worse in 3.15 than 3.14?

And does ext4 think of "running out of space" as a percentage
free, or an absolute number of blocks remaining?  From the code in
ext4_nonda_switch(), it seems to be the former, although maybe excessive
fragmentation has caused ext4 to think it's running out of space?

> > My setup is a 1GB ram disk:
> > 
> > modprobe brd rd_size=1048576 rd_nr=2
> > 
> > local.config:
> > 
> > TEST_DEV=/dev/ram0
> > TEST_DIR=/mnt/ram0
> > SCRATCH_DEV=/dev/ram1
> > SCRATCH_MNT=/mnt/ram1
> > 
> > 
> > Hardware is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4GB RAM,
> > in case it matters.  But I think what matters is that I'm running it on
> > a "tiny" 1GB filesystem, since this code is only invoked whenever the
> > number of dirty clusters is large relative to the number of free clusters.
> > 
> > df shows:
> > /dev/ram1         999320     1284    929224   1% /mnt/ram1
> > /dev/ram0         999320   646088    284420  70% /mnt/ram0
> > 
> > So it's not *unreasonably* full.
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR


* Re: Livelock when running xfstests generic/127 on ext4 with 3.15
  2014-06-25 14:17   ` Matthew Wilcox
@ 2014-06-25 14:44     ` Jan Kara
  2014-06-25 14:45     ` Theodore Ts'o
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kara @ 2014-06-25 14:44 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-ext4, linux-fsdevel

On Wed 25-06-14 10:17:56, Matthew Wilcox wrote:
> On Wed, Jun 25, 2014 at 03:12:24PM +0200, Jan Kara wrote:
> > > $ sudo cat /proc/4795/stack 
> > > [<ffffffff8120bee9>] writeback_inodes_sb_nr+0xa9/0xe0
> > > [<ffffffff8120bfae>] try_to_writeback_inodes_sb_nr+0x5e/0x80
> > > [<ffffffff8120bff5>] try_to_writeback_inodes_sb+0x25/0x30
> > > [<ffffffffa01bae2a>] ext4_nonda_switch+0x8a/0x90 [ext4]
> > > [<ffffffffa01c49a5>] ext4_page_mkwrite+0x265/0x440 [ext4]
> >   Hum, apparently you are running out of space on the test partition. And
> > that is known to make ext4 extraordinarily slow...
> 
> Okay ... but why is it so much worse in 3.15 than 3.14?
  Is it really a difference between kernels, or did the partition just get
fuller? If it really is a kernel difference, I don't have a good
explanation... Bisecting it down would be useful...

> And does ext4 think of "running out of space" as a percentage
> free, or an absolute number of blocks remaining?  From the code in
> ext4_nonda_switch(), it seems to be the former, although maybe excessive
> fragmentation has caused ext4 to think it's running out of space?
  We start forcing writeback (and waiting for it) when the amount of free
space is less than twice the amount of delayed-allocated blocks which are
not yet written out.
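
Roughly, the first half of ext4_nonda_switch() looks like this (a
paraphrased sketch, not the verbatim 3.15 source):

	/* sketch of the writeback trigger in ext4_nonda_switch() */
	s64 free_clusters = percpu_counter_read_positive(
				&EXT4_SB(sb)->s_freeclusters_counter);
	s64 dirty_clusters = percpu_counter_read_positive(
				&EXT4_SB(sb)->s_dirtyclusters_counter);

	/* start pushing delalloc once half of the free clusters are dirty */
	if (dirty_clusters && free_clusters < 2 * dirty_clusters)
		try_to_writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);

With the df numbers you posted (~284420 1k-blocks free on /dev/ram0),
that trigger fires once roughly 140MB of delalloc is outstanding.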

								Honza

> > > My setup is a 1GB ram disk:
> > > 
> > > modprobe brd rd_size=1048576 rd_nr=2
> > > 
> > > local.config:
> > > 
> > > TEST_DEV=/dev/ram0
> > > TEST_DIR=/mnt/ram0
> > > SCRATCH_DEV=/dev/ram1
> > > SCRATCH_MNT=/mnt/ram1
> > > 
> > > 
> > > Hardware is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4GB RAM,
> > > in case it matters.  But I think what matters is that I'm running it on
> > > a "tiny" 1GB filesystem, since this code is only invoked whenever the
> > > number of dirty clusters is large relative to the number of free clusters.
> > > 
> > > df shows:
> > > /dev/ram1         999320     1284    929224   1% /mnt/ram1
> > > /dev/ram0         999320   646088    284420  70% /mnt/ram0
> > > 
> > > So it's not *unreasonably* full.
> > -- 
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Livelock when running xfstests generic/127 on ext4 with 3.15
  2014-06-25 14:17   ` Matthew Wilcox
  2014-06-25 14:44     ` Jan Kara
@ 2014-06-25 14:45     ` Theodore Ts'o
  1 sibling, 0 replies; 5+ messages in thread
From: Theodore Ts'o @ 2014-06-25 14:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-ext4, linux-fsdevel

On Wed, Jun 25, 2014 at 10:17:56AM -0400, Matthew Wilcox wrote:
> 
> Okay ... but why is it so much worse in 3.15 than 3.14?
> 
> And does ext4 think of "running out of space" as a percentage
> free, or an absolute number of blocks remaining?  From the code in
> ext4_nonda_switch(), it seems to be the former, although maybe excessive
> fragmentation has caused ext4 to think it's running out of space?

When the blocks that were allocated using delayed allocation exceed
50% of the free space, we initiate writeback.  When delalloc blocks
exceed 66% of the free space, we fall back to nodelalloc, which among
other things means blocks are allocated for each write system call,
and we also have to add and remove the inode from the orphan inode
list so that if we crash in the middle of the write system call, we
don't end up exposing stale data.
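
The second threshold is roughly this, again sketching from memory
rather than quoting the exact source:

	/* fall back to the non-delalloc write path when free space drops
	 * below ~150% of the outstanding delalloc, or when we get close
	 * to the low-free-space watermark */
	if (2 * free_clusters < 3 * dirty_clusters ||
	    free_clusters < (dirty_clusters + EXT4_FREECLUSTERS_WATERMARK))
		return 1;
	return 0;

(2 * free < 3 * dirty is where the 66% figure above comes from.)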

We did have a change to the orphan inode code to improve scalability,
so that could have been a possible cause; but that happened after
3.15, so that can't be it.  The other possibility is that there's
simply a change in the writeback code that alters how aggressively
we start writeback when we exceed the 50% threshold, so that we end
up switching into nonda mode more often.

Any chance you can run generic/127 under perf so we can see where
we're spending all of our CPU time?  The other thing I can imagine
doing is to add a tracepoint whenever we drop into nonda mode, so we
can see if that's happening more often under 3.15 versus 3.14.
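
Even something as crude as a trace_printk() right before
ext4_nonda_switch() returns 1 would do as a first pass (untested,
just to illustrate what I mean):

	/* hypothetical debug aid: log each switch to the non-delalloc
	 * path together with the counters that drove the decision */
	trace_printk("ext4 nonda switch: free=%lld dirty=%lld\n",
		     (long long)free_clusters, (long long)dirty_clusters);
	return 1;

Then watch /sys/kernel/debug/tracing/trace while the test runs and
compare the two kernels.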

							- Ted

