* Please hammer my for-linus branch
From: Chris Mason @ 2012-07-01  1:22 UTC
  To: linux-btrfs

Hi everyone,

I've got a nice set of fixes from Josef, Jan, Ilya and others in my
for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

Some of the changes are fixes for the tree logging code, so I ran some
extra crash runs against them Friday night.

I ended up with a new crash in the tree log directory deletion replay
code, so I didn't send out the pull request to Linus.

It isn't clear yet if the new crash is because I was testing differently
or if it is a regression.  I'm nailing it down this weekend, but please
give my for-linus a shot.

-chris



* Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
From: David Sterba @ 2012-07-02 14:10 UTC
  To: Chris Mason, linux-btrfs

Hi,

I'm seeing a machine lockup in xfstests/224, logs attached. Friday's
xfstests round with 3.5-rc4 was ok, all tests passed.

The 'dd' processes are in D-state with these stack traces:

 5597 pts/0    D+     0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.8 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001cd64>] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs]
[<ffffffffa001d16b>] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
[<ffffffffa004132b>] __btrfs_buffered_write+0x17b/0x380 [btrfs]
[<ffffffffa0041783>] btrfs_file_aio_write+0x253/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

and (not sure if there are more)

 5666 pts/0    D+     0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.6 bs=4k conv=notrunc
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001c56a>] btrfs_block_rsv_add+0x3a/0x60 [btrfs]
[<ffffffffa003155e>] start_transaction+0x26e/0x330 [btrfs]
[<ffffffffa0031903>] btrfs_start_transaction+0x13/0x20 [btrfs]
[<ffffffffa003cae0>] btrfs_dirty_inode+0xb0/0xe0 [btrfs]
[<ffffffffa003cdad>] btrfs_update_time+0xcd/0x180 [btrfs]
[<ffffffffa00416f8>] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b

All btrfs kernel threads are idle.
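
As a rough illustration, the two traces above can be compared
programmatically to see where the writers block. This is only a sketch: the
frame text is copied from the traces, while the regular expression and the
helper name symbols() are assumptions made for the example.

```python
import re

# Frames copied from the two D-state traces above (innermost frame first).
TRACE_A = """\
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001cd64>] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs]
[<ffffffffa001d16b>] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
[<ffffffffa004132b>] __btrfs_buffered_write+0x17b/0x380 [btrfs]
[<ffffffffa0041783>] btrfs_file_aio_write+0x253/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
"""

TRACE_B = """\
[<ffffffffa001bb3e>] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[<ffffffffa001c56a>] btrfs_block_rsv_add+0x3a/0x60 [btrfs]
[<ffffffffa003155e>] start_transaction+0x26e/0x330 [btrfs]
[<ffffffffa0031903>] btrfs_start_transaction+0x13/0x20 [btrfs]
[<ffffffffa003cae0>] btrfs_dirty_inode+0xb0/0xe0 [btrfs]
[<ffffffffa003cdad>] btrfs_update_time+0xcd/0x180 [btrfs]
[<ffffffffa00416f8>] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs]
[<ffffffff81144892>] do_sync_write+0xe2/0x120
[<ffffffff8114519e>] vfs_write+0xce/0x190
[<ffffffff811454e4>] sys_write+0x54/0xa0
[<ffffffff818b4fa9>] system_call_fastpath+0x16/0x1b
"""

# Matches "[<address>] symbol+0xoffset/0xsize"; the trailing marker line
# "[<ffffffffffffffff>] 0xffffffffffffffff" has no symbol and is skipped.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def symbols(trace):
    """Return the function names of a trace, innermost frame first."""
    return [m.group(1) for m in FRAME_RE.finditer(trace)]


a, b = symbols(TRACE_A), symbols(TRACE_B)
in_b = set(b)
common = [s for s in a if s in in_b]  # shared frames, in trace A's order

print("innermost frames:", a[0], "/", b[0])
print("frames in both traces:", ", ".join(common))
```

Both writers enter through btrfs_file_aio_write by different paths (delalloc
reservation vs. timestamp update) and end up in the same function,
reserve_metadata_bytes.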

Mount options: -o space_cache
Mkfs: fresh, default options

# btrfs fi df /mnt/a2
System: total=4.00MiB, used=4.00KiB
Data+Metadata: total=1020.00MiB, used=987.32MiB
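
To put the numbers above in perspective, here is a small sketch that turns
the 'btrfs fi df' lines into a fill ratio. The output text is copied from
this report; the regular expression and the usage() helper are assumptions
for illustration and only handle the line format shown here.

```python
import re

# Output copied from the report above.
DF_OUTPUT = """\
System: total=4.00MiB, used=4.00KiB
Data+Metadata: total=1020.00MiB, used=987.32MiB
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<name>[^:]+): total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGT]i)?B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGT]i)?B)$",
    re.MULTILINE,
)


def usage(text):
    """Map each block group type to (total_bytes, used_bytes)."""
    result = {}
    for m in LINE_RE.finditer(text):
        total = round(float(m["total"]) * UNITS[m["tunit"]])
        used = round(float(m["used"]) * UNITS[m["uunit"]])
        result[m["name"]] = (total, used)
    return result


for name, (total, used) in usage(DF_OUTPUT).items():
    free_mib = (total - used) / UNITS["MiB"]
    print(f"{name}: {100 * used / total:.1f}% used, {free_mib:.2f} MiB free")
```

For the Data+Metadata line this works out to roughly 97% used, with about
33 MiB left.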

[meanwhile]

While grabbing lockdep stats the test respawned

224 236s ...    [14:57:42] [15:46:56] 2954s

but there was no disk activity.  I wonder whether touching /proc/lockdep or
/proc/lock_stat is affecting this.
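
The size of the slowdown can be read off the check line above. The sketch
below assumes that the first figure (236s) is the previously recorded
runtime of the test and that the last figure (2954s) is the runtime of this
run; the regular expression is written only for this one line.

```python
import re
from datetime import datetime

# Line copied from the xfstests output above.
CHECK_LINE = "224 236s ...    [14:57:42] [15:46:56] 2954s"

m = re.match(
    r"^(?P<test>\d+)\s+(?P<prev>\d+)s\s+\.\.\.\s+"
    r"\[(?P<start>\d\d:\d\d:\d\d)\]\s+\[(?P<end>\d\d:\d\d:\d\d)\]\s+"
    r"(?P<now>\d+)s$",
    CHECK_LINE,
)

start = datetime.strptime(m["start"], "%H:%M:%S")
end = datetime.strptime(m["end"], "%H:%M:%S")
elapsed = int((end - start).total_seconds())  # wall-clock time between stamps

previous = int(m["prev"])   # assumed: runtime recorded by an earlier run
reported = int(m["now"])    # assumed: runtime of this run
slowdown = reported / previous

print(f"test {m['test']}: {elapsed}s between timestamps, {reported}s reported")
print(f"slowdown vs. previous run: {slowdown:.1f}x")
```

The two timestamps are 49 minutes 14 seconds apart, which matches the
reported 2954s, i.e. about 12.5 times the earlier runtime.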

Finishing this report anyway, and will redo the tests again.

Looking again at the logs, the first process snapshot (only D-state
processes) is much longer than the snapshot containing all processes.
Unfortunately I don't have timestamps recorded, but this suggests that the
test is progressing very slowly, so slowly that I considered it stalled when
looking at the io graphs.


david

[-- Attachment #2: for-linus-hung-224-all.txt.gz --]
[-- Type: application/octet-stream, Size: 6081 bytes --]

[-- Attachment #3: for-linus-hung-224-D.txt.gz --]
[-- Type: application/octet-stream, Size: 5625 bytes --]


* Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
From: David Sterba @ 2012-07-02 14:34 UTC
  To: Chris Mason, linux-btrfs

On Mon, Jul 02, 2012 at 04:10:52PM +0200, David Sterba wrote:
> Finishing this report anyway, and will redo the tests again.
> 
> Looking again at the logs, the first process snapshot (only D-state
> processes) is much longer than the snapshot containing all processes.
> Unfortunately I don't have timestamps recorded, but this suggests that the
> test is progressing very slowly, so slowly that I considered it stalled when
> looking at the io graphs.

Fresh build, reboot, and single xfstests/224 run:

During the first ~20 seconds there's high write activity, i.e. file setup,
then it drops to "a few tens to hundreds of KB every 4 seconds". The CPU is
idle; sample output from dstat:

----total-cpu-usage---- --dsk/sda9- ---system--
usr sys idl wai hiq siq| read  writ| int   csw
  1   1  99   0   0   0|   0     0 | 923  1856
  0   1  98   0   1   0|   0  8192B| 904  2796
  0   1  99   0   0   0|   0     0 | 945  1914
  1   1  98   0   0   0|   0     0 | 899  1849
  1   1  98   0   0   1|   0     0 | 906  1848
  0   3  97   0   0   0|   0    20k| 901  3740
  0   0 100   0   0   0|   0     0 | 905  1851
  1   1  98   0   0   1|   0     0 | 946  1917
  0   1  99   0   0   0|   0     0 | 904  1858
  0   1  99   0   0   0|   0  8192B| 907  2805
  1   1  98   0   0   1|   0     0 | 891  1836
  0   1  99   0   0   0|   0     0 | 900  1847
  0   1  99   0   0   0|   0     0 | 940  1905
  1   4  95   0   0   0|   0    32k| 904  5153
  1   2  97   0   0   0|   0    36k| 913  4240
  0   1  99   0   0   0|   0     0 | 907  1849
  0   1  99   0   0   0|   0     0 | 908  1852
  1   1  98   0   0   1|   0     0 | 933  1901
  1   2  98   0   0   0|   0  8192B| 916  2808
  0   1  99   0   0   0|   0     0 | 917  1843
  0   1  99   0   0   1|   0     0 | 908  1844
  1   1  99   0   0   0|   0     0 | 905  1860
  0   5  95   0   0   0|   0    36k| 943  7565
  1   1  99   0   0   0|   0     0 | 911  1861
  0   1  99   0   0   0|   0     0 | 910  1852
  1   1  98   0   0   0|   0     0 | 944  1878
  1   2  97   0   0   1|   0    16k| 898  3753
  0   9  87   4   0   1|   0  1020k|1035    11k
  0  19  74   7   0   1|   0  2092k|3052    24k
  0   1  99   0   0   0|   0     0 | 909  1851
  1   1  98   0   0   1|   0     0 | 915  1856
  1   1  99   0   0   0|   0     0 | 896  1847
  0   2  98   0   0   0|   0  8192B| 931  2847
  0   1  99   0   0   0|   0     0 | 899  1850
  1   1  98   0   0   1|   0     0 | 896  1861
  0   1  99   0   0   0|   0     0 | 911  1855
  1   5  94   0   0   0|   0    28k| 891  6521
  0   9  87   3   0   1|   0  1100k| 963    11k
  0   1  99   0   0   0|   0     0 | 905  1857
  1   1  99   0   0   0|   0     0 | 895  1851
  1   1  98   0   0   0|   0     0 | 911  1852
  0   7  88   4   0   1|   0   700k| 911  8533
  0   1  99   0   0   0|   0     0 | 940  1905
  1   1  99   0   0   0|   0     0 | 912  1851
  1   1  99   0   0   0|   0     0 | 895  1851
  0  10  89   0   0   1|   0   100k| 912    13k

and repeats more or less the same.
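
For a rough figure on the write rate, the 'writ' column can be summed. The
sketch below uses only the first ten rows of the sample above and assumes
dstat's default one-second interval and that the 'k' suffix means 1024
bytes; parse_size() is a helper written for this example.

```python
# First ten rows of the dstat sample above.
DSTAT_ROWS = """\
  1   1  99   0   0   0|   0     0 | 923  1856
  0   1  98   0   1   0|   0  8192B| 904  2796
  0   1  99   0   0   0|   0     0 | 945  1914
  1   1  98   0   0   0|   0     0 | 899  1849
  1   1  98   0   0   1|   0     0 | 906  1848
  0   3  97   0   0   0|   0    20k| 901  3740
  0   0 100   0   0   0|   0     0 | 905  1851
  1   1  98   0   0   1|   0     0 | 946  1917
  0   1  99   0   0   0|   0     0 | 904  1858
  0   1  99   0   0   0|   0  8192B| 907  2805
"""

SUFFIX = {"B": 1, "k": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(token):
    """Convert a dstat size token such as '0', '8192B' or '20k' to bytes."""
    if token[-1] in SUFFIX:
        return int(float(token[:-1]) * SUFFIX[token[-1]])
    return int(token)


writes = []
for row in DSTAT_ROWS.splitlines():
    # Columns are grouped as cpu | disk (read, writ) | system (int, csw).
    read_tok, write_tok = row.split("|")[1].split()
    writes.append(parse_size(write_tok))

total = sum(writes)
print(f"{total} bytes written in {len(writes)} samples")
print(f"mean: {total / len(writes) / 1024:.1f} KiB per sample")
```

Over these ten samples that is 36 KiB in total, i.e. a few KiB per sample on
average, with most samples showing no writes at all.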

Bisection in progress.
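
For readers unfamiliar with it, bisection is a binary search over the commit
history for the first commit that shows the problem. The sketch below shows
only the idea; the commit labels c0..c15 and the is_bad() check are
hypothetical stand-ins for real commits and a real xfstests/224 run.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. Returns (commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1   # lo: last known good, hi: first known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # regression is after mid
    return commits[hi], steps


# Hypothetical history of 16 commits where the regression enters at c11.
commits = [f"c{i}" for i in range(16)]


def is_bad(commit):
    # Stand-in for "build this commit, run the test, check for the slowdown".
    return int(commit[1:]) >= 11


culprit, steps = first_bad(commits, is_bad)
print(f"first bad commit: {culprit} after {steps} test runs")
```

Each step halves the remaining range, so the number of test runs grows with
the logarithm of the number of commits between the good and bad points.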


david


* Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
From: David Sterba @ 2012-07-02 16:10 UTC
  To: Chris Mason, linux-btrfs, jbacik

On Mon, Jul 02, 2012 at 04:34:53PM +0200, David Sterba wrote:
> Bisection in progress.

commit cae76522b19735c576803bec273f49062aa418ab
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Thu Jun 21 14:05:49 2012 -0400

    Btrfs: flush delayed inodes if we're short on space

    Those crazy gentoo guys have been complaining about ENOSPC errors on their
    portage volumes.  This is because doing things like untar tends to create
    lots of new files which will soak up all the reservation space in the
    delayed inodes.  Usually this gets papered over by the fact that we will try
    and commit the transaction, however if this happens in the wrong spot or we
    choose not to commit the transaction you will be screwed.  So add the
    ability to explicitly flush delayed inodes to free up space.  Please test
    this out guys to make sure it works since as usual I cannot reproduce.
    Thanks,


* Re: Please hammer my for-linus branch
From: Chris Mason @ 2012-07-02 20:17 UTC
  To: linux-btrfs

On Sat, Jun 30, 2012 at 09:22:59PM -0400, Chris Mason wrote:
> Hi everyone,
> 
> I've got a nice set of fixes from Josef, Jan, Ilya and others in my
> for-linus branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus
> 
> Some of the changes are fixes for the tree logging code, so I ran some
> extra crash runs against them Friday night.
> 
> I ended up with a new crash in the tree log directory deletion replay
> code, so I didn't send out the pull request to Linus.
> 
> It isn't clear yet if the new crash is because I was testing differently
> or if it is a regression.  I'm nailing it down this weekend, but please
> give my for-linus a shot.

Ok, I've just rebased for-linus.  I've dropped Josef's enospc patch,
which should fix the regression Dave hit.  I've also added a fix for my
log replay crash, which was definitely an old bug.  The delayed
directory operations were queuing up the changes made during replay, and
it was confusing the replay code.

Looks like there's a fix pending from Liu Bo, but I'll let Daniel test
that before pulling it in as well.

Thanks everyone.

-chris




* Re: Please hammer my for-linus branch
From: David Sterba @ 2012-07-03 14:39 UTC
  To: Chris Mason, linux-btrfs

On Mon, Jul 02, 2012 at 04:17:37PM -0400, Chris Mason wrote:
> Ok, I've just rebased for-linus.  I've dropped Josef's enospc patch,
> which should fix the regression Dave hit.

JFYI, fixed. No other problems observed so far.

david
