All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 43292] New: jdb2 lockup with ext3 and nfs
@ 2012-05-24 15:54 bugzilla-daemon
  2012-05-26 20:31 ` [Bug 43292] " bugzilla-daemon
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-24 15:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292

           Summary: jdb2 lockup with ext3 and nfs
           Product: File System
           Version: 2.5
    Kernel Version: 3.4
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext3
        AssignedTo: fs_ext3@kernel-bugs.osdl.org
        ReportedBy: flo@xssn.at
        Regression: Yes


When using nfs and writing some files nfsd locks up in D state and I get the
attached backtraces in dmesg. Once nfsd killed the request that triggered the
lockup, the problem will not happen again until the system has been rebooted.
(Tested over a period of 20 days)

I'm not yet sure if it happens on the first write request or only on some
special ones, but I don't really want to reboot my file server and let it hang
that often. However, if you can tell me what to look for or what debugging
features I should use, I'll be happy to try them.

I also didn't yet test if this can be reproduced when writing files directly
(without nfs), but I think it would lock up during boot in that case since I'm
running a mysqld slave on the system.

This started happening around linux 3.2 or 3.3.

I've attached the whole dmesg output (including bootup, in case this is related
to the hardware).

Arch Linux x86_64
Linux 3.4 (config [1])
nfs-utils 1.2.5


[1]:
https://projects.archlinux.org/svntogit/packages.git/plain/trunk/config.x86_64?h=packages/linux&id=f8c396e6ca1e29c3da83431f64074f8c127e1af3

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
@ 2012-05-26 20:31 ` bugzilla-daemon
  2012-05-29 14:04 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-26 20:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #1 from Florian Pritz <flo@xssn.at>  2012-05-26 20:31:25 ---
Created an attachment (id=73412)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73412)
dmesg with backtrace

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
  2012-05-26 20:31 ` [Bug 43292] " bugzilla-daemon
@ 2012-05-29 14:04 ` bugzilla-daemon
  2012-05-29 21:44 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-29 14:04 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292


Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jack@suse.cz




--- Comment #2 from Jan Kara <jack@suse.cz>  2012-05-29 14:04:10 ---
If you can reproduce this, can you make the deadlock happen, then run 'echo w
>/proc/sysrq-trigger', and attach here output from dmesg? Thanks!

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
  2012-05-26 20:31 ` [Bug 43292] " bugzilla-daemon
  2012-05-29 14:04 ` bugzilla-daemon
@ 2012-05-29 21:44 ` bugzilla-daemon
  2012-05-30  9:18 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-29 21:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #3 from Florian Pritz <flo@xssn.at>  2012-05-29 21:44:13 ---
Created an attachment (id=73459)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73459)
output of echo w > /proc/sysrq-trigger

This is with linux 3.0.32 instead of 3.4 because 3.4 had lots of bugs that
caused me to switch all systems away from it. (These might get their own
reports some day unless someone else takes care of them first)

If you think it really makes a difference I can switch back to 3.4 for testing.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (2 preceding siblings ...)
  2012-05-29 21:44 ` bugzilla-daemon
@ 2012-05-30  9:18 ` bugzilla-daemon
  2012-05-30  9:24 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30  9:18 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #4 from Jan Kara <jack@suse.cz>  2012-05-30 09:18:30 ---
Hmm, thanks for the data. So jbd2 thread (which is blocking the rest) is
waiting for the following nfsd thread to finish a transaction:
[  128.487611] nfsd            D 0000000000000000     0   750      2 0x00000000
[  128.487613]  ffff880076833050 0000000000000046 ffff880076832f50
ffffffff00000000
[  128.487615]  ffff8800790ea040 ffff880076833fd8 ffff880076833fd8
ffff880076833fd8
[  128.487618]  ffffffff8189b020 ffff8800790ea040 0000000179880938
ffff880079880848
[  128.487620] Call Trace:
[  128.487624]  [<ffffffff81204569>] ? queue_unplugged+0x59/0x110
[  128.487627]  [<ffffffff81187fa0>] ? __wait_on_buffer+0x30/0x30
[  128.487629]  [<ffffffff81421d8f>] schedule+0x3f/0x60
[  128.487631]  [<ffffffff81421e3f>] io_schedule+0x8f/0xd0
[  128.487633]  [<ffffffff81187fae>] sleep_on_buffer+0xe/0x20
[  128.487634]  [<ffffffff814225a0>] __wait_on_bit+0x60/0x90
[  128.487637]  [<ffffffff81187fa0>] ? __wait_on_buffer+0x30/0x30
[  128.487639]  [<ffffffff8142264c>] out_of_line_wait_on_bit+0x7c/0x90
[  128.487641]  [<ffffffff810819f0>] ? autoremove_wake_function+0x40/0x40
[  128.487643]  [<ffffffff81187f9e>] __wait_on_buffer+0x2e/0x30
[  128.487648]  [<ffffffffa01744d3>] ext4_mb_init_cache+0x203/0x9c0 [ext4]
[  128.487651]  [<ffffffff81106e40>] ? __lru_cache_add+0x90/0xb0
[  128.487656]  [<ffffffffa017665e>] ext4_mb_init_group+0x10e/0x210 [ext4]
[  128.487660]  [<ffffffffa0176876>] ext4_mb_good_group+0x116/0x130 [ext4]
[  128.487665]  [<ffffffffa0178a3b>] ext4_mb_regular_allocator+0x19b/0x420
[ext4]
[  128.487669]  [<ffffffffa017620d>] ? ext4_mb_normalize_request+0x20d/0x500
[ext4]
[  128.487674]  [<ffffffffa017a40e>] ext4_mb_new_blocks+0x42e/0x5d0 [ext4]
[  128.487678]  [<ffffffffa0148558>] ext4_alloc_branch+0x528/0x670 [ext4]
[  128.487681]  [<ffffffff81118129>] ? zone_statistics+0x99/0xc0
[  128.487686]  [<ffffffffa014ba78>] ext4_ind_map_blocks+0x328/0x7c0 [ext4]
[  128.487690]  [<ffffffffa014bfea>] ext4_map_blocks+0xda/0x1f0 [ext4]
[  128.487694]  [<ffffffffa014c1a6>] _ext4_get_block+0xa6/0x160 [ext4]
[  128.487698]  [<ffffffffa014c2c6>] ext4_get_block+0x16/0x20 [ext4]
[  128.487701]  [<ffffffff8118a427>] __block_write_begin+0x1c7/0x590
[  128.487705]  [<ffffffffa014c2b0>] ? noalloc_get_block_write+0x30/0x30 [ext4]
[  128.487709]  [<ffffffffa014d9c2>] ext4_write_begin+0x162/0x390 [ext4]
[  128.487711]  [<ffffffff810fa739>] generic_file_buffered_write+0x109/0x260
[  128.487714]  [<ffffffff810fb8d5>] __generic_file_aio_write+0x245/0x460
[  128.487716]  [<ffffffff8142350d>] ? __mutex_lock_slowpath+0x25d/0x350
[  128.487718]  [<ffffffff810fbb5e>] generic_file_aio_write+0x6e/0xe0
[  128.487722]  [<ffffffffa0143dff>] ext4_file_write+0xaf/0x260 [ext4]
[  128.487724]  [<ffffffff811745e9>] ? iget_locked+0x89/0x180
[  128.487727]  [<ffffffffa06293b0>] ? _fh_update.isra.8.part.9+0x60/0x60
[nfsd]
[  128.487729]  [<ffffffff81174902>] ? iput+0x42/0x1c0
[  128.487733]  [<ffffffffa0143d50>] ? ext4_llseek+0x110/0x110 [ext4]
[  128.487735]  [<ffffffff8115a9d2>] do_sync_readv_writev+0xd2/0x110
[  128.487738]  [<ffffffffa06293b0>] ? _fh_update.isra.8.part.9+0x60/0x60
[nfsd]
[  128.487741]  [<ffffffff811449b9>] ? __kmalloc+0x39/0x1a0
[  128.487743]  [<ffffffff811e33ec>] ? security_file_permission+0x2c/0xb0
[  128.487745]  [<ffffffff8115a0f1>] ? rw_verify_area+0x61/0xf0
[  128.487747]  [<ffffffff8115aca4>] do_readv_writev+0xd4/0x1e0
[  128.487751]  [<ffffffffa0143a5f>] ? ext4_file_open+0x6f/0x1e0 [ext4]
[  128.487753]  [<ffffffff8115ade5>] vfs_writev+0x35/0x60
[  128.487757]  [<ffffffffa062a83b>] nfsd_vfs_write.isra.9+0xeb/0x3e0 [nfsd]
[  128.487759]  [<ffffffff811588df>] ? dentry_open+0x4f/0x90
[  128.487762]  [<ffffffffa062b690>] ? nfsd_open+0xa0/0x1a0 [nfsd]
[  128.487766]  [<ffffffffa062ce08>] nfsd_write+0xf8/0x110 [nfsd]
[  128.487770]  [<ffffffffa063476b>] nfsd3_proc_write+0xbb/0x150 [nfsd]
[  128.487773]  [<ffffffffa0626a4e>] nfsd_dispatch+0xfe/0x240 [nfsd]
[  128.487777]  [<ffffffffa054597b>] svc_process+0x4bb/0x840 [sunrpc]
[  128.487779]  [<ffffffff8107143b>] ? recalc_sigpending+0x1b/0x50
[  128.487782]  [<ffffffffa06260c2>] nfsd+0xc2/0x160 [nfsd]
[  128.487785]  [<ffffffffa0626000>] ? 0xffffffffa0625fff
[  128.487786]  [<ffffffff8108102c>] kthread+0x8c/0xa0
[  128.487788]  [<ffffffff814266a4>] kernel_thread_helper+0x4/0x10
[  128.487791]  [<ffffffff81080fa0>] ? kthread_worker_fn+0x180/0x180
[  128.487792]  [<ffffffff814266a0>] ? gs_change+0x13/0x13

That thread waits for block bitmap to be read from disk. The question is why
would reading take so long (or never actually finish).

For start can you maybe switch IO scheduler to deadline. You can do that by:
echo "deadline" >/sys/block/<dev>/queue/scheduler
for all disks, reproduce the problem and post here output from echo w >
/proc/sysrq-trigger again? Thanks.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (3 preceding siblings ...)
  2012-05-30  9:18 ` bugzilla-daemon
@ 2012-05-30  9:24 ` bugzilla-daemon
  2012-05-30  9:41 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30  9:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #5 from Florian Pritz <flo@xssn.at>  2012-05-30 09:24:44 ---
I'm already using deadline, should I try cfq/noop or is that not worth trying?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (4 preceding siblings ...)
  2012-05-30  9:24 ` bugzilla-daemon
@ 2012-05-30  9:41 ` bugzilla-daemon
  2012-05-30  9:47 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30  9:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #6 from Jan Kara <jack@suse.cz>  2012-05-30 09:41:29 ---
OK, so try with NOOP to make things as simple as possible. Thanks!

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (5 preceding siblings ...)
  2012-05-30  9:41 ` bugzilla-daemon
@ 2012-05-30  9:47 ` bugzilla-daemon
  2012-05-30 13:33 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30  9:47 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #7 from Florian Pritz <flo@xssn.at>  2012-05-30 09:47:27 ---
Created an attachment (id=73465)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73465)
output of echo w > /proc/sysrq-trigger with noop

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (6 preceding siblings ...)
  2012-05-30  9:47 ` bugzilla-daemon
@ 2012-05-30 13:33 ` bugzilla-daemon
  2012-05-30 21:20 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30 13:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #8 from Jan Kara <jack@suse.cz>  2012-05-30 13:33:29 ---
Hum, it looks exactly the same like the previous attempt. I'm starting to
suspect some problem with plugging or bio list handling. Since I can see you
can trigger the lockup pretty quickly, can you gather blktrace data for me? The
filesystem is on some device mapper device so you'll need to run something
like:
blktrace -d /dev/dm-2 -d /dev/sd? ...
where you include all scsi disks compounding the dm device. And then just
tar.gz the created files after the lockup happens and attach them here.
Hopefully we'll be able to see where the read got stuck. Thanks!

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (7 preceding siblings ...)
  2012-05-30 13:33 ` bugzilla-daemon
@ 2012-05-30 21:20 ` bugzilla-daemon
  2012-05-31 20:16 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-30 21:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #9 from Florian Pritz <flo@xssn.at>  2012-05-30 21:20:32 ---
Created an attachment (id=73471)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73471)
blktrace files

I'm running 2 raid 1 arrays with 2 and 3 disks, both in a single LVM volume
group which contains one big LV (dm-2). 

I've started blktrace before the lockup happened and killed it a few seconds
after with ^C. I can also let it run until nfsd gives up, if you want that.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (8 preceding siblings ...)
  2012-05-30 21:20 ` bugzilla-daemon
@ 2012-05-31 20:16 ` bugzilla-daemon
  2012-05-31 21:40 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-31 20:16 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #10 from Jan Kara <jack@suse.cz>  2012-05-31 20:16:27 ---
I have to say I'm puzzled. Everything looks normal, the IO is submitted and
completed, no sign of any hang. Just the last request - read - has been only
submitted to disk and not yet completed but that can be just a coincidence. To
check what has been read, can you attach here output of dumpe2fs /dev/dm-2?

When the lockup happens, those nfsd threads are sitting there doing nothing for
a long time, am I right?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (9 preceding siblings ...)
  2012-05-31 20:16 ` bugzilla-daemon
@ 2012-05-31 21:40 ` bugzilla-daemon
  2012-05-31 22:44 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-31 21:40 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #11 from Florian Pritz <flo@xssn.at>  2012-05-31 21:40:38 ---
Created an attachment (id=73480)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73480)
output of dumpe2fs /dev/dm-2

(In reply to comment #10)
> I have to say I'm puzzled. Everything looks normal, the IO is submitted and
> completed, no sign of any hang. Just the last request - read - has been only
> submitted to disk and not yet completed but that can be just a coincidence. To
> check what has been read, can you attach here output of dumpe2fs /dev/dm-2?

I've run it without the hanging nfsd, is that correct?

> When the lockup happens, those nfsd threads are sitting there doing nothing for
> a long time, am I right?

Yeah. All nfs activity is frozen, but local system activity on the same file
system works just fine.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (10 preceding siblings ...)
  2012-05-31 21:40 ` bugzilla-daemon
@ 2012-05-31 22:44 ` bugzilla-daemon
  2012-05-31 22:49 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-31 22:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #12 from Jan Kara <jack@suse.cz>  2012-05-31 22:44:19 ---
Oh, that explains it. I wanted blktrace around the moment when nfsd hangs. So
just start blktrace and let it run until nfsd hangs... Thanks.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (11 preceding siblings ...)
  2012-05-31 22:44 ` bugzilla-daemon
@ 2012-05-31 22:49 ` bugzilla-daemon
  2012-06-01 15:24 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-05-31 22:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #13 from Florian Pritz <flo@xssn.at>  2012-05-31 22:49:02 ---
(In reply to comment #12)
> Oh, that explains it. I wanted blktrace around the moment when nfsd hangs. So
> just start blktrace and let it run until nfsd hangs... Thanks.

I meant, I ran dumpe2fs without nfsd hanging. As for blktrace, I ran that
before nfsd hung and waited a few seconds after it hung before I killed it.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (12 preceding siblings ...)
  2012-05-31 22:49 ` bugzilla-daemon
@ 2012-06-01 15:24 ` bugzilla-daemon
  2012-06-01 15:40 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-01 15:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #14 from Jan Kara <jack@suse.cz>  2012-06-01 15:24:37 ---
OK, something strange is going on. The block trace ends like:
253,2    0     3716    19.264276270   751  Q   R 485490688 + 8 [nfsd]
  9,0    0     3699    19.264279134   751  A   R 527540608 + 8 <- (253,2)
485490688
  9,0    0     3700    19.264280391   751  Q   R 527540608 + 8 [nfsd]
  8,32   0    14793    19.264283254   751  A   R 527540672 + 8 <- (8,33)
527540608
  8,32   0    14794    19.264284162   751  Q   R 527540672 + 8 [nfsd]
  8,32   0    14795    19.264286537   751  G   R 527540672 + 8 [nfsd]
  8,32   0    14796    19.264288073   751  P   N [nfsd]
  8,32   0    14797    19.264290727   751  I   R 527540672 + 8 [nfsd]
  8,32   0    14798    19.264292264   751  U   N [nfsd] 1
  8,32   0    14799    19.264293940   751  D   R 527540672 + 8 [nfsd]

We see request to read 8 sectors starting at 485490688 has been issued to
/dev/dm-2. I've checked that against dumpe2fs output and indeed that is a block
bitmap for group 1852. That request has gone through md-0 to sdc where it has
been dispatched to the disk driver for processing. All is well but for some
reason the read never completed which is why nfsd hangs.

So this doesn't look like a problem in the filesystem or block layer but rather
somewhere in the driver or hardware. Can you check what happens if you just run

dd if=/dev/dm-2 of=/dev/null bs=512 count=8 skip=485490688

Does the command finish? And what if you run it after nfsd has hung? If it does
finish in both cases, can you gather the blktrace of hanging once more like
before so that I can check whether we hang waiting for the same block the
second time as well? Thanks!

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (13 preceding siblings ...)
  2012-06-01 15:24 ` bugzilla-daemon
@ 2012-06-01 15:40 ` bugzilla-daemon
  2012-06-01 16:26 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-01 15:40 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #15 from Florian Pritz <flo@xssn.at>  2012-06-01 15:40:50 ---
Created an attachment (id=73488)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73488)
2nd blktrace

(In reply to comment #14)
> dd if=/dev/dm-2 of=/dev/null bs=512 count=8 skip=485490688

Finishes before and after the hang. 

I've attached another blktrace, but I let it run for like 20 seconds after nfsd
hung up this time instead of only a few seconds.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (14 preceding siblings ...)
  2012-06-01 15:40 ` bugzilla-daemon
@ 2012-06-01 16:26 ` bugzilla-daemon
  2012-06-03 20:16 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-01 16:26 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #16 from Jan Kara <jack@suse.cz>  2012-06-01 16:26:59 ---
Grumble, it's the same, just for a different allocation group:
253,2    0     6514    25.375884394   744  Q   R 707526656 + 8 [nfsd]
  9,0    0     5451    25.375887257   744  A   R 749576576 + 8 <- (253,2)
707526
656
  9,0    0     5452    25.375888584   744  Q   R 749576576 + 8 [nfsd]
  8,16   0    17311    25.375891308   744  A   R 749576639 + 8 <- (8,17)
7495765
76
  8,16   0    17312    25.375892077   744  Q   R 749576639 + 8 [nfsd]
  8,16   0    17313    25.375894381   744  G   R 749576639 + 8 [nfsd]
  8,16   0    17314    25.375895918   744  P   N [nfsd]
  8,16   0    17315    25.375898572   744  I   R 749576639 + 8 [nfsd]
  8,16   0    17316    25.375900038   744  U   N [nfsd] 1
  8,16   0    17317    25.375901715   744  D   R 749576639 + 8 [nfsd]

Let's try the following:
for (( i = 0; i < 25293; i++ )); do
  dd if=/dev/dm-2 of=/dev/null bs=4k count=1 skip=$(($i*32768))
done

Again run it before and after the hang. If it finishes even after the hang,
check whether nfsd is still hung.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (15 preceding siblings ...)
  2012-06-01 16:26 ` bugzilla-daemon
@ 2012-06-03 20:16 ` bugzilla-daemon
  2012-06-04 10:19 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-03 20:16 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #17 from Florian Pritz <flo@xssn.at>  2012-06-03 20:16:10 ---
It finishes in both cases.

I've run it directly after nfs hung and when the loop completed nfs was running
again. Normally it takes ~2minutes until nfs unfreezes, but in this case it was
faster.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (16 preceding siblings ...)
  2012-06-03 20:16 ` bugzilla-daemon
@ 2012-06-04 10:19 ` bugzilla-daemon
  2012-06-06 21:56 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-04 10:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #18 from Jan Kara <jack@suse.cz>  2012-06-04 10:19:33 ---
OK, thanks for testing. So you say the problem resolves itself even when you
don't run dd, just it takes longer? I'm more and more inclined to blame some
hardware strangeness... Also originally you said the problem started happening
around 3.3 or 3.4 but now you are testing with 3.0. So do you have any kernel
where you don't see the problem?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (17 preceding siblings ...)
  2012-06-04 10:19 ` bugzilla-daemon
@ 2012-06-06 21:56 ` bugzilla-daemon
  2012-06-07  9:29 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-06 21:56 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #19 from Florian Pritz <flo@xssn.at>  2012-06-06 21:56:31 ---
(In reply to comment #18)
> OK, thanks for testing. So you say the problem resolves itself even when you
> don't run dd, just it takes longer? I'm more and more inclined to blame some
> hardware strangeness... Also originally you said the problem started happening
> around 3.3 or 3.4 but now you are testing with 3.0. So do you have any kernel
> where you don't see the problem?

So far I can *not* reproduce the problem with 3.0.17 (transfered 8gb of data).
However, with 3.0.32 I can reproduce every time by copying a 20mb+ file.

If you want, I can test a few kernels in between .17 and .32, but that's going
to take some time.

FYI: I do notice short hickups on the client, but those are most likely just
because the disks aren't fast enough to handle the write requests. I'm running
with Gbit ethernet and reading is obviously faster than writing and writing is
slower than the link limit, resulting in the network buffers filling up. I
don't consider this a bug.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (18 preceding siblings ...)
  2012-06-06 21:56 ` bugzilla-daemon
@ 2012-06-07  9:29 ` bugzilla-daemon
  2012-06-22 20:45 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-07  9:29 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #20 from Jan Kara <jack@suse.cz>  2012-06-07 09:29:00 ---
Ah, that's interesting to know! It would be great if you could narrow down in
which -stable update the problem got introduced (that should take like 4 tests)
and then I can have a look at patches in that release and see whether something
that catches my eye.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (19 preceding siblings ...)
  2012-06-07  9:29 ` bugzilla-daemon
@ 2012-06-22 20:45 ` bugzilla-daemon
  2012-06-27 13:49 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-22 20:45 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #21 from Florian Pritz <bluewind@xinu.at>  2012-06-22 20:45:13 ---
Finally got around to testing. Looks like it break from 3.0.22 -> .23. 

I've used the Arch Linux packages for testing and noticed that there have been
a few config changes [1] at that time. Could those be related? I've already
tried booting .23 with transparent_hugepage=never, but that didn't help.

[1]:
https://projects.archlinux.org/svntogit/packages.git/log/trunk?h=packages/linux-lts

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (20 preceding siblings ...)
  2012-06-22 20:45 ` bugzilla-daemon
@ 2012-06-27 13:49 ` bugzilla-daemon
  2012-07-08 14:44 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-06-27 13:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #22 from Jan Kara <jack@suse.cz>  2012-06-27 13:49:47 ---
Hum, I'm not sure how to actually find out what has changed in that package.
The reference you gave shows just packaging changes. Looking at the changes
between 3.0.22 and 3.0.23 there are commits:
genirq: Unmask oneshot irqs when thread was not woken
genirq: Handle pending irqs in irq_startup()
which could be related (and actually one of them really caused problems). Those
problems should supposedly be fixed in 3.0.24 but maybe they are not.

So can you test 3.0.23 with those two patches reverted? And can you test
whether you see hang with 3.0.24 as well? Thanks!

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (21 preceding siblings ...)
  2012-06-27 13:49 ` bugzilla-daemon
@ 2012-07-08 14:44 ` bugzilla-daemon
  2012-07-10 13:12 ` bugzilla-daemon
  2012-11-02 15:46 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-07-08 14:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #23 from Florian Pritz <bluewind@xinu.at>  2012-07-08 14:44:49 ---
Alright, after bisecting and trying lots of kernels, I believe it's not a code
change causing this, but rather a change [1] in Arch's config file.

3.0.22 with the ext3 module works fine, but if I apply that patch to use ext4
for ext3 file systems, it hangs. 
3.0.23 with the ext3 module works too.

Since I don't know if it ever worked with that option enabled, I don't think
bisecting will help. Any ideas?

[1]:
https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux-lts&id=7c53d3c600983af5a021315b1ef27ed6d4807170

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (22 preceding siblings ...)
  2012-07-08 14:44 ` bugzilla-daemon
@ 2012-07-10 13:12 ` bugzilla-daemon
  2012-11-02 15:46 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-07-10 13:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292





--- Comment #24 from Jan Kara <jack@suse.cz>  2012-07-10 13:12:45 ---
Thanks a lot for trying out the kernels. I don't see how using ext4 instead of
ext3 can make the difference we observed - i.e. a submitted read request is not
completed. I guess I'll try to use ext4 to mount ext3 filesystem and beat it up
a bit to see whether I'll reproduce the problem locally for further
investigation.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug 43292] jdb2 lockup with ext3 and nfs
  2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
                   ` (23 preceding siblings ...)
  2012-07-10 13:12 ` bugzilla-daemon
@ 2012-11-02 15:46 ` bugzilla-daemon
  24 siblings, 0 replies; 26+ messages in thread
From: bugzilla-daemon @ 2012-11-02 15:46 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=43292


Florian Pritz <bluewind@xinu.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |UNREPRODUCIBLE




--- Comment #25 from Florian Pritz <bluewind@xinu.at>  2012-11-02 15:46:21 ---
I've recently switched my file server to xfs and can no longer reproduce this.
Since I seem to have been the only one with this issue, I'm closing this bug.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-11-02 15:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-24 15:54 [Bug 43292] New: jdb2 lockup with ext3 and nfs bugzilla-daemon
2012-05-26 20:31 ` [Bug 43292] " bugzilla-daemon
2012-05-29 14:04 ` bugzilla-daemon
2012-05-29 21:44 ` bugzilla-daemon
2012-05-30  9:18 ` bugzilla-daemon
2012-05-30  9:24 ` bugzilla-daemon
2012-05-30  9:41 ` bugzilla-daemon
2012-05-30  9:47 ` bugzilla-daemon
2012-05-30 13:33 ` bugzilla-daemon
2012-05-30 21:20 ` bugzilla-daemon
2012-05-31 20:16 ` bugzilla-daemon
2012-05-31 21:40 ` bugzilla-daemon
2012-05-31 22:44 ` bugzilla-daemon
2012-05-31 22:49 ` bugzilla-daemon
2012-06-01 15:24 ` bugzilla-daemon
2012-06-01 15:40 ` bugzilla-daemon
2012-06-01 16:26 ` bugzilla-daemon
2012-06-03 20:16 ` bugzilla-daemon
2012-06-04 10:19 ` bugzilla-daemon
2012-06-06 21:56 ` bugzilla-daemon
2012-06-07  9:29 ` bugzilla-daemon
2012-06-22 20:45 ` bugzilla-daemon
2012-06-27 13:49 ` bugzilla-daemon
2012-07-08 14:44 ` bugzilla-daemon
2012-07-10 13:12 ` bugzilla-daemon
2012-11-02 15:46 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.