* unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
From: Gildas LE NADAN @ 2004-12-28 11:24 UTC (permalink / raw)
To: linux-kernel
Hi,
I experience hangs of samba processes on a filer using xfs over lvm2 as
data partitions when there are active snapshots of the xfs partitions.
I have a clone of the production server (same software, same hardware)
where the situation can be reproduced perfectly.
Testing showed that the result was the same whether the snapshots were
mounted or not: smbd processes are locked and unkillable while the
machine otherwise works normally, except that a software reboot is
impossible and a hardware reset is needed.
I noticed Brad Fitzpatrick's case in the kernel 2.6.10 changelog
(http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today,
without success.
The configuration is the following:
- Supermicro motherboard with dual Xeon 2.8 GHz (SMT is active)
- 1 GB RAM
- Adaptec U320 RAID controller
- kernel 2.6.10
- Debian sarge
- Samba 3
- LVM2
- XFS with quota turned on
All software is from Debian sarge packages, except the kernel.
I'm not able to determine whether the problem is more XFS, device-mapper
or Samba related, and was not able to do extensive testing (using a
different filesystem, testing with a different daemon, etc.), but
SMT/SMP testing showed that this is not an SMP/SMT-related problem.
I've compiled the kernel with the debugging options, so I can provide
additional information if needed, as in Brad's case.
Sincerely,
Gildas LE NADAN
(Please CC me as I am not subscribed to LKML)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
From: bert hubert @ 2004-12-28 11:39 UTC (permalink / raw)
To: Gildas LE NADAN; +Cc: linux-kernel
On Tue, Dec 28, 2004 at 12:24:01PM +0100, Gildas LE NADAN wrote:
> I experience hangs of samba processes on a filer using xfs over lvm2 as
> data partitions when there are active snapshots of the xfs partitions.
A trick is to enable the magic SysRq key and press Alt-SysRq-T (I think),
which spams your syslog with backtraces of all processes currently
running, including the ones stuck in 'D' state (ps aux | grep " D ").
If you isolate these backtraces and send them to this list, they will
enable developers to help you. Make sure you add 'includes backtrace' to
your Subject.
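For reference, the same dump can be produced without touching the keyboard, using the standard procfs interface (a minimal sketch; the `ps`/`awk` filter at the end is just one way to list D-state tasks):

```shell
#!/bin/sh
# Enable the magic SysRq key (0 = disabled, 1 = all functions enabled).
# Requires root; to make it persistent, set kernel.sysrq=1 via sysctl.
echo 1 > /proc/sys/kernel/sysrq

# Equivalent to pressing Alt-SysRq-T: dump a backtrace of every task
# into the kernel log (visible via dmesg / syslog).
echo t > /proc/sysrq-trigger

# Show only the tasks stuck in uninterruptible sleep ('D' state).
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
```

The `awk '$2 ~ /^D/'` form also catches combined state flags such as `Ds` or `D+`, which the plain `grep " D "` can miss.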
> Testing showed that the result was the same whether the snapshots were
> mounted or not: smbd processes are locked and unkillable while the
> machine otherwise works normally, except that a software reboot is
> impossible and a hardware reset is needed.
For maximum usefulness, make your setup as simple as possible and reproduce.
Good luck - I personally can't help you in any real way, except to help you
get the debugging information that is needed.
--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
From: Gene Heskett @ 2004-12-28 14:07 UTC (permalink / raw)
To: linux-kernel; +Cc: Gildas LE NADAN
On Tuesday 28 December 2004 06:24, Gildas LE NADAN wrote:
>Hi,
>
>I experience hangs of samba processes on a filer using xfs over lvm2
> as data partitions when there are active snapshots of the xfs
> partitions.
>
>I have a clone of the production server (same software, same
> hardware) where the situation can be reproduced perfectly.
>
>Testing showed that the result was the same whether the snapshots
> were mounted or not: smbd processes are locked and unkillable
> while the machine otherwise works normally, except that a software
> reboot is impossible and a hardware reset is needed.
>
>I noticed Brad Fitzpatrick's case in the kernel 2.6.10 changelog
>(http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today,
>without success.
>
>The configuration is the following:
>- Supermicro motherboard with dual Xeon 2.8 GHz (SMT is active)
>- 1 GB RAM
>- Adaptec U320 RAID controller
>- kernel 2.6.10
>- Debian sarge
>- Samba 3
>- LVM2
>- XFS with quota turned on
>
>All software is from Debian sarge packages, except the kernel.
>
>I'm not able to determine whether the problem is more XFS,
> device-mapper or Samba related, and was not able to do extensive
> testing (using a different filesystem, testing with a different
> daemon, etc.), but SMT/SMP testing showed that this is not an
> SMP/SMT-related problem.
>
>I've compiled the kernel with the debugging options, so I can
> provide additional information if needed, as in Brad's case.
>
>Sincerely,
>Gildas LE NADAN
>(Please CC me as I am not subscribed to LKML)
I have a somewhat similar case here: samba processes are unkillable,
but I can do a software reboot. Something is also killing amandad,
and I lost the backup of this machine last night. The amanda logs
are bereft of any info, and I had no clue it had happened except a
message from amanda that the client access timed out on this machine.
This was while running 2.6.10-rc3-mm1-V0.33-04, which had run stably
for 8 days previously.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) [includes backtrace]
From: Gildas LE NADAN @ 2004-12-28 15:15 UTC (permalink / raw)
To: linux-kernel
>>I experience hangs of samba processes on a filer using xfs over lvm2 as
>>data partitions when there are active snapshots of the xfs partitions.
>
> A trick is to enable alt-sysrq and press alt-sysrq-t (I think) which spams
> your syslog with backtraces of all processes currently running, including
> the ones stuck in 'D' state (ps aux | grep " D ").
>
> If you isolate these backtraces and send them to this list, they will enable
> developers to help you. Make sure you add 'includes backtrace' in your
> Subject.
OK, this is what I get after provoking the problem on the test server
(copying 1 GB of data is enough to trigger the problem):
# ps afx | grep smbd
2279 ? Ss 0:00 /usr/sbin/smbd -D
2288 ? S 0:00 \_ /usr/sbin/smbd -D
2447 ? D 0:01 \_ /usr/sbin/smbd -D
2487 pts/0 S+ 0:00 | \_ grep smbd
# killall -9 smbd
# ps afx | grep smbd
2554 pts/0 S+ 0:00 | \_ grep smbd
2447 ? D 0:01 /usr/sbin/smbd -D
I did an "echo t > /proc/sysrq-trigger" and tried to clean up the
resulting logs a bit before sending. I hope this gives enough info;
otherwise, I kept the whole log, so I can send whatever part is needed.
SysRq : Show State
sibling
task PC pid father child younger older
...
xfslogd/0 S 00000004 0 218 11 220 216 (L-TLB)
f7eecf44 00000046 f7eecf34 00000004 00000002 f60ef53c c0427ba0 f60ef5a8
00000282 c01017cc 00000000 f7f28974 f7f2896c 00000000 c170f020
00000000
00000c41 ff6027e0 00000005 00000286 f7eb9530 f7eb96b0 f7eecf94
00000002
Call Trace:
[__up+28/32] __up+0x1c/0x20
[worker_thread+565/608] worker_thread+0x235/0x260
[pagebuf_iodone_work+0/80] pagebuf_iodone_work+0x0/0x50
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfslogd/1 S 00000004 0 219 10 221 217 (L-TLB)
f7c82f44 00000046 f7c82f30 00000004 00000001 ffffffff f7eb9020 35a49146
00000000 f7eb9020 c170f020 f7eb9020 00000000 c1717a00 c1717020
00000001
000008ae 0395f3e5 00000000 c171705c f7c7e020 f7c7e1a0 00000001
f7f289dc
Call Trace:
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfslogd/2 S 00000004 0 220 11 222 218 (L-TLB)
f7eedf44 00000046 f7eedf34 00000004 00000004 00000000 f7c1f020 c01f4a99
f714b13c 00000000 00000000 f7f28a74 f7f28a6c 00000000 c171f020
00000002
00000d47 6261fbfd 00000074 00000286 f7eb9020 f7eb91a0 f7eedf94
00000008
Call Trace:
[xfs_buf_iodone_callbacks+361/368] xfs_buf_iodone_callbacks+0x169/0x170
[worker_thread+565/608] worker_thread+0x235/0x260
[pagebuf_iodone_work+0/80] pagebuf_iodone_work+0x0/0x50
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfslogd/3 S 00000004 0 221 10 223 219 (L-TLB)
f7c84f44 00000046 f7c84f34 00000004 00000003 ffffffff f7c27a40 35a47b19
00000000 03969a3b 03969a3b 00000000 f7c84f28 c0116200 c1727020
00000003
00000ef7 0396cbec 00000000 00000286 f7c83a40 f7c83bc0 f7c84f94
00000004
Call Trace:
[activate_task+144/176] activate_task+0x90/0xb0
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfsdatad/0 S 00000004 0 222 11 224 220 (L-TLB)
f7f06f44 00000046 f7f06f30 00000004 00000002 ffffffff f7c83530 35a48050
00000000 f7c83530 c1717020 f7c83530 00000000 c170fa00 c170f020
00000000
00000897 0397ce5a 00000000 c170f05c f7f05a40 f7f05bc0 00000002
f7f28550
Call Trace:
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfsdatad/1 S 00000004 0 223 10 225 221 (L-TLB)
f7c85f44 00000046 f7c85f30 00000004 00000001 ffffffff f7f05530 35a47efe
00000000 f7f05530 c170f020 f7f05530 00000000 c1717a00 c1717020
00000001
000008f9 0398532d 00000000 c171705c f7c83530 f7c836b0 00000001
f7f285d0
Call Trace:
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfsdatad/2 S 00000004 0 224 11 903 222 (L-TLB)
f7f07f44 00000046 f7f07f34 00000004 00000004 ffffffff f7c1f020 35a493ed
00000000 03985b99 03985b99 00000000 f7f07f28 c0116200 c171f020
00000002
00000d7c 03988e25 00000000 00000286 f7f05530 f7f056b0 f7f07f94
00000008
Call Trace:
[activate_task+144/176] activate_task+0x90/0xb0
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfsdatad/3 S 00000004 0 225 10 902 223 (L-TLB)
f7c87f44 00000046 f7c87f34 00000004 00000003 ffffffff f7c27a40 35a49032
00000000 0398e44a 0398e44a 00000000 f7c87f28 c0116200 c1727020
00000003
000010b0 0399175f 00000000 00000286 f7c83020 f7c831a0 f7c87f94
00000004
Call Trace:
[activate_task+144/176] activate_task+0x90/0xb0
[worker_thread+565/608] worker_thread+0x235/0x260
[schedule+1132/3360] schedule+0x46c/0xd20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[default_wake_function+0/32] default_wake_function+0x0/0x20
[worker_thread+0/608] worker_thread+0x0/0x260
[kthread+186/192] kthread+0xba/0xc0
[kthread+0/192] kthread+0x0/0xc0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfsbufd S 00000004 0 226 1 815 213 (L-TLB)
f7f08f78 00000046 f7f08f68 00000004 00000001 00000000 f7c1f530 c02c1edb
f7f70e64 00000000 f7eaa944 c0264f5f 00000004 c04f99e8 c1717020
00000001
00000134 6d117e89 0000009e c0125879 f7f05020 f7f051a0 00000000
00000001
Call Trace:
[elv_next_request+27/256] elv_next_request+0x1b/0x100
[kobject_put+31/48] kobject_put+0x1f/0x30
[__mod_timer+249/320] __mod_timer+0xf9/0x140
[schedule_timeout+117/208] schedule_timeout+0x75/0xd0
[process_timeout+0/16] process_timeout+0x0/0x10
[dm_unplug_all+39/64] dm_unplug_all+0x27/0x40
[blk_backing_dev_unplug+0/32] blk_backing_dev_unplug+0x0/0x20
[pagebuf_daemon+118/512] pagebuf_daemon+0x76/0x200
[pagebuf_daemon+0/512] pagebuf_daemon+0x0/0x200
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
...
xfssyncd S 00000004 0 1361 1 1362 1360 (L-TLB)
f756ef74 00000046 f756ef64 00000004 00000002 f5735568 c0427ba0 f5735568
f756ef2c 0000022e 00000031 f714be3c f5735568 00000000 c170f020
00000000
000034a1 58c63e72 00000098 c0125879 f6fb6a40 f6fb6bc0 00000000
00000002
Call Trace:
[__mod_timer+249/320] __mod_timer+0xf9/0x140
[schedule_timeout+117/208] schedule_timeout+0x75/0xd0
[process_timeout+0/16] process_timeout+0x0/0x10
[xfssyncd+134/480] xfssyncd+0x86/0x1e0
[xfssyncd+0/480] xfssyncd+0x0/0x1e0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
xfssyncd S 00000004 0 1362 1 2233 1361 (L-TLB)
f6accf74 00000046 f6accf64 00000004 00000002 f60f4360 c0427ba0 f60efc3c
c050ad58 c023a3ee 00000031 f60efc3c f6d6ccd0 00000000 c170f020
00000000
00001568 6080d951 00000098 c0125879 f6a95530 f6a956b0 00000000
00000002
Call Trace:
[pagebuf_rele+46/240] pagebuf_rele+0x2e/0xf0
[__mod_timer+249/320] __mod_timer+0xf9/0x140
[schedule_timeout+117/208] schedule_timeout+0x75/0xd0
[process_timeout+0/16] process_timeout+0x0/0x10
[xfssyncd+134/480] xfssyncd+0x86/0x1e0
[xfssyncd+0/480] xfssyncd+0x0/0x1e0
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
...
smbd S 00000004 0 2279 1 2288 2285 2277 (NOTLB)
f5110ea4 00000082 f5110e90 00000004 00000002 c013ed74 f6770020 c042cd80
000000d0 f6770020 c1717020 f6770020 00000000 c170fa00 c170f020
00000000
0000b4c6 78f5a598 0000006d c170f05c f779a530 f779a6b0 00000002
f5d37028
Call Trace:
[__alloc_pages+484/928] __alloc_pages+0x1e4/0x3a0
[schedule_timeout+199/208] schedule_timeout+0xc7/0xd0
[tcp_poll+52/400] tcp_poll+0x34/0x190
[handle_mm_fault+344/384] handle_mm_fault+0x158/0x180
[add_wait_queue+29/80] add_wait_queue+0x1d/0x50
[pipe_poll+52/128] pipe_poll+0x34/0x80
[do_select+401/736] do_select+0x191/0x2e0
[__pollwait+0/208] __pollwait+0x0/0xd0
[sys_select+731/1456] sys_select+0x2db/0x5b0
[syscall_call+7/11] syscall_call+0x7/0xb
...
smbd D 00000004 0 2447 2279 2288 (NOTLB)
f6736bbc 00000082 f6736bac 00000004 00000002 00000000 c0427ba0 00000000
f6770020 c0118350 00000000 00000000 c17ff080 00000007 c170f020
00000000
00008e96 5602776d 00000071 00000000 f6770020 f67701a0 c023a197
00000002
Call Trace:
[default_wake_function+0/32] default_wake_function+0x0/0x20
[pagebuf_associate_memory+103/400] pagebuf_associate_memory+0x67/0x190
[schedule_timeout+199/208] schedule_timeout+0xc7/0xd0
[xlog_sync+630/1216] xlog_sync+0x276/0x4c0
[xlog_state_release_iclog+91/272] xlog_state_release_iclog+0x5b/0x110
[add_wait_queue_exclusive+26/80] add_wait_queue_exclusive+0x1a/0x50
[xlog_state_sync+602/656] xlog_state_sync+0x25a/0x290
[default_wake_function+0/32] default_wake_function+0x0/0x20
[xlog_assign_tail_lsn+73/128] xlog_assign_tail_lsn+0x49/0x80
[default_wake_function+0/32] default_wake_function+0x0/0x20
[xfs_log_force+132/144] xfs_log_force+0x84/0x90
[xfs_trans_commit+631/1008] xfs_trans_commit+0x277/0x3f0
[xfs_trans_dup+191/208] xfs_trans_dup+0xbf/0xd0
[xfs_itruncate_finish+593/1072] xfs_itruncate_finish+0x251/0x430
[xfs_setattr+3578/4128] xfs_setattr+0xdfa/0x1020
[linvfs_setattr+258/384] linvfs_setattr+0x102/0x180
[kmem_cache_alloc+114/192] kmem_cache_alloc+0x72/0xc0
[linvfs_setattr+0/384] linvfs_setattr+0x0/0x180
[notify_change+334/400] notify_change+0x14e/0x190
[do_truncate+147/208] do_truncate+0x93/0xd0
[fget+73/96] fget+0x49/0x60
[sys_ftruncate64+204/304] sys_ftruncate64+0xcc/0x130
[sys_open+108/144] sys_open+0x6c/0x90
[syscall_call+7/11] syscall_call+0x7/0xb
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
From: Julien BLACHE @ 2004-12-29 18:01 UTC (permalink / raw)
To: Gildas LE NADAN; +Cc: linux-kernel
Gildas LE NADAN <gildas.le-nadan@inha.fr> wrote:
> I experience hangs of samba processes on a filer using xfs over lvm2
> as data partitions when there are active snapshots of the xfs
> partitions.
Your problem probably lies between LVM2 and XFS. I ran into the same
problems this summer while doing the exact same thing.
The server would just completely hang once I started taking LVM
snapshots:
-> at the beginning, the snapshots would work OK, but XFS would hang
when accessing the filesystem afterwards
-> after a while (usually 2 or 3 snapshots; I was taking a
snapshot every 2 hours), the snapshot would not complete, and
from then on only a hard reboot would work
I was taking the snapshots from a crontab; the script used xfs_freeze
to freeze the filesystem before taking the snapshot (and unfreeze it
afterwards, of course). Sometimes xfs_freeze -u would hang too (but by
that time the server was in a pretty bad state already).
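A minimal sketch of such a freeze/snapshot/unfreeze sequence, with made-up names (volume group vg0, logical volume data mounted on /data, 1G snapshot size; the real sizes and paths must match the actual volume layout):

```shell
#!/bin/sh
# Sketch of the freeze -> snapshot -> unfreeze sequence described above.
# VG/LV names, mount point and snapshot size are hypothetical examples.

snapshot_xfs() {
    mnt=$1
    vg=$2
    lv=$3
    size=$4

    xfs_freeze -f "$mnt" || return 1   # quiesce XFS: flush log, block new writes
    lvcreate -s -L "$size" -n "${lv}-snap" "/dev/$vg/$lv"
    rc=$?
    xfs_freeze -u "$mnt"               # always thaw, even if lvcreate failed
    return $rc
}

# From cron, e.g. every 2 hours (requires root and a real volume group):
# snapshot_xfs /data vg0 data 1G
```

Thawing unconditionally after lvcreate matters: leaving the filesystem frozen after a failed snapshot blocks every writer, which looks much like the hang described here.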
The server wasn't loaded at all; we were doing some reads/writes
through Samba to have some modified files lying around, but we were
mainly prototyping the server, not stress-testing it.
LVM and XFS just don't play nice together when it comes to snapshots.
I thought it had been fixed already, but that's not the case, as we
both know...
(I can't remember the kernel version, it could have been a 2.4 kernel,
but I was using LVM2 and the latest XFS code available)
Feel free to correct me if I did something wrong (but AFAIK I took
care of everything, knowing there could have been bad interactions
between LVM and XFS).
JB.
--
Julien BLACHE <http://www.jblache.org>
<jb@jblache.org> GPG KeyID 0xF5D65169
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
From: Christian Leber @ 2005-01-02 12:41 UTC (permalink / raw)
To: linux-kernel
On Tue, Dec 28, 2004 at 09:07:01AM -0500, Gene Heskett wrote:
> I have a somewhat similar case here: samba processes are unkillable,
> but I can do a software reboot. Something is also killing amandad,
> and I lost the backup of this machine last night. The amanda logs
> are bereft of any info, and I had no clue it had happened except a
> message from amanda that the client access timed out on this machine.
> This was while running 2.6.10-rc3-mm1-V0.33-04, which had run stably
> for 8 days previously.
I have the same problem (2.6.10-rc3 had been running 7 days without
problems), and I had mc, smbd and lsof in D state.
(There is something about NFS in the call traces; it _might_ be because
I halted an NFS server the day before without unmounting it on the
affected system.)
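As an aside: processes stuck in D state on a dead NFS server are the classic symptom of a default ("hard") NFS mount, where RPCs retry forever. A hypothetical fstab entry (the export path and mount point are made up; only the server name igor3 comes from the logs below) that at least lets such processes be killed:

```
# /etc/fstab -- hypothetical entry; 'intr' allows signals to interrupt
# stuck RPC waits, 'soft' makes requests fail after the retry budget
# (timeo/retrans) instead of retrying forever. 'soft' risks data errors
# on a flaky server, so weigh it against 'hard,intr'.
igor3:/export  /mnt/igor3  nfs  rw,intr,soft,timeo=30,retrans=3  0  0
```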
Dec 31 18:55:20 core kernel: nfs warning: mount version older than kernel
Jan 1 06:25:38 core kernel: nfs: server igor3 not responding, still trying
Jan 1 17:29:15 core kernel: nfs: server igor3 not responding, still trying
Jan 1 18:05:12 core kernel: (NOTLB)
Jan 1 18:05:12 core kernel: c843deb4 00200086 f68a6580 c04c0150 000274ab c18e57e0 587f3ab2 000274ab
Jan 1 18:05:12 core kernel: 00003875 5880d98d 000274ab d696d560 d696d6bc 00000014 d696d560 c843c000
Jan 1 18:05:12 core kernel: ffffe000 c011cca2 d696d560 ecc030e0 00040005 c011d00f ffffffff d696d9d4
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c011cca2>] finish_stop+0x42/0x90
Jan 1 18:05:12 core kernel: [<c011d00f>] get_signal_to_deliver+0x19f/0x2b0
Jan 1 18:05:12 core kernel: [<c0102388>] do_signal+0x98/0x130
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c01059b2>] sys_ptrace+0xb2/0x610
Jan 1 18:05:12 core kernel: [<c0102455>] do_notify_resume+0x35/0x38
Jan 1 18:05:12 core kernel: [<c0102596>] work_notifysig+0x13/0x15
Jan 1 18:05:12 core kernel: mc D C04C0120 0 21514 30032 21516 (NOTLB)
Jan 1 18:05:12 core kernel: f0269da8 00000082 d1edba20 c04c0120 00000000 00000292 f7de72a4 f0269da8
Jan 1 18:05:12 core kernel: 000f8c0f 36b16015 00026d31 d1edba20 d1edbb7c df1cbe94 df1cbda0 f0269dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 c1b25560 f7de72a4 c01af638 00000000 d6aba200 00000000
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c01af638>] ext3_mark_iloc_dirty+0x28/0x40
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c0160120>] update_atime+0xd0/0xe0
Jan 1 18:05:12 core kernel: [<c0154e6d>] link_path_walk+0x73d/0xb60
Jan 1 18:05:12 core kernel: [<c01bd3c9>] journal_stop+0x149/0x200
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c015a1ff>] fifo_open+0x13f/0x26f
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: mozilla-bin S C04C0120 0 2810 2994 2859 (NOTLB)
Jan 1 18:05:12 core kernel: e6135f10 00200086 dde78a00 c04c0120 e6135fa0 cb94fa98 c012ff73 c1125b60
Jan 1 18:05:12 core kernel: 00002623 41070976 00028179 dde78a00 dde78b5c 00000000 7fffffff e6135f68
Jan 1 18:05:12 core kernel: 7fffffff c0389a35 c01536c4 e4cb7d80 cb1db8e0 e6135fa0 00000145 e491a420
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c012ff73>] __get_free_pages+0x33/0x40
Jan 1 18:05:12 core kernel: [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan 1 18:05:12 core kernel: [<c01536c4>] pipe_poll+0x34/0x80
Jan 1 18:05:12 core kernel: [<c0159d1f>] do_pollfd+0x4f/0x90
Jan 1 18:05:12 core kernel: [<c0159e0a>] do_poll+0xaa/0xd0
Jan 1 18:05:12 core kernel: [<c0159f82>] sys_poll+0x152/0x210
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: mozilla-bin S C04C0120 0 2859 2994 2860 2810 (NOTLB)
Jan 1 18:05:12 core kernel: db23bf10 00200086 c46cda20 c04c0120 000000d0 2d9fbab8 000000d0 f3ca81a0
Jan 1 18:05:12 core kernel: 00000799 d2284a24 00028178 c46cda20 c46cdb7c 00000000 7fffffff db23bf68
Jan 1 18:05:12 core kernel: 7fffffff c0389a35 c01536c4 f3ca81a0 cb1dbf20 db23bfa0 00000145 da0da768
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan 1 18:05:12 core kernel: [<c01536c4>] pipe_poll+0x34/0x80
Jan 1 18:05:12 core kernel: [<c0159d1f>] do_pollfd+0x4f/0x90
Jan 1 18:05:12 core kernel: [<c0159e0a>] do_poll+0xaa/0xd0
Jan 1 18:05:12 core kernel: [<c0159f82>] sys_poll+0x152/0x210
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: mozilla-bin S C04C0150 0 2860 2994 2859 (NOTLB)
Jan 1 18:05:12 core kernel: c79ffe90 00200086 dde78a00 c04c0150 00028179 00000000 4105997b 00028179
Jan 1 18:05:12 core kernel: 0000159c 4105b238 00028179 c46cd540 c46cd69c 2a0aec95 c79ffea4 fffffff5
Jan 1 18:05:12 core kernel: c79ffedc c03899e3 c79ffea4 2a0aec95 c79ffec8 c04c60c8 c04c60c8 2a0aec95
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c03899e3>] schedule_timeout+0x63/0xc0
Jan 1 18:05:12 core kernel: [<c011aa90>] process_timeout+0x0/0x10
Jan 1 18:05:12 core kernel: [<c012585f>] futex_wait+0x12f/0x170
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c01535d0>] pipe_write+0x0/0x40
Jan 1 18:05:12 core kernel: [<c0125b18>] do_futex+0x48/0xa0
Jan 1 18:05:12 core kernel: [<c0125c5e>] sys_futex+0xee/0x100
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: xchat S C04C0120 0 7965 1 7966 26544 (NOTLB)
Jan 1 18:05:12 core kernel: e4e43f10 00200082 ce06d5a0 c04c0120 e4e43fa0 e93f8af8 c012ff73 d5703400
Jan 1 18:05:12 core kernel: 00000852 5500c143 0002817b ce06d5a0 ce06d6fc 2a0adc55 e4e43f24 e4e43f68
Jan 1 18:05:12 core kernel: 00000034 c03899e3 e4e43f24 2a0adc55 c86fd740 c04c5a10 c04c5a10 2a0adc55
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c012ff73>] __get_free_pages+0x33/0x40
Jan 1 18:05:12 core kernel: [<c03899e3>] schedule_timeout+0x63/0xc0
Jan 1 18:05:12 core kernel: [<c011aa90>] process_timeout+0x0/0x10
Jan 1 18:05:12 core kernel: [<c0159e0a>] do_poll+0xaa/0xd0
Jan 1 18:05:12 core kernel: [<c0159f82>] sys_poll+0x152/0x210
Jan 1 18:05:12 core kernel: [<c01163fb>] sys_gettimeofday+0x3b/0x80
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: xchat S C04C05C8 0 7966 1 12134 7965 (NOTLB)
Jan 1 18:05:12 core kernel: c30bfeb4 00200082 ce06d5a0 c04c05c8 0002817b 00000001 5500689d 0002817b
Jan 1 18:05:12 core kernel: 000006de 55006e0a 0002817b dde78520 dde7867c 00000000 7fffffff 00000006
Jan 1 18:05:12 core kernel: 00000006 c0389a35 c1346120 00000000 c01593f5 f64b54a4 00200246 f64b54a4
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan 1 18:05:12 core kernel: [<c01593f5>] __pollwait+0x85/0xd0
Jan 1 18:05:12 core kernel: [<c01536c4>] pipe_poll+0x34/0x80
Jan 1 18:05:12 core kernel: [<c0159693>] do_select+0x173/0x2b0
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c0159abf>] sys_select+0x2bf/0x4d0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: smbd D C04C0120 0 9058 1 12172 26379 (NOTLB)
Jan 1 18:05:12 core kernel: e1d95c24 00000082 ed6365a0 c04c0120 f7cbdce0 c03762de d0130400 f7cbdce0
Jan 1 18:05:12 core kernel: 00011688 13c6a05f 000280e1 ed6365a0 ed6366fc f7cbdce0 e1d95c60 f7cbdd58
Jan 1 18:05:12 core kernel: e1d95c40 c037800d f7cbdce0 00000000 da6b95f8 e1d94000 00000000 00000000
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c03762de>] xprt_prepare_transmit+0x7e/0xc0
Jan 1 18:05:12 core kernel: [<c037800d>] __rpc_execute+0x13d/0x3c0
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c03786b6>] rpc_new_task+0x36/0xb0
Jan 1 18:05:12 core kernel: [<c0373a24>] rpc_call_sync+0x74/0xb0
Jan 1 18:05:12 core kernel: [<c01dd6fa>] nfs3_rpc_wrapper+0x3a/0x80
Jan 1 18:05:12 core kernel: [<c01ddc9a>] nfs3_proc_access+0xda/0x170
Jan 1 18:05:12 core kernel: [<c01bdc75>] __journal_file_buffer+0x175/0x230
Jan 1 18:05:12 core kernel: [<c01bd02f>] journal_dirty_metadata+0xef/0x170
Jan 1 18:05:12 core kernel: [<c037973c>] rpcauth_lookup_credcache+0x1cc/0x210
Jan 1 18:05:12 core kernel: [<c01d29c5>] nfs_do_access+0x65/0xb0
Jan 1 18:05:12 core kernel: [<c01d2b00>] nfs_permission+0xf0/0x170
Jan 1 18:05:12 core kernel: [<c0154241>] permission+0x51/0x60
Jan 1 18:05:12 core kernel: [<c01551c2>] link_path_walk+0xa92/0xb60
Jan 1 18:05:12 core kernel: [<c0160120>] update_atime+0xd0/0xe0
Jan 1 18:05:12 core kernel: [<c0154e0d>] link_path_walk+0x6dd/0xb60
Jan 1 18:05:12 core kernel: [<c01554e0>] path_lookup+0x70/0x110
Jan 1 18:05:12 core kernel: [<c0155733>] __user_walk+0x33/0x60
Jan 1 18:05:12 core kernel: [<c0150a7f>] vfs_stat+0x1f/0x60
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0103ee8>] math_state_restore+0x28/0x50
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: lsof D C04C0120 0 12134 1 12137 7966 (NOTLB)
Jan 1 18:05:12 core kernel: c2341da8 00000086 c46cd060 c04c0120 c0389b45 d900a5f8 c0149040 c2341dd4
Jan 1 18:05:12 core kernel: 0000196b 064b7168 000280e4 c46cd060 c46cd1bc df1cbe94 df1cbda0 c2341dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 c0124cc0 c2341dd4 c2341dd4 00000000 d6aba200 00000002
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0389b45>] __wait_on_bit+0x45/0x60
Jan 1 18:05:12 core kernel: [<c0149040>] sync_buffer+0x0/0x50
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c0124cc0>] wake_bit_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c0155011>] link_path_walk+0x8e1/0xb60
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01532d7>] pipe_read+0x37/0x40
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0147ea1>] vfs_read+0xd1/0x130
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: lsof D C04C0120 0 12137 1 12141 12134 (NOTLB)
Jan 1 18:05:12 core kernel: cbed5da8 00200086 f2996a40 c04c0120 c0139fa3 dbd4d900 d9115078 d221efe4
Jan 1 18:05:12 core kernel: 00001704 00436954 000280e6 f2996a40 f2996b9c df1cbe94 df1cbda0 cbed5dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 d9115078 c013a386 dbd4d900 00000000 d6aba200 00000001
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0139fa3>] do_no_page+0x63/0x250
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c013a386>] handle_mm_fault+0xf6/0x170
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c010d34c>] do_page_fault+0x18c/0x599
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c0155011>] link_path_walk+0x8e1/0xb60
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01532d7>] pipe_read+0x37/0x40
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0147ea1>] vfs_read+0xd1/0x130
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: lsof D C04C0120 0 12141 1 12169 12137 (NOTLB)
Jan 1 18:05:12 core kernel: d6507da8 00200082 f2996560 c04c0120 c0139fa3 dbd4db20 e57184bc d6224fe4
Jan 1 18:05:12 core kernel: 00001ad7 d3d93fee 000280e9 f2996560 f29966bc df1cbe94 df1cbda0 d6507dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 e57184bc c013a386 dbd4db20 00000000 d6aba200 00000001
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0139fa3>] do_no_page+0x63/0x250
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c013a386>] handle_mm_fault+0xf6/0x170
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c010d34c>] do_page_fault+0x18c/0x599
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c0155011>] link_path_walk+0x8e1/0xb60
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01532d7>] pipe_read+0x37/0x40
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0147ea1>] vfs_read+0xd1/0x130
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: lsof D C04C0120 0 12169 1 12171 12141 (NOTLB)
Jan 1 18:05:12 core kernel: de709da8 00200082 dfe36a80 c04c0120 c0139fa3 dbd4d6e0 d9115ee8 ef49dfe4
Jan 1 18:05:12 core kernel: 000019ce 39260da8 000280f0 dfe36a80 dfe36bdc df1cbe94 df1cbda0 de709dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 d9115ee8 c013a386 dbd4d6e0 00000000 d6aba200 00000001
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0139fa3>] do_no_page+0x63/0x250
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c013a386>] handle_mm_fault+0xf6/0x170
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c010d34c>] do_page_fault+0x18c/0x599
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c0155011>] link_path_walk+0x8e1/0xb60
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01532d7>] pipe_read+0x37/0x40
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0147ea1>] vfs_read+0xd1/0x130
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: lsof D C04C0120 0 12171 1 26379 12169 (NOTLB)
Jan 1 18:05:12 core kernel: ed273da8 00200082 dde78040 c04c0120 c0139fa3 eb16f740 d9115e40 de75efe4
Jan 1 18:05:12 core kernel: 000019a1 bccd2773 000280f0 dde78040 dde7819c df1cbe94 df1cbda0 ed273dcc
Jan 1 18:05:12 core kernel: df1cbeb8 c01d4a65 d9115e40 c013a386 eb16f740 00000000 d6aba200 00000001
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c0139fa3>] do_no_page+0x63/0x250
Jan 1 18:05:12 core kernel: [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan 1 18:05:12 core kernel: [<c013a386>] handle_mm_fault+0xf6/0x170
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c010d34c>] do_page_fault+0x18c/0x599
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan 1 18:05:12 core kernel: [<c01545a7>] follow_mount+0x57/0xa0
Jan 1 18:05:12 core kernel: [<c0155011>] link_path_walk+0x8e1/0xb60
Jan 1 18:05:12 core kernel: [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan 1 18:05:12 core kernel: [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan 1 18:05:12 core kernel: [<c01509f9>] vfs_getattr+0x39/0xa0
Jan 1 18:05:12 core kernel: [<c0150aaf>] vfs_stat+0x4f/0x60
Jan 1 18:05:12 core kernel: [<c01532d7>] pipe_read+0x37/0x40
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0147ea1>] vfs_read+0xd1/0x130
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: smbd D C04C0120 0 12172 1 13162 9058 (NOTLB)
Jan 1 18:05:12 core kernel: e1763c24 00000082 d696da40 c04c0120 f7cbdec0 c03762de d0130400 f7cbdec0
Jan 1 18:05:12 core kernel: 000d4246 e30867f8 000280ff d696da40 d696db9c f7cbdec0 e1763c60 f7cbdf38
Jan 1 18:05:12 core kernel: e1763c40 c037800d f7cbdec0 00000000 da6b95f8 e1762000 00000000 00000000
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c03762de>] xprt_prepare_transmit+0x7e/0xc0
Jan 1 18:05:12 core kernel: [<c037800d>] __rpc_execute+0x13d/0x3c0
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan 1 18:05:12 core kernel: [<c03786b6>] rpc_new_task+0x36/0xb0
Jan 1 18:05:12 core kernel: [<c0373a24>] rpc_call_sync+0x74/0xb0
Jan 1 18:05:12 core kernel: [<c01dd6fa>] nfs3_rpc_wrapper+0x3a/0x80
Jan 1 18:05:12 core kernel: [<c01ddc9a>] nfs3_proc_access+0xda/0x170
Jan 1 18:05:12 core kernel: [<c01bdc75>] __journal_file_buffer+0x175/0x230
Jan 1 18:05:12 core kernel: [<c01bd02f>] journal_dirty_metadata+0xef/0x170
Jan 1 18:05:12 core kernel: [<c037973c>] rpcauth_lookup_credcache+0x1cc/0x210
Jan 1 18:05:12 core kernel: [<c01d29c5>] nfs_do_access+0x65/0xb0
Jan 1 18:05:12 core kernel: [<c01d2b00>] nfs_permission+0xf0/0x170
Jan 1 18:05:12 core kernel: [<c0154241>] permission+0x51/0x60
Jan 1 18:05:12 core kernel: [<c01551c2>] link_path_walk+0xa92/0xb60
Jan 1 18:05:12 core kernel: [<c0160120>] update_atime+0xd0/0xe0
Jan 1 18:05:12 core kernel: [<c0154e0d>] link_path_walk+0x6dd/0xb60
Jan 1 18:05:12 core kernel: [<c01554e0>] path_lookup+0x70/0x110
Jan 1 18:05:12 core kernel: [<c0155733>] __user_walk+0x33/0x60
Jan 1 18:05:12 core kernel: [<c0150a7f>] vfs_stat+0x1f/0x60
Jan 1 18:05:12 core kernel: [<c01511ab>] sys_stat64+0x1b/0x40
Jan 1 18:05:12 core kernel: [<c0103ee8>] math_state_restore+0x28/0x50
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: nmbd S C04C0120 0 13162 1 13438 12172 (NOTLB)
Jan 1 18:05:12 core kernel: f6ef1eb4 00200086 f7d585a0 c04c0120 c473dc18 c012ff73 c132a260 00000000
Jan 1 18:05:12 core kernel: 000169ae 1415a4e0 0002817a f7d585a0 f7d586fc 2a0aee26 f6ef1ec8 0000000b
Jan 1 18:05:12 core kernel: 0000000b c03899e3 f6ef1ec8 2a0aee26 f6ec60c0 c04079d8 d5349ec8 2a0aee26
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c012ff73>] __get_free_pages+0x33/0x40
Jan 1 18:05:12 core kernel: [<c03899e3>] schedule_timeout+0x63/0xc0
Jan 1 18:05:12 core kernel: [<c011aa90>] process_timeout+0x0/0x10
Jan 1 18:05:12 core kernel: [<c0159693>] do_select+0x173/0x2b0
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c0159abf>] sys_select+0x2bf/0x4d0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: mutt S C04C0150 0 13427 26545 (NOTLB)
Jan 1 18:05:12 core kernel: e751ff10 00000086 d4370ae0 c04c0150 00028179 00000001 f32707b5 00028179
Jan 1 18:05:12 core kernel: 00000f9a f3bf6150 00028179 f2996080 f29961dc 2a13ecb8 e751ff24 e751ff68
Jan 1 18:05:12 core kernel: 000927c1 c03899e3 e751ff24 2a13ecb8 00000145 c04c61e0 c04c61e0 2a13ecb8
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c03899e3>] schedule_timeout+0x63/0xc0
Jan 1 18:05:12 core kernel: [<c011aa90>] process_timeout+0x0/0x10
Jan 1 18:05:12 core kernel: [<c0159e0a>] do_poll+0xaa/0xd0
Jan 1 18:05:12 core kernel: [<c0159f82>] sys_poll+0x152/0x210
Jan 1 18:05:12 core kernel: [<c01163fb>] sys_gettimeofday+0x3b/0x80
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: xterm S C04C0120 0 13438 1 13440 13162 (NOTLB)
Jan 1 18:05:12 core kernel: ca025eb4 00000082 f6fb1140 c04c0120 00000010 00000000 00000096 ec92f240
Jan 1 18:05:12 core kernel: 00000b61 43a9a434 0002817b f6fb1140 f6fb129c 2a0adee6 ca025ec8 00000006
Jan 1 18:05:12 core kernel: 00000006 c03899e3 ca025ec8 2a0adee6 f6805000 f57a7ec8 f0cf1ec8 2a0adee6
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c03899e3>] schedule_timeout+0x63/0xc0
Jan 1 18:05:12 core kernel: [<c011aa90>] process_timeout+0x0/0x10
Jan 1 18:05:12 core kernel: [<c0159693>] do_select+0x173/0x2b0
Jan 1 18:05:12 core kernel: [<c0159370>] __pollwait+0x0/0xd0
Jan 1 18:05:12 core kernel: [<c0159abf>] sys_select+0x2bf/0x4d0
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: bash S C04C0150 0 13440 13438 13443 (NOTLB)
Jan 1 18:05:12 core kernel: f4179f1c 00000086 d3a0db00 c04c0150 0002813e f30432a0 27a9401d 0002813e
Jan 1 18:05:12 core kernel: 0002baaf 27a9401d 0002813e f6fb1620 f6fb177c fffffe00 f6fb1620 f6fb16c4
Jan 1 18:05:12 core kernel: f6fb16c4 c01159cd ffffffff 00000006 f1b15100 f4179f50 00000292 00030002
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c01159cd>] do_wait+0x18d/0x460
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c022cf2e>] copy_to_user+0x3e/0x50
Jan 1 18:05:12 core kernel: [<c0115d6f>] sys_wait4+0x3f/0x50
Jan 1 18:05:12 core kernel: [<c0115da7>] sys_waitpid+0x27/0x2b
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
Jan 1 18:05:12 core kernel: bash S C04C0150 0 13443 13440 (NOTLB)
Jan 1 18:05:12 core kernel: f169fe70 00000086 f6fb1140 c04c0150 00028143 c038944e 600f9ce1 00028143
Jan 1 18:05:12 core kernel: 000013b7 60106e9a 00028143 f1b15100 f1b1525c e000f000 7fffffff edbe8000
Jan 1 18:05:12 core kernel: c081dc40 c0389a35 00000002 e7593c0f c0272c48 f6805000 e7593c11 00000000
Jan 1 18:05:12 core kernel: Call Trace:
Jan 1 18:05:12 core kernel: [<c038944e>] schedule+0x2ce/0x4d0
Jan 1 18:05:12 core kernel: [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan 1 18:05:12 core kernel: [<c0272c48>] pty_write+0x68/0x70
Jan 1 18:05:12 core kernel: [<c0271a73>] read_chan+0x5e3/0x6f0
Jan 1 18:05:12 core kernel: [<c0271cdd>] write_chan+0x15d/0x210
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c010f740>] default_wake_function+0x0/0x20
Jan 1 18:05:12 core kernel: [<c026c94e>] tty_write+0x20e/0x260
Jan 1 18:05:12 core kernel: [<c026c721>] tty_read+0xe1/0x100
Jan 1 18:05:12 core kernel: [<c0147e88>] vfs_read+0xb8/0x130
Jan 1 18:05:12 core kernel: [<c0148171>] sys_read+0x51/0x80
Jan 1 18:05:12 core kernel: [<c010254b>] syscall_call+0x7/0xb
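The per-task traces above are the kind of output the SysRq task dump produces. When only the stuck tasks are of interest, the processes in uninterruptible (D) sleep, like the smbd and lsof entries above, can be picked out with a small pipeline. This is a generic sketch, not something from the thread; the exact `ps` column widths are an assumption you may need to adjust:

```shell
# Show the header plus any task whose state starts with 'D'
# (uninterruptible sleep), as seen for smbd/lsof in the traces above.
ps axo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'

# With root and CONFIG_MAGIC_SYSRQ enabled, the full per-task kernel
# stacks (like those quoted above) can be dumped to the kernel log:
#   echo t > /proc/sysrq-trigger && dmesg
```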
I have not been able to reproduce it so far, because I have to use the system.
Christian Leber
--
http://www.nosoftwarepatents.com
* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
` (2 preceding siblings ...)
2004-12-29 18:01 ` Julien BLACHE
@ 2005-01-05 11:37 ` Christoph Hellwig
3 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2005-01-05 11:37 UTC (permalink / raw)
To: Gildas LE NADAN; +Cc: linux-kernel, linux-xfs
On Tue, Dec 28, 2004 at 12:24:01PM +0100, Gildas LE NADAN wrote:
> Hi,
>
> I experience hangs in samba processes on a filer using xfs over lvm2 as
> data partitions when there are active snapshots of the xfs partitions.
>
> I have a clone of the production server (same software, same hardware)
> where the situation can be reproduced perfectly.
>
> Testing showed that the result was the same whether the snapshots were
> mounted or not: smbd processes are locked and unkillable while the
> machine is otherwise working normally, except that a software reboot is
> impossible and a hardware reset is needed.
>
> I noticed Brad Fitzpatrick's case in kernel 2.6.10 changelog
> (http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today
> without success.
>
> The configuration is the following:
> - Supermicro motherboard with dual Xeon 2.8 GHz (SMT is active)
> - 1 GB RAM
> - Adaptec U320 RAID controller
> - kernel 2.6.10
> - debian sarge
> - samba 3
> - LVM2
> - XFS with quota turned on
>
> All software is from Debian sarge packages, except the kernel.
>
> I'm not able to determine whether the problem is more xfs, device mapper
> or samba related, and was not able to do extensive testing (using a
> different filesystem, testing with a different daemon, etc.), but
> SMT/SMP testing showed that this is not an SMP/SMT related problem.
>
> I've compiled the kernel with the debugging options, so I can provide
> additional information if needed, as in Brad's case.
I'll try to reproduce your problems soon.
end of thread, other threads:[~2005-01-05 11:38 UTC | newest]
Thread overview: 7+ messages
2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
2004-12-28 11:39 ` bert hubert
2004-12-28 15:15 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) [includes backtrace] Gildas LE NADAN
2004-12-28 14:07 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gene Heskett
2005-01-02 12:41 ` Christian Leber
2004-12-29 18:01 ` Julien BLACHE
2005-01-05 11:37 ` Christoph Hellwig