linux-kernel.vger.kernel.org archive mirror
* unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
@ 2004-12-28 11:24 Gildas LE NADAN
  2004-12-28 11:39 ` bert hubert
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Gildas LE NADAN @ 2004-12-28 11:24 UTC
  To: linux-kernel

Hi,

I am experiencing hangs in samba processes on a filer that uses xfs over lvm2
for its data partitions, whenever there are active snapshots of the xfs partitions.

I have a clone of the production server (same software, same hardware) 
where the situation can be reproduced perfectly.

Testing showed that the result is the same whether or not the snapshots are
mounted: smbd processes are locked and unkillable while the machine otherwise
works normally, except that a software reboot is impossible and a hardware
reset is needed.

I noticed Brad Fitzpatrick's case in the kernel 2.6.10 changelog
(http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today,
without success.

The configuration is as follows:
- Supermicro motherboard with dual Xeon 2.8 GHz (SMT is active)
- 1 GB RAM
- Adaptec U320 RAID controller
- kernel 2.6.10
- debian sarge
- samba 3
- LVM2
- XFS with quota turned on

All software is from Debian sarge packages, except the kernel.

I'm not able to determine whether the problem is xfs, device mapper or
samba related, and I was not able to do extensive testing (using a
different filesystem, testing with a different daemon, etc.), but
SMT/SMP testing showed that this is not an SMP/SMT related problem.

I've compiled the kernel with the debugging options, so I can provide
additional information if needed, as in Brad's case.

Sincerely,
Gildas LE NADAN
(Please CC me as I am not subscribed to LKML)


* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
  2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
@ 2004-12-28 11:39 ` bert hubert
  2004-12-28 15:15   ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) [includes backtrace] Gildas LE NADAN
  2004-12-28 14:07 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gene Heskett
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: bert hubert @ 2004-12-28 11:39 UTC
  To: Gildas LE NADAN; +Cc: linux-kernel

On Tue, Dec 28, 2004 at 12:24:01PM +0100, Gildas LE NADAN wrote:

> I am experiencing hangs in samba processes on a filer that uses xfs over lvm2
> for its data partitions, whenever there are active snapshots of the xfs partitions.

A trick is to enable alt-sysrq and press alt-sysrq-t (I think) which spams
your syslog with backtraces of all processes currently running, including
the ones stuck in 'D' state (ps aux | grep " D ").

If you isolate these backtraces and send them to this list, they will enable
developers to help you. Make sure you add 'includes backtrace' in your
Subject.
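
The `ps aux | grep " D "` check above can also be done by looking at the STAT column directly. The following is a minimal Python sketch, assuming the usual 11-column `ps aux` layout; the function name `uninterruptible` and the sample rows are illustrative and not taken from the reporter's machine.

```python
def uninterruptible(ps_output: str) -> list:
    """Return (pid, command) pairs for processes whose STAT starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a complete ps row
        pid, stat, command = fields[1], fields[7], fields[10]
        if stat.startswith("D"):
            stuck.append((int(pid), command))
    return stuck


# Hypothetical sample in `ps aux` layout, for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2279  0.0  0.2  10000  2000 ?        Ss   11:00   0:00 /usr/sbin/smbd -D
root      2447  0.1  0.3  10500  3000 ?        D    11:05   0:01 /usr/sbin/smbd -D
"""

if __name__ == "__main__":
    for pid, command in uninterruptible(SAMPLE):
        print(pid, command)
```

Matching on the STAT field rather than on the literal string " D " also catches states with modifiers such as `D+` or `Ds`. On a live system the input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.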

> Testing showed that the result is the same whether or not the snapshots are
> mounted: smbd processes are locked and unkillable while the machine otherwise
> works normally, except that a software reboot is impossible and a hardware
> reset is needed.

For maximum usefulness, make your setup as simple as possible and reproduce.

Good luck - I personally can't help you in any real way, except to help you
get the debugging information that is needed.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
  2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
  2004-12-28 11:39 ` bert hubert
@ 2004-12-28 14:07 ` Gene Heskett
  2005-01-02 12:41   ` Christian Leber
  2004-12-29 18:01 ` Julien BLACHE
  2005-01-05 11:37 ` Christoph Hellwig
  3 siblings, 1 reply; 7+ messages in thread
From: Gene Heskett @ 2004-12-28 14:07 UTC
  To: linux-kernel; +Cc: Gildas LE NADAN

On Tuesday 28 December 2004 06:24, Gildas LE NADAN wrote:
>Hi,
>
>I am experiencing hangs in samba processes on a filer that uses xfs
> over lvm2 for its data partitions, whenever there are active
> snapshots of the xfs partitions.
>
>I have a clone of the production server (same software, same
> hardware) where the situation can be reproduced perfectly.
>
>Testing showed that the result is the same whether or not the
> snapshots are mounted: smbd processes are locked and unkillable
> while the machine otherwise works normally, except that a software
> reboot is impossible and a hardware reset is needed.
>
>I noticed Brad Fitzpatrick's case in the kernel 2.6.10 changelog
>(http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today,
>without success.
>
>The configuration is as follows:
>- Supermicro motherboard with dual Xeon 2.8 GHz (SMT is active)
>- 1 GB RAM
>- Adaptec U320 RAID controller
>- kernel 2.6.10
>- debian sarge
>- samba 3
>- LVM2
>- XFS with quota turned on
>
>All software is from Debian sarge packages, except the kernel.
>
>I'm not able to determine whether the problem is xfs, device mapper
> or samba related, and I was not able to do extensive testing (using
> a different filesystem, testing with a different daemon, etc.),
> but SMT/SMP testing showed that this is not an SMP/SMT related
> problem.
>
>I've compiled the kernel with the debugging options, so I can
> provide additional information if needed, as in Brad's case.
>
>Sincerely,
>Gildas LE NADAN
>(Please CC me as I am not subscribed to LKML)

I have a somewhat similar case here: samba processes are unkillable,
but I can do a software reboot. Something is also killing amandad,
and I lost the backup of this machine last night.  The amanda logs
are bereft of any info, and I've no clue that it's happened except a
message from amanda that the client access timed out on this machine.
This was while running 2.6.10-rc3-mm1-V0.33-04, which ran stably for 8
days previously.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) [includes backtrace]
  2004-12-28 11:39 ` bert hubert
@ 2004-12-28 15:15   ` Gildas LE NADAN
  0 siblings, 0 replies; 7+ messages in thread
From: Gildas LE NADAN @ 2004-12-28 15:15 UTC
  To: linux-kernel

>>I am experiencing hangs in samba processes on a filer that uses xfs over lvm2
>>for its data partitions, whenever there are active snapshots of the xfs partitions.
> 
> A trick is to enable alt-sysrq and press alt-sysrq-t (I think) which spams
> your syslog with backtraces of all processes currently running, including
> the ones stuck in 'D' state (ps aux | grep " D ").
> 
> If you isolate these backtraces and send them to this list, they will enable
> developers to help you. Make sure you add 'includes backtrace' in your
> Subject.

OK, this is what I get after provoking the problem on the test server
(copying 1 GB of data is enough to trigger it):

# ps afx | grep smbd
  2279 ?        Ss     0:00 /usr/sbin/smbd -D
  2288 ?        S      0:00  \_ /usr/sbin/smbd -D
  2447 ?        D      0:01  \_ /usr/sbin/smbd -D
  2487 pts/0    S+     0:00  |               \_ grep smbd
#  killall -9 smbd
# ps afx | grep smbd
  2554 pts/0    S+     0:00  |               \_ grep smbd
  2447 ?        D      0:01 /usr/sbin/smbd -D


I did an "echo t > /proc/sysrq-trigger" and tried to clean up the resulting
logs a bit before sending. I hope this gives enough info; otherwise, I kept
the whole log, so I can send whatever part is needed.

  SysRq : Show State
                                                 sibling
    task             PC      pid father child younger older
  ...
xfslogd/0     S 00000004     0   218     11           220   216 (L-TLB)
  f7eecf44 00000046 f7eecf34 00000004 00000002 f60ef53c c0427ba0 f60ef5a8
         00000282 c01017cc 00000000 f7f28974 f7f2896c 00000000 c170f020 00000000
         00000c41 ff6027e0 00000005 00000286 f7eb9530 f7eb96b0 f7eecf94 00000002
  Call Trace:
   [__up+28/32] __up+0x1c/0x20
   [worker_thread+565/608] worker_thread+0x235/0x260
   [pagebuf_iodone_work+0/80] pagebuf_iodone_work+0x0/0x50
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfslogd/1     S 00000004     0   219     10           221   217 (L-TLB)
  f7c82f44 00000046 f7c82f30 00000004 00000001 ffffffff f7eb9020 35a49146
         00000000 f7eb9020 c170f020 f7eb9020 00000000 c1717a00 c1717020 00000001
         000008ae 0395f3e5 00000000 c171705c f7c7e020 f7c7e1a0 00000001 f7f289dc
  Call Trace:
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfslogd/2     S 00000004     0   220     11           222   218 (L-TLB)
  f7eedf44 00000046 f7eedf34 00000004 00000004 00000000 f7c1f020 c01f4a99
         f714b13c 00000000 00000000 f7f28a74 f7f28a6c 00000000 c171f020 00000002
         00000d47 6261fbfd 00000074 00000286 f7eb9020 f7eb91a0 f7eedf94 00000008
  Call Trace:
   [xfs_buf_iodone_callbacks+361/368] xfs_buf_iodone_callbacks+0x169/0x170
   [worker_thread+565/608] worker_thread+0x235/0x260
   [pagebuf_iodone_work+0/80] pagebuf_iodone_work+0x0/0x50
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfslogd/3     S 00000004     0   221     10           223   219 (L-TLB)
  f7c84f44 00000046 f7c84f34 00000004 00000003 ffffffff f7c27a40 35a47b19
         00000000 03969a3b 03969a3b 00000000 f7c84f28 c0116200 c1727020 00000003
         00000ef7 0396cbec 00000000 00000286 f7c83a40 f7c83bc0 f7c84f94 00000004
  Call Trace:
   [activate_task+144/176] activate_task+0x90/0xb0
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfsdatad/0    S 00000004     0   222     11           224   220 (L-TLB)
  f7f06f44 00000046 f7f06f30 00000004 00000002 ffffffff f7c83530 35a48050
         00000000 f7c83530 c1717020 f7c83530 00000000 c170fa00 c170f020 00000000
         00000897 0397ce5a 00000000 c170f05c f7f05a40 f7f05bc0 00000002 f7f28550
  Call Trace:
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfsdatad/1    S 00000004     0   223     10           225   221 (L-TLB)
  f7c85f44 00000046 f7c85f30 00000004 00000001 ffffffff f7f05530 35a47efe
         00000000 f7f05530 c170f020 f7f05530 00000000 c1717a00 c1717020 00000001
         000008f9 0398532d 00000000 c171705c f7c83530 f7c836b0 00000001 f7f285d0
  Call Trace:
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfsdatad/2    S 00000004     0   224     11           903   222 (L-TLB)
  f7f07f44 00000046 f7f07f34 00000004 00000004 ffffffff f7c1f020 35a493ed
         00000000 03985b99 03985b99 00000000 f7f07f28 c0116200 c171f020 00000002
         00000d7c 03988e25 00000000 00000286 f7f05530 f7f056b0 f7f07f94 00000008
  Call Trace:
   [activate_task+144/176] activate_task+0x90/0xb0
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfsdatad/3    S 00000004     0   225     10           902   223 (L-TLB)
  f7c87f44 00000046 f7c87f34 00000004 00000003 ffffffff f7c27a40 35a49032
         00000000 0398e44a 0398e44a 00000000 f7c87f28 c0116200 c1727020 00000003
         000010b0 0399175f 00000000 00000286 f7c83020 f7c831a0 f7c87f94 00000004
  Call Trace:
   [activate_task+144/176] activate_task+0x90/0xb0
   [worker_thread+565/608] worker_thread+0x235/0x260
   [schedule+1132/3360] schedule+0x46c/0xd20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [worker_thread+0/608] worker_thread+0x0/0x260
   [kthread+186/192] kthread+0xba/0xc0
   [kthread+0/192] kthread+0x0/0xc0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfsbufd       S 00000004     0   226      1           815   213 (L-TLB)
  f7f08f78 00000046 f7f08f68 00000004 00000001 00000000 f7c1f530 c02c1edb
         f7f70e64 00000000 f7eaa944 c0264f5f 00000004 c04f99e8 c1717020 00000001
         00000134 6d117e89 0000009e c0125879 f7f05020 f7f051a0 00000000 00000001
  Call Trace:
   [elv_next_request+27/256] elv_next_request+0x1b/0x100
   [kobject_put+31/48] kobject_put+0x1f/0x30
   [__mod_timer+249/320] __mod_timer+0xf9/0x140
   [schedule_timeout+117/208] schedule_timeout+0x75/0xd0
   [process_timeout+0/16] process_timeout+0x0/0x10
   [dm_unplug_all+39/64] dm_unplug_all+0x27/0x40
   [blk_backing_dev_unplug+0/32] blk_backing_dev_unplug+0x0/0x20
   [pagebuf_daemon+118/512] pagebuf_daemon+0x76/0x200
   [pagebuf_daemon+0/512] pagebuf_daemon+0x0/0x200
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
...
  xfssyncd      S 00000004     0  1361      1          1362  1360 (L-TLB)
  f756ef74 00000046 f756ef64 00000004 00000002 f5735568 c0427ba0 f5735568
         f756ef2c 0000022e 00000031 f714be3c f5735568 00000000 c170f020 00000000
         000034a1 58c63e72 00000098 c0125879 f6fb6a40 f6fb6bc0 00000000 00000002
  Call Trace:
   [__mod_timer+249/320] __mod_timer+0xf9/0x140
   [schedule_timeout+117/208] schedule_timeout+0x75/0xd0
   [process_timeout+0/16] process_timeout+0x0/0x10
   [xfssyncd+134/480] xfssyncd+0x86/0x1e0
   [xfssyncd+0/480] xfssyncd+0x0/0x1e0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
  xfssyncd      S 00000004     0  1362      1          2233  1361 (L-TLB)
  f6accf74 00000046 f6accf64 00000004 00000002 f60f4360 c0427ba0 f60efc3c
         c050ad58 c023a3ee 00000031 f60efc3c f6d6ccd0 00000000 c170f020 00000000
         00001568 6080d951 00000098 c0125879 f6a95530 f6a956b0 00000000 00000002
  Call Trace:
   [pagebuf_rele+46/240] pagebuf_rele+0x2e/0xf0
   [__mod_timer+249/320] __mod_timer+0xf9/0x140
   [schedule_timeout+117/208] schedule_timeout+0x75/0xd0
   [process_timeout+0/16] process_timeout+0x0/0x10
   [xfssyncd+134/480] xfssyncd+0x86/0x1e0
   [xfssyncd+0/480] xfssyncd+0x0/0x1e0
   [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
...
  smbd          S 00000004     0  2279      1  2288    2285  2277 (NOTLB)
  f5110ea4 00000082 f5110e90 00000004 00000002 c013ed74 f6770020 c042cd80
         000000d0 f6770020 c1717020 f6770020 00000000 c170fa00 c170f020 00000000
         0000b4c6 78f5a598 0000006d c170f05c f779a530 f779a6b0 00000002 f5d37028
  Call Trace:
   [__alloc_pages+484/928] __alloc_pages+0x1e4/0x3a0
   [schedule_timeout+199/208] schedule_timeout+0xc7/0xd0
   [tcp_poll+52/400] tcp_poll+0x34/0x190
   [handle_mm_fault+344/384] handle_mm_fault+0x158/0x180
   [add_wait_queue+29/80] add_wait_queue+0x1d/0x50
   [pipe_poll+52/128] pipe_poll+0x34/0x80
   [do_select+401/736] do_select+0x191/0x2e0
   [__pollwait+0/208] __pollwait+0x0/0xd0
   [sys_select+731/1456] sys_select+0x2db/0x5b0
   [syscall_call+7/11] syscall_call+0x7/0xb
...
  smbd          D 00000004     0  2447   2279                2288 (NOTLB)
  f6736bbc 00000082 f6736bac 00000004 00000002 00000000 c0427ba0 00000000
         f6770020 c0118350 00000000 00000000 c17ff080 00000007 c170f020 00000000
         00008e96 5602776d 00000071 00000000 f6770020 f67701a0 c023a197 00000002
  Call Trace:
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [pagebuf_associate_memory+103/400] pagebuf_associate_memory+0x67/0x190
   [schedule_timeout+199/208] schedule_timeout+0xc7/0xd0
   [xlog_sync+630/1216] xlog_sync+0x276/0x4c0
   [xlog_state_release_iclog+91/272] xlog_state_release_iclog+0x5b/0x110
   [add_wait_queue_exclusive+26/80] add_wait_queue_exclusive+0x1a/0x50
   [xlog_state_sync+602/656] xlog_state_sync+0x25a/0x290
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [xlog_assign_tail_lsn+73/128] xlog_assign_tail_lsn+0x49/0x80
   [default_wake_function+0/32] default_wake_function+0x0/0x20
   [xfs_log_force+132/144] xfs_log_force+0x84/0x90
   [xfs_trans_commit+631/1008] xfs_trans_commit+0x277/0x3f0
   [xfs_trans_dup+191/208] xfs_trans_dup+0xbf/0xd0
   [xfs_itruncate_finish+593/1072] xfs_itruncate_finish+0x251/0x430
   [xfs_setattr+3578/4128] xfs_setattr+0xdfa/0x1020
   [linvfs_setattr+258/384] linvfs_setattr+0x102/0x180
   [kmem_cache_alloc+114/192] kmem_cache_alloc+0x72/0xc0
   [linvfs_setattr+0/384] linvfs_setattr+0x0/0x180
   [notify_change+334/400] notify_change+0x14e/0x190
   [do_truncate+147/208] do_truncate+0x93/0xd0
   [fget+73/96] fget+0x49/0x60
   [sys_ftruncate64+204/304] sys_ftruncate64+0xcc/0x130
   [sys_open+108/144] sys_open+0x6c/0x90
   [syscall_call+7/11] syscall_call+0x7/0xb
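
For readers who want to isolate the blocked tasks from a dump like the one above, here is a minimal Python sketch. It assumes the task-header and call-trace layout seen in this message (task name, state letter, an 8-digit hex field, then the pid); the names `parse_tasks` and `blocked` are illustrative, and the sample is a shortened excerpt of the trace above.

```python
import re

# Task header, e.g. "smbd  D 00000004  0  2447  2279 ... (NOTLB)"
HEADER = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[RSDTZ])\s+[0-9A-Fa-f]{8}\s+\d+\s+(?P<pid>\d+)\s"
)
# Call-trace entry, e.g. "[xfs_log_force+132/144] xfs_log_force+0x84/0x90"
FRAME = re.compile(r"\]\s+(?P<sym>[\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")


def parse_tasks(dump: str) -> list:
    """Split a SysRq 'Show State' dump into tasks with their call-trace symbols."""
    tasks = []
    current = None
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "name": header.group("name"),
                "state": header.group("state"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group("sym"))
    return tasks


def blocked(tasks: list) -> list:
    """Keep only tasks in uninterruptible sleep (state 'D')."""
    return [task for task in tasks if task["state"] == "D"]


# Shortened excerpt of the dump in this message.
SAMPLE = """\
  smbd          S 00000004     0  2279      1  2288    2285  2277 (NOTLB)
  Call Trace:
   [do_select+401/736] do_select+0x191/0x2e0
   [sys_select+731/1456] sys_select+0x2db/0x5b0
  smbd          D 00000004     0  2447   2279                2288 (NOTLB)
  Call Trace:
   [xlog_state_sync+602/656] xlog_state_sync+0x25a/0x290
   [xfs_log_force+132/144] xfs_log_force+0x84/0x90
   [sys_ftruncate64+204/304] sys_ftruncate64+0xcc/0x130
"""

if __name__ == "__main__":
    for task in blocked(parse_tasks(SAMPLE)):
        print(task["name"], task["pid"], " <- ".join(task["frames"]))
```

On this excerpt only pid 2447 is reported, with its frames running from `xlog_state_sync` back to `sys_ftruncate64`, which matches the truncate-then-log-force path visible in the full trace.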


* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
  2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
  2004-12-28 11:39 ` bert hubert
  2004-12-28 14:07 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gene Heskett
@ 2004-12-29 18:01 ` Julien BLACHE
  2005-01-05 11:37 ` Christoph Hellwig
  3 siblings, 0 replies; 7+ messages in thread
From: Julien BLACHE @ 2004-12-29 18:01 UTC
  To: Gildas LE NADAN; +Cc: linux-kernel

Gildas LE NADAN <gildas.le-nadan@inha.fr> wrote:

> I am experiencing hangs in samba processes on a filer that uses xfs
> over lvm2 for its data partitions, whenever there are active
> snapshots of the xfs partitions.

Your problem probably lies between lvm2 and XFS. I had the same
problems this summer while doing the exact same thing.

The server would just completely hang once I started doing lvm
snapshots:
 -> at the beginning, the snapshots would work OK, but XFS would hang
    when accessing the filesystem afterwards
 -> after a while (usually 2 or 3 snapshots, and I was taking a
    snapshot every 2 hours), the snapshot would not complete, and
    then only a hard reboot would work

I was doing the snapshots from a crontab, the script used xfs_freeze
to freeze the filesystem before doing the snapshot (and unfreeze it
afterwards, of course). Sometimes xfs_freeze -u would hang too (but at
this time, the server was in a pretty bad state already).
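
The freeze, snapshot, unfreeze sequence described above can be sketched as follows. This is a minimal Python illustration, not the original cron script: the mount point, volume names and snapshot size are hypothetical, and the `run` parameter exists only so the sequence can be exercised without touching real volumes. It illustrates the ordering only and is not a fix for the hangs discussed in this thread.

```python
import subprocess


def snapshot_commands(mountpoint, origin_lv, snap_name, snap_size):
    """Build the three commands: freeze, create snapshot, unfreeze."""
    return [
        ["xfs_freeze", "-f", mountpoint],
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin_lv],
        ["xfs_freeze", "-u", mountpoint],
    ]


def take_snapshot(mountpoint, origin_lv, snap_name, snap_size,
                  run=subprocess.check_call):
    """Freeze the filesystem, snapshot the volume, and always try to unfreeze."""
    freeze, create, thaw = snapshot_commands(
        mountpoint, origin_lv, snap_name, snap_size
    )
    run(freeze)
    try:
        run(create)
    finally:
        # Unfreeze even if lvcreate fails, so the filesystem is not left frozen.
        run(thaw)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # All paths and names below are hypothetical.
    take_snapshot("/srv/data", "/dev/vg0/data", "data-snap", "1G",
                  run=lambda cmd: print(" ".join(cmd)))
```

The `try`/`finally` is the main design point: if the snapshot step raises, the unfreeze is still attempted, although as noted above `xfs_freeze -u` itself could hang once the server was already in a bad state.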

The server wasn't loaded at all, we were doing some reads/writes
through samba to have some modified files lying around, but we were
mainly prototyping the server, not stress-testing it.

LVM and XFS just don't play nice together when it comes to snapshots.
I thought it had been fixed already, but that's not the case, as we both
know...

(I can't remember the kernel version, it could have been a 2.4 kernel,
but I was using LVM2 and the latest XFS code available)

Feel free to correct me if I did something wrong (but AFAIK I took
care of everything, knowing there could have been bad interactions
between LVM and XFS).

JB.

-- 
Julien BLACHE                                   <http://www.jblache.org> 
<jb@jblache.org>                                  GPG KeyID 0xF5D65169


* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
  2004-12-28 14:07 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gene Heskett
@ 2005-01-02 12:41   ` Christian Leber
  0 siblings, 0 replies; 7+ messages in thread
From: Christian Leber @ 2005-01-02 12:41 UTC
  To: linux-kernel

On Tue, Dec 28, 2004 at 09:07:01AM -0500, Gene Heskett wrote:

> I have a somewhat similar case here: samba processes are unkillable,
> but I can do a software reboot. Something is also killing amandad,
> and I lost the backup of this machine last night.  The amanda logs
> are bereft of any info, and I've no clue that it's happened except a
> message from amanda that the client access timed out on this machine.
> This was while running 2.6.10-rc3-mm1-V0.33-04, which ran stably for 8
> days previously.

I have the same problem (2.6.10-rc3 running 7 days without problems), and
I had mc, smbd and lsof in D state:
(there is something about nfs in the call tree; it _might_ be because I
halted an NFS server the day before without unmounting it on the system with the
problem)

Dec 31 18:55:20 core kernel: nfs warning: mount version older than kernel
Jan  1 06:25:38 core kernel: nfs: server igor3 not responding, still trying
Jan  1 17:29:15 core kernel: nfs: server igor3 not responding, still trying
Jan  1 18:05:12 core kernel:       (NOTLB)
Jan  1 18:05:12 core kernel: c843deb4 00200086 f68a6580 c04c0150 000274ab c18e57e0 587f3ab2 000274ab 
Jan  1 18:05:12 core kernel:        00003875 5880d98d 000274ab d696d560 d696d6bc 00000014 d696d560 c843c000 
Jan  1 18:05:12 core kernel:        ffffe000 c011cca2 d696d560 ecc030e0 00040005 c011d00f ffffffff d696d9d4 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c011cca2>] finish_stop+0x42/0x90
Jan  1 18:05:12 core kernel:  [<c011d00f>] get_signal_to_deliver+0x19f/0x2b0
Jan  1 18:05:12 core kernel:  [<c0102388>] do_signal+0x98/0x130
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c01059b2>] sys_ptrace+0xb2/0x610
Jan  1 18:05:12 core kernel:  [<c0102455>] do_notify_resume+0x35/0x38
Jan  1 18:05:12 core kernel:  [<c0102596>] work_notifysig+0x13/0x15
Jan  1 18:05:12 core kernel: mc            D C04C0120     0 21514  30032 21516               (NOTLB)
Jan  1 18:05:12 core kernel: f0269da8 00000082 d1edba20 c04c0120 00000000 00000292 f7de72a4 f0269da8 
Jan  1 18:05:12 core kernel:        000f8c0f 36b16015 00026d31 d1edba20 d1edbb7c df1cbe94 df1cbda0 f0269dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 c1b25560 f7de72a4 c01af638 00000000 d6aba200 00000000 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c01af638>] ext3_mark_iloc_dirty+0x28/0x40
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c0160120>] update_atime+0xd0/0xe0
Jan  1 18:05:12 core kernel:  [<c0154e6d>] link_path_walk+0x73d/0xb60
Jan  1 18:05:12 core kernel:  [<c01bd3c9>] journal_stop+0x149/0x200
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c015a1ff>] fifo_open+0x13f/0x26f
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: mozilla-bin   S C04C0120     0  2810   2994          2859       (NOTLB)
Jan  1 18:05:12 core kernel: e6135f10 00200086 dde78a00 c04c0120 e6135fa0 cb94fa98 c012ff73 c1125b60 
Jan  1 18:05:12 core kernel:        00002623 41070976 00028179 dde78a00 dde78b5c 00000000 7fffffff e6135f68 
Jan  1 18:05:12 core kernel:        7fffffff c0389a35 c01536c4 e4cb7d80 cb1db8e0 e6135fa0 00000145 e491a420 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c012ff73>] __get_free_pages+0x33/0x40
Jan  1 18:05:12 core kernel:  [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan  1 18:05:12 core kernel:  [<c01536c4>] pipe_poll+0x34/0x80
Jan  1 18:05:12 core kernel:  [<c0159d1f>] do_pollfd+0x4f/0x90
Jan  1 18:05:12 core kernel:  [<c0159e0a>] do_poll+0xaa/0xd0
Jan  1 18:05:12 core kernel:  [<c0159f82>] sys_poll+0x152/0x210
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: mozilla-bin   S C04C0120     0  2859   2994          2860  2810 (NOTLB)
Jan  1 18:05:12 core kernel: db23bf10 00200086 c46cda20 c04c0120 000000d0 2d9fbab8 000000d0 f3ca81a0 
Jan  1 18:05:12 core kernel:        00000799 d2284a24 00028178 c46cda20 c46cdb7c 00000000 7fffffff db23bf68 
Jan  1 18:05:12 core kernel:        7fffffff c0389a35 c01536c4 f3ca81a0 cb1dbf20 db23bfa0 00000145 da0da768 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan  1 18:05:12 core kernel:  [<c01536c4>] pipe_poll+0x34/0x80
Jan  1 18:05:12 core kernel:  [<c0159d1f>] do_pollfd+0x4f/0x90
Jan  1 18:05:12 core kernel:  [<c0159e0a>] do_poll+0xaa/0xd0
Jan  1 18:05:12 core kernel:  [<c0159f82>] sys_poll+0x152/0x210
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: mozilla-bin   S C04C0150     0  2860   2994                2859 (NOTLB)
Jan  1 18:05:12 core kernel: c79ffe90 00200086 dde78a00 c04c0150 00028179 00000000 4105997b 00028179 
Jan  1 18:05:12 core kernel:        0000159c 4105b238 00028179 c46cd540 c46cd69c 2a0aec95 c79ffea4 fffffff5 
Jan  1 18:05:12 core kernel:        c79ffedc c03899e3 c79ffea4 2a0aec95 c79ffec8 c04c60c8 c04c60c8 2a0aec95 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c03899e3>] schedule_timeout+0x63/0xc0
Jan  1 18:05:12 core kernel:  [<c011aa90>] process_timeout+0x0/0x10
Jan  1 18:05:12 core kernel:  [<c012585f>] futex_wait+0x12f/0x170
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c01535d0>] pipe_write+0x0/0x40
Jan  1 18:05:12 core kernel:  [<c0125b18>] do_futex+0x48/0xa0
Jan  1 18:05:12 core kernel:  [<c0125c5e>] sys_futex+0xee/0x100
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: xchat         S C04C0120     0  7965      1          7966 26544 (NOTLB)
Jan  1 18:05:12 core kernel: e4e43f10 00200082 ce06d5a0 c04c0120 e4e43fa0 e93f8af8 c012ff73 d5703400 
Jan  1 18:05:12 core kernel:        00000852 5500c143 0002817b ce06d5a0 ce06d6fc 2a0adc55 e4e43f24 e4e43f68 
Jan  1 18:05:12 core kernel:        00000034 c03899e3 e4e43f24 2a0adc55 c86fd740 c04c5a10 c04c5a10 2a0adc55 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c012ff73>] __get_free_pages+0x33/0x40
Jan  1 18:05:12 core kernel:  [<c03899e3>] schedule_timeout+0x63/0xc0
Jan  1 18:05:12 core kernel:  [<c011aa90>] process_timeout+0x0/0x10
Jan  1 18:05:12 core kernel:  [<c0159e0a>] do_poll+0xaa/0xd0
Jan  1 18:05:12 core kernel:  [<c0159f82>] sys_poll+0x152/0x210
Jan  1 18:05:12 core kernel:  [<c01163fb>] sys_gettimeofday+0x3b/0x80
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: xchat         S C04C05C8     0  7966      1         12134  7965 (NOTLB)
Jan  1 18:05:12 core kernel: c30bfeb4 00200082 ce06d5a0 c04c05c8 0002817b 00000001 5500689d 0002817b 
Jan  1 18:05:12 core kernel:        000006de 55006e0a 0002817b dde78520 dde7867c 00000000 7fffffff 00000006 
Jan  1 18:05:12 core kernel:        00000006 c0389a35 c1346120 00000000 c01593f5 f64b54a4 00200246 f64b54a4 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan  1 18:05:12 core kernel:  [<c01593f5>] __pollwait+0x85/0xd0
Jan  1 18:05:12 core kernel:  [<c01536c4>] pipe_poll+0x34/0x80
Jan  1 18:05:12 core kernel:  [<c0159693>] do_select+0x173/0x2b0
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c0159abf>] sys_select+0x2bf/0x4d0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: smbd          D C04C0120     0  9058      1         12172 26379 (NOTLB)
Jan  1 18:05:12 core kernel: e1d95c24 00000082 ed6365a0 c04c0120 f7cbdce0 c03762de d0130400 f7cbdce0 
Jan  1 18:05:12 core kernel:        00011688 13c6a05f 000280e1 ed6365a0 ed6366fc f7cbdce0 e1d95c60 f7cbdd58 
Jan  1 18:05:12 core kernel:        e1d95c40 c037800d f7cbdce0 00000000 da6b95f8 e1d94000 00000000 00000000 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c03762de>] xprt_prepare_transmit+0x7e/0xc0
Jan  1 18:05:12 core kernel:  [<c037800d>] __rpc_execute+0x13d/0x3c0
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c03786b6>] rpc_new_task+0x36/0xb0
Jan  1 18:05:12 core kernel:  [<c0373a24>] rpc_call_sync+0x74/0xb0
Jan  1 18:05:12 core kernel:  [<c01dd6fa>] nfs3_rpc_wrapper+0x3a/0x80
Jan  1 18:05:12 core kernel:  [<c01ddc9a>] nfs3_proc_access+0xda/0x170
Jan  1 18:05:12 core kernel:  [<c01bdc75>] __journal_file_buffer+0x175/0x230
Jan  1 18:05:12 core kernel:  [<c01bd02f>] journal_dirty_metadata+0xef/0x170
Jan  1 18:05:12 core kernel:  [<c037973c>] rpcauth_lookup_credcache+0x1cc/0x210
Jan  1 18:05:12 core kernel:  [<c01d29c5>] nfs_do_access+0x65/0xb0
Jan  1 18:05:12 core kernel:  [<c01d2b00>] nfs_permission+0xf0/0x170
Jan  1 18:05:12 core kernel:  [<c0154241>] permission+0x51/0x60
Jan  1 18:05:12 core kernel:  [<c01551c2>] link_path_walk+0xa92/0xb60
Jan  1 18:05:12 core kernel:  [<c0160120>] update_atime+0xd0/0xe0
Jan  1 18:05:12 core kernel:  [<c0154e0d>] link_path_walk+0x6dd/0xb60
Jan  1 18:05:12 core kernel:  [<c01554e0>] path_lookup+0x70/0x110
Jan  1 18:05:12 core kernel:  [<c0155733>] __user_walk+0x33/0x60
Jan  1 18:05:12 core kernel:  [<c0150a7f>] vfs_stat+0x1f/0x60
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0103ee8>] math_state_restore+0x28/0x50
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: lsof          D C04C0120     0 12134      1         12137  7966 (NOTLB)
Jan  1 18:05:12 core kernel: c2341da8 00000086 c46cd060 c04c0120 c0389b45 d900a5f8 c0149040 c2341dd4 
Jan  1 18:05:12 core kernel:        0000196b 064b7168 000280e4 c46cd060 c46cd1bc df1cbe94 df1cbda0 c2341dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 c0124cc0 c2341dd4 c2341dd4 00000000 d6aba200 00000002 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0389b45>] __wait_on_bit+0x45/0x60
Jan  1 18:05:12 core kernel:  [<c0149040>] sync_buffer+0x0/0x50
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c0124cc0>] wake_bit_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c0155011>] link_path_walk+0x8e1/0xb60
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01532d7>] pipe_read+0x37/0x40
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0147ea1>] vfs_read+0xd1/0x130
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: lsof          D C04C0120     0 12137      1         12141 12134 (NOTLB)
Jan  1 18:05:12 core kernel: cbed5da8 00200086 f2996a40 c04c0120 c0139fa3 dbd4d900 d9115078 d221efe4 
Jan  1 18:05:12 core kernel:        00001704 00436954 000280e6 f2996a40 f2996b9c df1cbe94 df1cbda0 cbed5dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 d9115078 c013a386 dbd4d900 00000000 d6aba200 00000001 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0139fa3>] do_no_page+0x63/0x250
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c013a386>] handle_mm_fault+0xf6/0x170
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c010d34c>] do_page_fault+0x18c/0x599
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c0155011>] link_path_walk+0x8e1/0xb60
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01532d7>] pipe_read+0x37/0x40
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0147ea1>] vfs_read+0xd1/0x130
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: lsof          D C04C0120     0 12141      1         12169 12137 (NOTLB)
Jan  1 18:05:12 core kernel: d6507da8 00200082 f2996560 c04c0120 c0139fa3 dbd4db20 e57184bc d6224fe4 
Jan  1 18:05:12 core kernel:        00001ad7 d3d93fee 000280e9 f2996560 f29966bc df1cbe94 df1cbda0 d6507dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 e57184bc c013a386 dbd4db20 00000000 d6aba200 00000001 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0139fa3>] do_no_page+0x63/0x250
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c013a386>] handle_mm_fault+0xf6/0x170
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c010d34c>] do_page_fault+0x18c/0x599
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c0155011>] link_path_walk+0x8e1/0xb60
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01532d7>] pipe_read+0x37/0x40
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0147ea1>] vfs_read+0xd1/0x130
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: lsof          D C04C0120     0 12169      1         12171 12141 (NOTLB)
Jan  1 18:05:12 core kernel: de709da8 00200082 dfe36a80 c04c0120 c0139fa3 dbd4d6e0 d9115ee8 ef49dfe4 
Jan  1 18:05:12 core kernel:        000019ce 39260da8 000280f0 dfe36a80 dfe36bdc df1cbe94 df1cbda0 de709dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 d9115ee8 c013a386 dbd4d6e0 00000000 d6aba200 00000001 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0139fa3>] do_no_page+0x63/0x250
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c013a386>] handle_mm_fault+0xf6/0x170
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c010d34c>] do_page_fault+0x18c/0x599
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c0155011>] link_path_walk+0x8e1/0xb60
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01532d7>] pipe_read+0x37/0x40
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0147ea1>] vfs_read+0xd1/0x130
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: lsof          D C04C0120     0 12171      1         26379 12169 (NOTLB)
Jan  1 18:05:12 core kernel: ed273da8 00200082 dde78040 c04c0120 c0139fa3 eb16f740 d9115e40 de75efe4 
Jan  1 18:05:12 core kernel:        000019a1 bccd2773 000280f0 dde78040 dde7819c df1cbe94 df1cbda0 ed273dcc 
Jan  1 18:05:12 core kernel:        df1cbeb8 c01d4a65 d9115e40 c013a386 eb16f740 00000000 d6aba200 00000001 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c0139fa3>] do_no_page+0x63/0x250
Jan  1 18:05:12 core kernel:  [<c01d4a65>] nfs_wait_on_inode+0xd5/0x1e0
Jan  1 18:05:12 core kernel:  [<c013a386>] handle_mm_fault+0xf6/0x170
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c010d34c>] do_page_fault+0x18c/0x599
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c01d4f64>] __nfs_revalidate_inode+0x74/0x360
Jan  1 18:05:12 core kernel:  [<c01545a7>] follow_mount+0x57/0xa0
Jan  1 18:05:12 core kernel:  [<c0155011>] link_path_walk+0x8e1/0xb60
Jan  1 18:05:12 core kernel:  [<c01d52ea>] nfs_revalidate_inode+0x4a/0x70
Jan  1 18:05:12 core kernel:  [<c01d4bd0>] nfs_getattr+0x60/0xa0
Jan  1 18:05:12 core kernel:  [<c01509f9>] vfs_getattr+0x39/0xa0
Jan  1 18:05:12 core kernel:  [<c0150aaf>] vfs_stat+0x4f/0x60
Jan  1 18:05:12 core kernel:  [<c01532d7>] pipe_read+0x37/0x40
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0147ea1>] vfs_read+0xd1/0x130
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: smbd          D C04C0120     0 12172      1         13162  9058 (NOTLB)
Jan  1 18:05:12 core kernel: e1763c24 00000082 d696da40 c04c0120 f7cbdec0 c03762de d0130400 f7cbdec0 
Jan  1 18:05:12 core kernel:        000d4246 e30867f8 000280ff d696da40 d696db9c f7cbdec0 e1763c60 f7cbdf38 
Jan  1 18:05:12 core kernel:        e1763c40 c037800d f7cbdec0 00000000 da6b95f8 e1762000 00000000 00000000 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c03762de>] xprt_prepare_transmit+0x7e/0xc0
Jan  1 18:05:12 core kernel:  [<c037800d>] __rpc_execute+0x13d/0x3c0
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c0124c60>] autoremove_wake_function+0x0/0x60
Jan  1 18:05:12 core kernel:  [<c03786b6>] rpc_new_task+0x36/0xb0
Jan  1 18:05:12 core kernel:  [<c0373a24>] rpc_call_sync+0x74/0xb0
Jan  1 18:05:12 core kernel:  [<c01dd6fa>] nfs3_rpc_wrapper+0x3a/0x80
Jan  1 18:05:12 core kernel:  [<c01ddc9a>] nfs3_proc_access+0xda/0x170
Jan  1 18:05:12 core kernel:  [<c01bdc75>] __journal_file_buffer+0x175/0x230
Jan  1 18:05:12 core kernel:  [<c01bd02f>] journal_dirty_metadata+0xef/0x170
Jan  1 18:05:12 core kernel:  [<c037973c>] rpcauth_lookup_credcache+0x1cc/0x210
Jan  1 18:05:12 core kernel:  [<c01d29c5>] nfs_do_access+0x65/0xb0
Jan  1 18:05:12 core kernel:  [<c01d2b00>] nfs_permission+0xf0/0x170
Jan  1 18:05:12 core kernel:  [<c0154241>] permission+0x51/0x60
Jan  1 18:05:12 core kernel:  [<c01551c2>] link_path_walk+0xa92/0xb60
Jan  1 18:05:12 core kernel:  [<c0160120>] update_atime+0xd0/0xe0
Jan  1 18:05:12 core kernel:  [<c0154e0d>] link_path_walk+0x6dd/0xb60
Jan  1 18:05:12 core kernel:  [<c01554e0>] path_lookup+0x70/0x110
Jan  1 18:05:12 core kernel:  [<c0155733>] __user_walk+0x33/0x60
Jan  1 18:05:12 core kernel:  [<c0150a7f>] vfs_stat+0x1f/0x60
Jan  1 18:05:12 core kernel:  [<c01511ab>] sys_stat64+0x1b/0x40
Jan  1 18:05:12 core kernel:  [<c0103ee8>] math_state_restore+0x28/0x50
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: nmbd          S C04C0120     0 13162      1         13438 12172 (NOTLB)
Jan  1 18:05:12 core kernel: f6ef1eb4 00200086 f7d585a0 c04c0120 c473dc18 c012ff73 c132a260 00000000 
Jan  1 18:05:12 core kernel:        000169ae 1415a4e0 0002817a f7d585a0 f7d586fc 2a0aee26 f6ef1ec8 0000000b 
Jan  1 18:05:12 core kernel:        0000000b c03899e3 f6ef1ec8 2a0aee26 f6ec60c0 c04079d8 d5349ec8 2a0aee26 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c012ff73>] __get_free_pages+0x33/0x40
Jan  1 18:05:12 core kernel:  [<c03899e3>] schedule_timeout+0x63/0xc0
Jan  1 18:05:12 core kernel:  [<c011aa90>] process_timeout+0x0/0x10
Jan  1 18:05:12 core kernel:  [<c0159693>] do_select+0x173/0x2b0
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c0159abf>] sys_select+0x2bf/0x4d0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: mutt          S C04C0150     0 13427  26545                     (NOTLB)
Jan  1 18:05:12 core kernel: e751ff10 00000086 d4370ae0 c04c0150 00028179 00000001 f32707b5 00028179 
Jan  1 18:05:12 core kernel:        00000f9a f3bf6150 00028179 f2996080 f29961dc 2a13ecb8 e751ff24 e751ff68 
Jan  1 18:05:12 core kernel:        000927c1 c03899e3 e751ff24 2a13ecb8 00000145 c04c61e0 c04c61e0 2a13ecb8 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c03899e3>] schedule_timeout+0x63/0xc0
Jan  1 18:05:12 core kernel:  [<c011aa90>] process_timeout+0x0/0x10
Jan  1 18:05:12 core kernel:  [<c0159e0a>] do_poll+0xaa/0xd0
Jan  1 18:05:12 core kernel:  [<c0159f82>] sys_poll+0x152/0x210
Jan  1 18:05:12 core kernel:  [<c01163fb>] sys_gettimeofday+0x3b/0x80
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: xterm         S C04C0120     0 13438      1 13440         13162 (NOTLB)
Jan  1 18:05:12 core kernel: ca025eb4 00000082 f6fb1140 c04c0120 00000010 00000000 00000096 ec92f240 
Jan  1 18:05:12 core kernel:        00000b61 43a9a434 0002817b f6fb1140 f6fb129c 2a0adee6 ca025ec8 00000006 
Jan  1 18:05:12 core kernel:        00000006 c03899e3 ca025ec8 2a0adee6 f6805000 f57a7ec8 f0cf1ec8 2a0adee6 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c03899e3>] schedule_timeout+0x63/0xc0
Jan  1 18:05:12 core kernel:  [<c011aa90>] process_timeout+0x0/0x10
Jan  1 18:05:12 core kernel:  [<c0159693>] do_select+0x173/0x2b0
Jan  1 18:05:12 core kernel:  [<c0159370>] __pollwait+0x0/0xd0
Jan  1 18:05:12 core kernel:  [<c0159abf>] sys_select+0x2bf/0x4d0
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: bash          S C04C0150     0 13440  13438 13443               (NOTLB)
Jan  1 18:05:12 core kernel: f4179f1c 00000086 d3a0db00 c04c0150 0002813e f30432a0 27a9401d 0002813e 
Jan  1 18:05:12 core kernel:        0002baaf 27a9401d 0002813e f6fb1620 f6fb177c fffffe00 f6fb1620 f6fb16c4 
Jan  1 18:05:12 core kernel:        f6fb16c4 c01159cd ffffffff 00000006 f1b15100 f4179f50 00000292 00030002 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c01159cd>] do_wait+0x18d/0x460
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c022cf2e>] copy_to_user+0x3e/0x50
Jan  1 18:05:12 core kernel:  [<c0115d6f>] sys_wait4+0x3f/0x50
Jan  1 18:05:12 core kernel:  [<c0115da7>] sys_waitpid+0x27/0x2b
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
Jan  1 18:05:12 core kernel: bash          S C04C0150     0 13443  13440                     (NOTLB)
Jan  1 18:05:12 core kernel: f169fe70 00000086 f6fb1140 c04c0150 00028143 c038944e 600f9ce1 00028143 
Jan  1 18:05:12 core kernel:        000013b7 60106e9a 00028143 f1b15100 f1b1525c e000f000 7fffffff edbe8000 
Jan  1 18:05:12 core kernel:        c081dc40 c0389a35 00000002 e7593c0f c0272c48 f6805000 e7593c11 00000000 
Jan  1 18:05:12 core kernel: Call Trace:
Jan  1 18:05:12 core kernel:  [<c038944e>] schedule+0x2ce/0x4d0
Jan  1 18:05:12 core kernel:  [<c0389a35>] schedule_timeout+0xb5/0xc0
Jan  1 18:05:12 core kernel:  [<c0272c48>] pty_write+0x68/0x70
Jan  1 18:05:12 core kernel:  [<c0271a73>] read_chan+0x5e3/0x6f0
Jan  1 18:05:12 core kernel:  [<c0271cdd>] write_chan+0x15d/0x210
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c010f740>] default_wake_function+0x0/0x20
Jan  1 18:05:12 core kernel:  [<c026c94e>] tty_write+0x20e/0x260
Jan  1 18:05:12 core kernel:  [<c026c721>] tty_read+0xe1/0x100
Jan  1 18:05:12 core kernel:  [<c0147e88>] vfs_read+0xb8/0x130
Jan  1 18:05:12 core kernel:  [<c0148171>] sys_read+0x51/0x80
Jan  1 18:05:12 core kernel:  [<c010254b>] syscall_call+0x7/0xb
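
A dump of this size is easier to read once it is reduced to the tasks in state D (uninterruptible sleep) and the functions on their stacks. The following is a minimal sketch, not part of the original report, assuming the syslog layout shown above (a task header line, then "Call Trace:", then one frame per line). The names blocked_tasks, TASK_RE and FRAME_RE are hypothetical helpers introduced for illustration.

```python
import re

# Task header as it appears in this log, e.g.
# "... kernel: smbd          D C04C0120     0  9058      1 ..."
# Captures: command name, one-letter state, PID.
TASK_RE = re.compile(r"kernel: (\S+)\s+([A-Z]) [0-9A-F]{8}\s+\d+\s+(\d+)\s")

# Stack frame, e.g.
# "... kernel:  [<c03762de>] xprt_prepare_transmit+0x7e/0xc0"
# Captures: function name only.
FRAME_RE = re.compile(r"kernel:\s+\[<[0-9a-f]+>\] (\w+)\+0x")


def blocked_tasks(lines):
    """Return {(comm, pid): [function, ...]} for every task in state D."""
    tasks = {}
    current = None  # key of the D-state task whose frames we are collecting
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            comm, state, pid = header.group(1), header.group(2), int(header.group(3))
            # Only track uninterruptible tasks; frames of other tasks are skipped.
            current = (comm, pid) if state == "D" else None
            if current is not None:
                tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            tasks[current].append(frame.group(1))
    return tasks
```

Run over the lines above, this should list only the smbd and lsof tasks, whose stacks go through rpc_* and nfs_* functions, while the tasks in state S (nmbd, mutt, xterm, bash) are dropped. The stack words printed by the kernel can include stale entries, so the function lists are a hint about where a task sleeps rather than an exact call chain.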

I have not been able to reproduce it so far, because I have to use the system.

Christian Leber

-- 
http://www.nosoftwarepatents.com



* Re: unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10)
  2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
                   ` (2 preceding siblings ...)
  2004-12-29 18:01 ` Julien BLACHE
@ 2005-01-05 11:37 ` Christoph Hellwig
  3 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2005-01-05 11:37 UTC (permalink / raw)
  To: Gildas LE NADAN; +Cc: linux-kernel, linux-xfs

On Tue, Dec 28, 2004 at 12:24:01PM +0100, Gildas LE NADAN wrote:
> Hi,
> 
> I experience hangs on samba processes on a filer using xfs over lvm2 as 
> data partitions, when there is active snapshots of the xfs partitions.
> 
> I have a clone of the production server (same software, same hardware) 
> where the situation can be reproduced perfectly.
> 
> Testings showed that the result was the same, whether the snapshots were 
> mounted or not : smbd processes are locked and unkillable while the 
> machine is normaly working otherwise, except software reboot is 
> impossible and hardware reset is needed.
> 
> I noticed Brad Fitzpatrick's case in kernel 2.6.10 changelog 
> (http://lkml.org/lkml/2004/11/14/98) and tested kernel 2.6.10 today 
> without success.
> 
> Configuration is the following :
> - supermicro m/b with dual Xeon 2,8Ghz (SMT is active)
> - 1 GB ram,
> - adaptec u320 raid controler
> - kernel 2.6.10
> - debian sarge
> - samba 3
> - LVM2
> - XFS with quota turned on
> 
> All software are from debian sarge packages, except the kernel.
> 
> I'm not able to determine if the problem is more xfs, device mapper or 
> samba related, and was not able to do extensive testings (using a 
> different filesystem, testing with a different daemon, etc...), but 
> SMT/SMP testings showed that this is not a SMP/SMT related problem.
> 
> I've compiled the kernel with the debugging options, so I might provide 
> additional informations if needed as in Brad's case.

I'll try to reproduce your problems soon.



Thread overview: 7+ messages
2004-12-28 11:24 unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gildas LE NADAN
2004-12-28 11:39 ` bert hubert
2004-12-28 15:15   ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) [includes backtrace] Gildas LE NADAN
2004-12-28 14:07 ` unkillable processes using samba, xfs and lvm2 snapshots (k 2.6.10) Gene Heskett
2005-01-02 12:41   ` Christian Leber
2004-12-29 18:01 ` Julien BLACHE
2005-01-05 11:37 ` Christoph Hellwig
