* flush-btrfs-1 hangs when building openwrt
@ 2011-01-25  8:51 Daniel Poelzleithner
  2011-01-25 15:30 ` Josef Bacik
  2011-01-31 10:52 ` Sander
  0 siblings, 2 replies; 8+ messages in thread
From: Daniel Poelzleithner @ 2011-01-25  8:51 UTC (permalink / raw)
  To: linux-btrfs


Hi,

Since updating to 2.6.37, I can't build openwrt on my btrfs buildroot anymore.
I'm not sure if this is related to the other flush-btrfs-1 thread.

Plenty of disk space is free:

/dev/mapper/cruor-build
                       97G   68G   27G  73% /opt/build

It always hangs when openwrt builds the ext4 image and runs tune2fs on it.

/opt/build/fahrenheit/openwrt/staging_dir/host/bin/tune2fs \
    -O extents,uninit_bg,dir_index \
    /opt/build/fahrenheit/openwrt/build_dir/linux-x86_kvm_guest/root.ext4
tune2fs 1.41.13 (13-Dec-2010)

the processes can't be killed.
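
A task that ignores kill -9 is often stuck in uninterruptible sleep
(state D), although it can also be spinning inside the kernel in state R.
A minimal sketch for listing D-state tasks, assuming a procps-style ps;
the helper name filter_dstate is made up for this example:

```shell
#!/bin/sh
# Print every row whose second column (process state) starts with "D",
# i.e. uninterruptible sleep. Expects "PID STAT COMMAND" rows on stdin,
# so it works on live ps output as well as on a saved listing.
filter_dstate() {
    # $1=$1 rebuilds the record so the output is single-space separated
    awk '$2 ~ /^D/ { $1 = $1; print }'
}

# Typical use on a live system:
#   ps -eo pid=,stat=,comm= | filter_dstate
```

For a PID listed this way, /proc/<pid>/stack may show where the task is
waiting (root only, and only if the kernel was built with stack tracing).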

Alt-SysRq-T does not show anything, nor is there an oops.

I put the openwrt config I'm using at https://gist.github.com/794593;
maybe it is reproducible.

Linux cruor 2.6.37 #2 SMP Thu Jan 20 02:09:59 CET 2011 x86_64 GNU/Linux

Please CC.

kind regards
 Daniel






* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-25  8:51 flush-btrfs-1 hangs when building openwrt Daniel Poelzleithner
@ 2011-01-25 15:30 ` Josef Bacik
  2011-01-25 21:16   ` Daniel Poelzleithner
  2011-01-31 10:52 ` Sander
  1 sibling, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2011-01-25 15:30 UTC (permalink / raw)
  To: Daniel Poelzleithner; +Cc: linux-btrfs

On Tue, Jan 25, 2011 at 09:51:01AM +0100, Daniel Poelzleithner wrote:
> Hi,
> 
> Since update to 2.6.37 I can't build openwrt on my btrfs buildroot anymore.
> I'm not sure if this is related to the other flush-btrfs-1 thread.
> 
> plenty of diskspace is free:
> 
> /dev/mapper/cruor-build
>                        97G   68G   27G  73% /opt/build
> 
> It always hangs when openwrt builds the ext4 image and runs tune2fs on it.
> 
> /opt/build/fahrenheit/openwrt/staging_dir/host/bin/tune2fs -O
> extents,uninit_bg,dir_index
> /opt/build/fahrenheit/openwrt/build_dir/linux-x86_kvm_guest/root.ext4
> tune2fs 1.41.13 (13-Dec-2010)
> 
> the processes can't be killed.
> 
> alt-sysctl-t does not show anything, nor is there a oops.
> 
> I put the the openwrt config I'm using at https://gist.github.com/794593
> , maybe it is reproduceable.
> 
> Linux cruor 2.6.37 #2 SMP Thu Jan 20 02:09:59 CET 2011 x86_64 GNU/Linux
> 

How about sysrq+w when it's hanging?  Also, could you give the exact steps to
reproduce?  I went to the openwrt site to try and build, but it seems like
there are a lot of moving parts.  If you can just tell me what to download and
what you run to reproduce, I can try to reproduce locally.  Thanks,

Josef


* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-25 15:30 ` Josef Bacik
@ 2011-01-25 21:16   ` Daniel Poelzleithner
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Poelzleithner @ 2011-01-25 21:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs


On 01/25/2011 04:30 PM, Josef Bacik wrote:

> How about sysrq+w when it's hanging.  

Shows nothing.

> Also could you give the exact steps to
> reproduce?  I went to the openwrt site to try and build, but it seems like
> theres alot of moving parts.  If you can just tell me what to download and what
> you run to reproduce I can try and reproduce locally.  Thanks,

mkdir /btrfs/with/lots/of/space
cd /btrfs/with/lots/of/space
git clone git://nbd.name/openwrt.git
cd openwrt
wget https://gist.github.com/raw/794593/b9e7e7b6dce71093a653953d7e39c94a6ffa4528/gistfile1.txt \
     -O .config

make
# take a nap, will take quite some time on the first run
If you prefer seeing output, which slows things down, do

make V=99

instead. I skipped the packages repo here, because I don't think it will
make a difference. It hangs in the final steps of creating the image.
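
Since the full build takes a long time, a much smaller test that isolates
the step reported to hang may be worth trying first. The sketch below is
an assumption, not something confirmed in this thread to trigger the
hang: it creates a small ext4 image as a regular file with mkfs.ext4
(openwrt may build its image differently) and runs the tune2fs call from
the original report on it. The helper names run and repro_tune2fs, the
DRY_RUN switch and the default size are made up for this example.

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical stand-alone reproducer for the tune2fs step.
repro_tune2fs() {
    img=$1            # image path; place it on the btrfs mount under test
    size_mb=${2:-64}  # image size in MiB (arbitrary choice)
    run dd if=/dev/zero of="$img" bs=1M count="$size_mb" &&
    run mkfs.ext4 -q -F "$img" &&
    run tune2fs -O extents,uninit_bg,dir_index "$img"
}
```

Running it with DRY_RUN=1 first shows the three commands without
touching the filesystem.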

kind regards
 Daniel




* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-25  8:51 flush-btrfs-1 hangs when building openwrt Daniel Poelzleithner
  2011-01-25 15:30 ` Josef Bacik
@ 2011-01-31 10:52 ` Sander
  2011-01-31 11:08   ` cwillu
  1 sibling, 1 reply; 8+ messages in thread
From: Sander @ 2011-01-31 10:52 UTC (permalink / raw)
  To: Daniel Poelzleithner; +Cc: linux-btrfs

Daniel Poelzleithner wrote (ao):
> Since update to 2.6.37 I can't build openwrt on my btrfs buildroot anymore.
> I'm not sure if this is related to the other flush-btrfs-1 thread.

While I thought it was related to a dying disk used for backups, after
your post I think it might not be.

Running 2.6.37 on openrd-client (ARM).

It started with hanging jobs on the backup disk. I stopped cron and
could kill most of the jobs. Some are still hanging though.

Since then (uptime 12 days) I see hanging procmail processes, and an
apt-get upgrade last week gave an unkillable dpkg process. All these have
nothing to do with the backup disk. CPU is maxed out:

top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    515004k total,   400824k used,   114180k free,       28k buffers
Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
 6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
 6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
    3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0


What can I do to provide more info?

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-31 10:52 ` Sander
@ 2011-01-31 11:08   ` cwillu
  2011-01-31 11:18     ` Sander
  0 siblings, 1 reply; 8+ messages in thread
From: cwillu @ 2011-01-31 11:08 UTC (permalink / raw)
  To: sander; +Cc: Daniel Poelzleithner, linux-btrfs

On Mon, Jan 31, 2011 at 4:52 AM, Sander <sander@humilis.net> wrote:
> Daniel Poelzleithner wrote (ao):
>> Since update to 2.6.37 I can't build openwrt on my btrfs buildroot anymore.
>> I'm not sure if this is related to the other flush-btrfs-1 thread.
>
> While I thought it was related to a dying disk used for backups, after
> your post I think it might not.
>
> Running 2.6.37 on openrd-client (ARM).
>
> It started with hanging jobs on the backup disk. I stopped cron and
> could kill most of the jobs. Some are still hanging though.
>
> Since then (uptime 12 days) I see hanging procmail processes, and an
> apt-get upgrade last week gave an unkillable dpkg process. All these have
> nothing to do with the backup disk. CPU is maxed out:
>
> top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
> Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
> Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:    515004k total,   400824k used,   114180k free,       28k buffers
> Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
>  6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
> 27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
> 30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
>  6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
> 11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
> 18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
> 18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
> 20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
> 20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
> 23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
> 25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
> 26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
> 27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
> 27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
> 11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
>     3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0
>
>
> What can I do to provide more info?

alt-sysrq-w, and then the dmesg output, which will then contain a
backtrace for every blocked process.

(Be careful of typos: there are sysrq keystrokes for other, less
diagnostic tasks that you won't want to hit by accident:
http://en.wikipedia.org/wiki/Magic_SysRq_key).
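
On a machine without a keyboard attached, the same dump can usually be
requested through /proc/sysrq-trigger. The sketch below assumes a kernel
built with magic SysRq support and a root shell. The helper names are
made up for this example, and the mapping of the w request to bit 8
("debugging dumps of processes") is my reading of the kernel's sysrq
documentation.

```shell
#!/bin/sh
# Succeed if the given kernel.sysrq value permits the keyboard
# "show blocked tasks" (w) request: 1 enables every function,
# otherwise bit 8 (process dumps) has to be set.
sysrq_dump_enabled() {
    mask=$1
    [ "$mask" -eq 1 ] || [ $(( mask & 8 )) -ne 0 ]
}

# Request the blocked-task dump without a keyboard and print the tail
# of the kernel log. Needs root; the default line count is arbitrary.
dump_blocked_tasks() {
    echo w > /proc/sysrq-trigger && dmesg | tail -n "${1:-200}"
}

# Typical use:
#   sysrq_dump_enabled "$(cat /proc/sys/kernel/sysrq)" || echo "alt-sysrq-w is masked"
#   dump_blocked_tasks 100
```

Writing a single letter to the trigger file also avoids the risk of
hitting a neighbouring key by accident.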
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-31 11:08   ` cwillu
@ 2011-01-31 11:18     ` Sander
  2011-01-31 11:26       ` cwillu
  0 siblings, 1 reply; 8+ messages in thread
From: Sander @ 2011-01-31 11:18 UTC (permalink / raw)
  To: cwillu; +Cc: sander, Daniel Poelzleithner, linux-btrfs

cwillu wrote (ao):
> On Mon, Jan 31, 2011 at 4:52 AM, Sander <sander@humilis.net> wrote:
> > Daniel Poelzleithner wrote (ao):
> >> Since update to 2.6.37 I can't build openwrt on my btrfs buildroot anymore.
> >> I'm not sure if this is related to the other flush-btrfs-1 thread.
> >
> > While I thought it was related to a dying disk used for backups, after
> > your post I think it might not.
> >
> > Running 2.6.37 on openrd-client (ARM).
> >
> > It started with hanging jobs on the backup disk. I stopped cron and
> > could kill most of the jobs. Some are still hanging though.
> >
> > Since then (uptime 12 days) I see hanging procmail processes, and an
> > apt-get upgrade last week gave an unkillable dpkg process. All these have
> > nothing to do with the backup disk. CPU is maxed out:
> >
> > top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
> > Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
> > Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> > Mem:    515004k total,   400824k used,   114180k free,       28k buffers
> > Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
> >  6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
> > 27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
> > 30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
> >  6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
> > 11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
> > 18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
> > 18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
> > 20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
> > 20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
> > 23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
> > 25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
> > 26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
> > 27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
> > 27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
> > 11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
> >     3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0
> >
> >
> > What can I do to provide more info?
> 
> alt-sysrq-w, and then the dmesg output, which will contain then a
> backtrace for every blocked process.

Thanks cwillu.

Seems only two processes. And these are related to the backup disk
(which might or might not be broken: can't access it anymore).

Nothing to do with the procmail and dpkg processes.


[1042949.513831] SysRq : Show Blocked State
[1042949.517776]   task                PC stack   pid father
[1042949.523247] cat           D c0475dd0     0 30063      1 0x00000001
[1042949.529668] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
[1042949.538943] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
[1042949.548209] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
[1042949.556432] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
[1042949.565004] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
[1042949.573838] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
[1042949.582750] cat           D c0475dd0     0  4591      1 0x00000001
[1042949.589152] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
[1042949.598418] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
[1042949.607687] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
[1042949.615910] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
[1042949.624482] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
[1042949.633315] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
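
A dump like the one above can get long when many tasks are blocked. The
following is a minimal sketch for pulling just the task names and PIDs
out of it, assuming the line layout shown here (newer kernels may print
the task lines differently) and command names without spaces. The
helper name blocked_tasks is made up for this example.

```shell
#!/bin/sh
# Read a "Show Blocked State" dump on stdin and print "COMMAND PID" for
# each task line, i.e. each line whose second column is the state "D".
blocked_tasks() {
    awk '{
        sub(/^\[ *[0-9.]+\] */, "")   # drop the printk timestamp, if any
        if (NF >= 6 && $2 == "D") print $1, $5
    }'
}

# Typical use:
#   dmesg | blocked_tasks
```

For the dump above this prints the two cat processes, 30063 and 4591.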


-- 
Humilis IT Services and Solutions
http://www.humilis.net


* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-31 11:18     ` Sander
@ 2011-01-31 11:26       ` cwillu
  2011-01-31 11:40         ` Sander
  0 siblings, 1 reply; 8+ messages in thread
From: cwillu @ 2011-01-31 11:26 UTC (permalink / raw)
  To: sander; +Cc: Daniel Poelzleithner, linux-btrfs

On Mon, Jan 31, 2011 at 5:18 AM, Sander <sander@humilis.net> wrote:
> cwillu wrote (ao):
>> On Mon, Jan 31, 2011 at 4:52 AM, Sander <sander@humilis.net> wrote:
>> > Daniel Poelzleithner wrote (ao):
>> >> Since update to 2.6.37 I can't build openwrt on my btrfs buildroot anymore.
>> >> I'm not sure if this is related to the other flush-btrfs-1 thread.
>> >
>> > While I thought it was related to a dying disk used for backups, after
>> > your post I think it might not.
>> >
>> > Running 2.6.37 on openrd-client (ARM).
>> >
>> > It started with hanging jobs on the backup disk. I stopped cron and
>> > could kill most of the jobs. Some are still hanging though.
>> >
>> > Since then (uptime 12 days) I see hanging procmail processes, and an
>> > apt-get upgrade last week gave an unkillable dpkg process. All these have
>> > nothing to do with the backup disk. CPU is maxed out:
>> >
>> > top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
>> > Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
>> > Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> > Mem:    515004k total,   400824k used,   114180k free,       28k buffers
>> > Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached
>> >
>> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >  1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
>> >  6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
>> > 27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
>> > 30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
>> >  6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
>> > 11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
>> > 18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
>> > 18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
>> > 20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
>> > 20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
>> > 23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
>> > 25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
>> > 26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
>> > 27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
>> > 27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
>> > 11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
>> >     3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0
>> >
>> >
>> > What can I do to provide more info?
>>
>> alt-sysrq-w, and then the dmesg output, which will contain then a
>> backtrace for every blocked process.
>
> Thanks cwillu.
>
> Seems only two processes. And these are related to the backup disk
> (which might or might not be broken: can't access it anymore).
>
> Nothing to do with the procmail and dpkg processes.
>
>
> [1042949.513831] SysRq : Show Blocked State
> [1042949.517776]   task                PC stack   pid father
> [1042949.523247] cat           D c0475dd0     0 30063      1 0x00000001
> [1042949.529668] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> [1042949.538943] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> [1042949.548209] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> [1042949.556432] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> [1042949.565004] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> [1042949.573838] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
> [1042949.582750] cat           D c0475dd0     0  4591      1 0x00000001
> [1042949.589152] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> [1042949.598418] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> [1042949.607687] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> [1042949.615910] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> [1042949.624482] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> [1042949.633315] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)

dpkg and procmail were just showing up for you in top because it was
sorting by memory usage, which isn't what we were looking for here.
In your case, the blocking is almost certainly due to your failing
disk.


* Re: flush-btrfs-1 hangs when building openwrt
  2011-01-31 11:26       ` cwillu
@ 2011-01-31 11:40         ` Sander
  0 siblings, 0 replies; 8+ messages in thread
From: Sander @ 2011-01-31 11:40 UTC (permalink / raw)
  To: cwillu; +Cc: sander, Daniel Poelzleithner, linux-btrfs

cwillu wrote (ao):
> On Mon, Jan 31, 2011 at 5:18 AM, Sander <sander@humilis.net> wrote:
> > cwillu wrote (ao):
> >> On Mon, Jan 31, 2011 at 4:52 AM, Sander <sander@humilis.net> wrote:
> >> > It started with hanging jobs on the backup disk. I stopped cron and
> >> > could kill most of the jobs. Some are still hanging though.
> >> >
> >> > Since then (uptime 12 days) I see hanging procmail processes, and an
> >> > apt-get upgrade last week gave an unkillable dpkg process. All these have
> >> > nothing to do with the backup disk. CPU is maxed out:
> >> >
> >> > top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
> >> > Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
> >> > Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> > Mem:    515004k total,   400824k used,   114180k free,       28k buffers
> >> > Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached
> >> >
> >> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> >  1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
> >> >  6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
> >> > 27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
> >> > 30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
> >> >  6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
> >> > 11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
> >> > 18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
> >> > 18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
> >> > 20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
> >> > 20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
> >> > 23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
> >> > 25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
> >> > 26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
> >> > 27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
> >> > 27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
> >> > 11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
> >> >     3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0
> >> >
> >> >
> >> > What can I do to provide more info?
> >>
> >> alt-sysrq-w, and then the dmesg output, which will contain then a
> >> backtrace for every blocked process.
> >
> > Thanks cwillu.
> >
> > Seems only two processes. And these are related to the backup disk
> > (which might or might not be broken: can't access it anymore).
> >
> > Nothing to do with the procmail and dpkg processes.
> >
> >
> > [1042949.513831] SysRq : Show Blocked State
> > [1042949.517776]   task                PC stack   pid father
> > [1042949.523247] cat           D c0475dd0     0 30063      1 0x00000001
> > [1042949.529668] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> > [1042949.538943] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> > [1042949.548209] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> > [1042949.556432] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> > [1042949.565004] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> > [1042949.573838] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
> > [1042949.582750] cat           D c0475dd0     0  4591      1 0x00000001
> > [1042949.589152] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> > [1042949.598418] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> > [1042949.607687] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> > [1042949.615910] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> > [1042949.624482] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> > [1042949.633315] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
> 
> dpkg and procmail were just showing up for you in top because it was
> sorting by memory usage, which isn't what we were looking for here.

It was not. The CPU numbers were low due to a 'find' which consumes a
lot now and then. This one shows better:

top - 12:32:22 up 12 days,  2:01, 32 users,  load average: 13.48, 13.37, 13.39
Tasks: 199 total,  12 running, 186 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 75.4%sy,  0.0%ni, 24.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    515004k total,   366200k used,   148804k free,       28k buffers
Swap:  4302560k total,   174188k used,  4128372k free,   170124k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11804 ookhoi    39  19  4700   64   52 R  8.8  0.0 717:10.30 procmail
 6036 ookhoi    39  19  2692   64   52 R  8.5  0.0 872:47.95 procmail
18871 root      39  19  2540   32   20 R  8.5  0.0   1531:53 lzop
20305 ookhoi    39  19  2692   68   56 R  8.5  0.0 613:53.59 procmail
20378 ookhoi    39  19  2692   68   56 R  8.5  0.0 613:52.37 procmail
23661 ookhoi    39  19  2692   80   68 R  8.5  0.0   1311:24 procmail
27910 root      39  19 15096 3748  304 R  8.5  0.7 641:48.25 dpkg
11373 ookhoi    39  19  4800   64   52 R  8.2  0.0 717:27.50 procmail
18894 ookhoi    39  19  2692   64   52 R  8.2  0.0 614:17.80 procmail
26409 root      39  19  2264   32   28 R  8.2  0.0   1529:44 mv
27606 ookhoi    39  19  9084   40   28 R  8.2  0.0   3640:41 procmail
11120 root      20   0     0    0    0 S  5.6  0.0   0:02.94 flush-btrfs-2
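
The stuck procmail, dpkg, lzop and mv tasks in this listing are in state
R rather than D, which would explain why they are missing from the
sysrq-w dump: that request only lists blocked (uninterruptible) tasks.
The sketch below picks the runnable rows out of a batch-mode top
listing. It assumes the default column order shown above, and the helper
name running_tasks is made up for this example.

```shell
#!/bin/sh
# Read top output on stdin and print "PID %CPU TIME+ COMMAND" for every
# task row whose state column (8th field) is "R". Header and summary
# lines are skipped because their first field is not a number.
running_tasks() {
    awk '$1 ~ /^[0-9]+$/ && $8 == "R" { print $1, $9, $11, $12 }'
}

# Typical use:
#   top -b -n 1 | running_tasks
```

For tasks like these, sysrq-t (all tasks) or sysrq-l (backtraces of the
active CPUs) may be more informative than sysrq-w.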


> In your case, the blocking is almost certainly due to your failing
> disk.

Also for procmail and dpkg? Which do not operate on the disk that seems
to fail, and is located under /holding/ ?

Anyway, I'll reboot the machine this afternoon with the suspect disk
removed.

Thanks again for your reply cwillu.

	Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net

