All of lore.kernel.org
 help / color / mirror / Atom feed
* [regression?] hang with 2.6.37 on a BTRFS test machine
@ 2011-01-23 12:06 Martin Steigerwald
  2011-01-23 13:13 ` Helmut Hullen
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Steigerwald @ 2011-01-23 12:06 UTC (permalink / raw)
  To: linux-kernel, linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 11099 bytes --]

Not reported on bugzilla.kernel.org yet as its so vague.


Hi!

I have something I do not have the slightest clue on how to track down:

Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in such a 
way, that when I type a command at the prompt, nothing will happen 
anymore. Same goes for login into a tty. Even dmesg is not executed 
anymore. My KDE session doesn't react to clicks anymore, except for window 
movement. Most of the time the mouse pointer is still movable, but 
sometimes it is frozen as well. But the latter might be a different problem 
with X.org.

My theory is that at least I/O related kernel functions do not return 
anymore at that time.

Nothing is in the logs after a reboot.

Problem now is: This happens rather irregularily. That kernel run for days 
without problems. But today two times it hung after the first time it hung. 
One time it was on TTY 1 instead of in X.org. So when this happens it 
seems to persist a while.

But I now booted with 2.6.36 and the machine is as stable as if nothing 
has happened at all.

OTOH a 2.6.37 with recursive mtime patches from Jan Kara - that hung on 
the T23 as well - runs just fine on my T42 for nine days (minus hibernation 
times) already.

Now it could be something hardware related, the T23 is different from the 
T42. Also the T23 has a fan that should be replaced or oiled - but it 
still works right up to now. Also with 2.6.36 or earlier kernels the 
machine never hung that way so far.

Or it could be something BTRFS related. The T23 is using BTRFS on /home 
for months already and for / since beginning this year or end of last year 
while the T42 is still with Ext4 - until there is a robust fsck for BTRFS 
and BTRFS is marked stable in the kernel at last.

A bisection is too much effort for me - especially as it could take days 
till the problems shows up. Some sysrq stuff doesn't work without serial or 
netconsole since a call to dmesg does not return anymore when it happens. 
Having netconsole and serial running for days for just in case it happens 
is also quite some effort, since I have to remember to reinitialize it 
after each hibernation cycle to have a permanent setup.

Since I have no other idea on how to track this with reasonable effort - I 
spent tons of time with trying to help debugging various, partly 
intermittent, kernel problems in the last months and I have enough of that 
ATM - I will stick to 2.6.36 for the time being and see whether the T23 is 
really stable with that one. I still wanted to report my - admittedly 
vague - findings in case it rings a bell for someone. I will monitor 
2.6.37.y updates in case they contain BTRFS and or T23 hardware related 
fixes.

Last but not least: I have no seen BTRFS related errors or other oddities 
in syslog. The following looks quite normal to me, I did a btrfs filesystem 
balance some time ago and had some hard switch off cold reboots due to the 
issue mentioned:

artin@deepdance:~> zgrep -i btrfs /var/log/syslog*
/var/log/syslog.1:Jan 17 20:43:06 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:01:12 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:01:12 localhost kernel: btrfs: unlinked 3 
orphans
/var/log/syslog.1:Jan 23 12:09:36 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:09:36 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.1:Jan 23 12:20:52 localhost kernel: Btrfs loaded
/var/log/syslog.1:Jan 23 12:20:52 localhost kernel: btrfs: unlinked 18 
orphans
/var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 14 12:32:16 localhost kernel: btrfs: unlinked 116 
orphans
/var/log/syslog.2.gz:Jan 14 12:40:02 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 14 14:44:28 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:14:06 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:26:46 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:41:11 localhost kernel: btrfs: unlinked 3 
orphans
/var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 18:53:47 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 19:00:14 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.2.gz:Jan 16 19:05:53 localhost kernel: Btrfs loaded
/var/log/syslog.2.gz:Jan 16 19:05:54 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: Btrfs loaded
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: btrfs: unlinked 2 
orphans
/var/log/syslog.7.gz:Dec  9 19:05:38 localhost kernel: btrfs: truncated 1 
orphans
/var/log/syslog.7.gz:Dec  9 20:54:10 localhost kernel: btrfs: relocating 
block group 10900996096 flags 36
/var/log/syslog.7.gz:Dec  9 20:55:02 localhost kernel: btrfs: found 2036 
extents
/var/log/syslog.7.gz:Dec  9 20:55:02 localhost kernel: btrfs: relocating 
block group 10766778368 flags 36
/var/log/syslog.7.gz:Dec  9 20:56:15 localhost kernel: btrfs: found 10432 
extents
/var/log/syslog.7.gz:Dec  9 20:56:16 localhost kernel: btrfs: relocating 
block group 9693036544 flags 1
/var/log/syslog.7.gz:Dec  9 20:56:26 localhost kernel: btrfs: found 155 
extents
/var/log/syslog.7.gz:Dec  9 20:56:31 localhost kernel: btrfs: found 154 
extents
/var/log/syslog.7.gz:Dec  9 20:56:31 localhost kernel: btrfs: relocating 
block group 8619294720 flags 1
/var/log/syslog.7.gz:Dec  9 20:57:37 localhost kernel: btrfs: found 5027 
extents
/var/log/syslog.7.gz:Dec  9 20:57:49 localhost kernel: btrfs: found 5027 
extents
/var/log/syslog.7.gz:Dec  9 20:57:51 localhost kernel: btrfs: relocating 
block group 7545552896 flags 1
/var/log/syslog.7.gz:Dec  9 20:59:48 localhost kernel: btrfs: found 12851 
extents
/var/log/syslog.7.gz:Dec  9 21:00:28 localhost kernel: btrfs: found 12846 
extents
/var/log/syslog.7.gz:Dec  9 21:00:29 localhost kernel: btrfs: relocating 
block group 6471811072 flags 1
/var/log/syslog.7.gz:Dec  9 21:01:59 localhost kernel: btrfs: found 46780 
extents
/var/log/syslog.7.gz:Dec  9 21:03:21 localhost kernel: btrfs: found 46780 
extents
/var/log/syslog.7.gz:Dec  9 21:03:26 localhost kernel: btrfs: relocating 
block group 5398069248 flags 1
/var/log/syslog.7.gz:Dec  9 21:04:34 localhost kernel: btrfs: found 19425 
extents
/var/log/syslog.7.gz:Dec  9 21:05:15 localhost kernel: btrfs: found 19425 
extents
/var/log/syslog.7.gz:Dec  9 21:05:16 localhost kernel: btrfs: relocating 
block group 4324327424 flags 1
/var/log/syslog.7.gz:Dec  9 21:06:33 localhost kernel: btrfs: found 26059 
extents
/var/log/syslog.7.gz:Dec  9 21:07:22 localhost kernel: btrfs: found 26059 
extents
/var/log/syslog.7.gz:Dec  9 21:07:25 localhost kernel: btrfs: relocating 
block group 3250585600 flags 1
/var/log/syslog.7.gz:Dec  9 21:08:47 localhost kernel: btrfs: found 15614 
extents
/var/log/syslog.7.gz:Dec  9 21:09:29 localhost kernel: btrfs: found 15614 
extents
/var/log/syslog.7.gz:Dec  9 21:09:31 localhost kernel: btrfs: relocating 
block group 2176843776 flags 1
/var/log/syslog.7.gz:Dec  9 21:10:58 localhost kernel: btrfs: found 4275 
extents
/var/log/syslog.7.gz:Dec  9 21:11:34 localhost kernel: btrfs: found 4275 
extents
/var/log/syslog.7.gz:Dec  9 21:11:35 localhost kernel: btrfs: relocating 
block group 1103101952 flags 1
/var/log/syslog.7.gz:Dec  9 21:13:01 localhost kernel: btrfs: found 7296 
extents
/var/log/syslog.7.gz:Dec  9 21:13:35 localhost kernel: btrfs: found 7296 
extents
/var/log/syslog.7.gz:Dec  9 21:13:36 localhost kernel: btrfs: relocating 
block group 29360128 flags 36
/var/log/syslog.7.gz:Dec  9 21:16:53 localhost kernel: btrfs: found 78308 
extents
/var/log/syslog.7.gz:Dec  9 21:16:54 localhost kernel: btrfs: relocating 
block group 20971520 flags 34
/var/log/syslog.7.gz:Dec  9 21:16:54 localhost kernel: btrfs: found 1 
extents
/var/log/syslog.7.gz:Dec  9 21:16:56 localhost kernel: btrfs: relocating 
block group 12582912 flags 1
/var/log/syslog.7.gz:Dec  9 21:16:57 localhost kernel: btrfs: found 85 
extents
/var/log/syslog.7.gz:Dec  9 21:16:59 localhost kernel: btrfs: found 85 
extents
/var/log/syslog.7.gz:Dec  9 21:17:01 localhost kernel: btrfs: relocating 
block group 4194304 flags 4

Hmmm, one thing there is:

Jan 13 13:27:15 localhost kernel: INFO: task dpkg:21162 blocked for more 
than 120 seconds.
Jan 13 13:27:15 localhost kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 13 13:27:15 localhost kernel: dpkg          D ef891b0c     0 21162   
2334 0x00000000
Jan 13 13:27:15 localhost kernel: d2e4be7c 00000082 ef8918cc ef891b0c 
ef8918a0 ef891b0c ef891b10 1b8b88c5
Jan 13 13:27:15 localhost kernel: 00018796 ef8918a0 7fffffff d2e4bf4c 7fffffff 
d2e4bee4 c13c4495 cb7862ac
Jan 13 13:27:15 localhost kernel: d2e4be98 c102ab56 cb7862ac c1514fa0 
d2e4beb4 c102d740 ee303140 d2e4bec0
Jan 13 13:27:15 localhost kernel: Call Trace:
Jan 13 13:27:15 localhost kernel: [<c13c4495>] 
schedule_timeout+0x175/0x280
Jan 13 13:27:15 localhost kernel: [<c102ab56>] ? 
__dequeue_entity+0x26/0x50
Jan 13 13:27:15 localhost kernel: [<c102d740>] ? set_next_entity+0xc0/0xe0
Jan 13 13:27:15 localhost kernel: [<c10017b3>] ? __switch_to+0xa3/0x160
Jan 13 13:27:15 localhost kernel: [<c1030132>] ? T.1277+0x32/0xa0
Jan 13 13:27:15 localhost kernel: [<c13c394c>] ? schedule+0x1fc/0x5d0
Jan 13 13:27:15 localhost kernel: [<c13c4006>] wait_for_common+0xa6/0x160
Jan 13 13:27:15 localhost kernel: [<c10316a0>] ? 
default_wake_function+0x0/0x20
Jan 13 13:27:15 localhost kernel: [<c13c4197>] 
wait_for_completion+0x17/0x20
Jan 13 13:27:15 localhost kernel: [<c10f8d52>] sync_inodes_sb+0x92/0x1a0
Jan 13 13:27:15 localhost kernel: [<c111f230>] ? 
dquot_quota_sync+0x0/0x310
Jan 13 13:27:15 localhost kernel: [<c10fd4f7>] __sync_filesystem+0x87/0xa0
Jan 13 13:27:15 localhost kernel: [<c10fd52f>] sync_one_sb+0x1f/0x30
Jan 13 13:27:15 localhost kernel: [<c10d9e6b>] iterate_supers+0x7b/0xf0
Jan 13 13:27:15 localhost kernel: [<c10fd510>] ? sync_one_sb+0x0/0x30
Jan 13 13:27:15 localhost kernel: [<c10fd57c>] sys_sync+0x3c/0x60
Jan 13 13:27:15 localhost kernel: [<c1002f23>] sysenter_do_call+0x12/0x2c
Jan 13 13:33:38 localhost init: Trying to re-exec init

There is only one occurance of it:

martin@deepdance:~> zgrep -i "blocked" /var/log/syslog*
/var/log/syslog.2.gz:Jan 13 13:27:15 localhost kernel: INFO: task 
dpkg:21162 blocked for more than 120 seconds.
martin@deepdance:~#1>

I put the logs aside in case anyone wants to have a closer look.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [regression?] hang with 2.6.37 on a BTRFS test machine
  2011-01-23 12:06 [regression?] hang with 2.6.37 on a BTRFS test machine Martin Steigerwald
@ 2011-01-23 13:13 ` Helmut Hullen
  2011-01-23 13:44   ` Martin Steigerwald
  0 siblings, 1 reply; 4+ messages in thread
From: Helmut Hullen @ 2011-01-23 13:13 UTC (permalink / raw)
  To: linux-btrfs

Hallo, Martin,

Du meintest am 23.01.11:


> I have something I do not have the slightest clue on how to track
> down:

> Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in
> such a way, that when I type a command at the prompt, nothing will
> happen anymore. Same goes for login into a tty. Even dmesg is not
> executed anymore. My KDE session doesn't react to clicks anymore,
> except for window movement. Most of the time the mouse pointer is
> still movable, but sometimes it is frozen as well. But the latter
> might be a different problem with X.org.

Maybe it's the same problem ... I have (among my machines) a Thinkpad  
T22 on which I have run btrfs with kernel 2.6.37-rc4. Sometimes it hung,  
too. But it's an old machine, with old RAM, with some old disks. It has  
hung some time ago also without any btrfs partition and with older  
kernel(s).

But I'll test again.

Viele Gruesse!
Helmut

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [regression?] hang with 2.6.37 on a BTRFS test machine
  2011-01-23 13:13 ` Helmut Hullen
@ 2011-01-23 13:44   ` Martin Steigerwald
  2011-01-23 14:13     ` Helmut Hullen
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Steigerwald @ 2011-01-23 13:44 UTC (permalink / raw)
  To: helmut, linux-kernel; +Cc: linux-btrfs

[-- Attachment #1: Type: Text/Plain, Size: 2026 bytes --]

CCs restored. Its usual to keep CCs on Linux kernel related mailing lists.

Am Sunday 23 January 2011 schrieb Helmut Hullen:
> Hallo, Martin,
> 
> Du meintest am 23.01.11:
> > I have something I do not have the slightest clue on how to track
> > down:
> > 
> > Occasionally with 2.6.37 plain vanilla my ThinkPad T23 freezes in
> > such a way, that when I type a command at the prompt, nothing will
> > happen anymore. Same goes for login into a tty. Even dmesg is not
> > executed anymore. My KDE session doesn't react to clicks anymore,
> > except for window movement. Most of the time the mouse pointer is
> > still movable, but sometimes it is frozen as well. But the latter
> > might be a different problem with X.org.
> 
> Maybe it's the same problem ... I have (among my machines) a Thinkpad
> T22 on which I have run btrfs with kernel 2.6.37-rc4. Sometimes it
> hung, too. But it's an old machine, with old RAM, with some old disks.

Is the hung comparable to what I have seen? I.e. mouse pointer and stuff 
still works, but when entering a command the shell does hang?

I can't completely rule out a hardware related problem as well, but then 
with 2.6.36 the machine now just works again as if nothing happened - its 
mostly running at 733 MHz instead of 1.13 GHz and the fan still expels a 
steady stream of lukewarm air. I think I have to run it with 2.6.36 at 
least a week to see whether its stable with that one. Its not important to 
run it with 2.6.37 so thats not a problem. Its just my Amarok machine.

Maybe it would be a good idea when you compare with 2.6.36 as well...

> It has hung some time ago also without any btrfs partition and with
> older kernel(s).
> 
> But I'll test again.

I will also continue to test with a 2.6.37 on a workstation at work, but 
there only some scratch partition uses BTRFS. My T23 has / and /home on 
it.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [regression?] hang with 2.6.37 on a BTRFS test machine
  2011-01-23 13:44   ` Martin Steigerwald
@ 2011-01-23 14:13     ` Helmut Hullen
  0 siblings, 0 replies; 4+ messages in thread
From: Helmut Hullen @ 2011-01-23 14:13 UTC (permalink / raw)
  To: linux-btrfs

Hallo, Martin,

Du meintest am 23.01.11:

>> Maybe it's the same problem ... I have (among my machines) a
>> Thinkpad T22 on which I have run btrfs with kernel 2.6.37-rc4.
>> Sometimes it hung, too. But it's an old machine, with old RAM, with
>> some old disks.

> Is the hung comparable to what I have seen? I.e. mouse pointer and
> stuff still works, but when entering a command the shell does hang?

"What's a mouse pointer"?

I run this machine as a pure CLI machine.
If I remember correct the last crashes: nothing worked. No keyboard, no  
LAN connections. Only the "power down" button ...

> I will also continue to test with a 2.6.37 on a workstation at work,
> but there only some scratch partition uses BTRFS. My T23 has / and
> /home on it.

On my machines I've deleted btrfs for "working" directories, I'm only  
testing on devices with no valuable data. There's still a big problem  
with "ENOSPC".

Viele Gruesse!
Helmut

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-01-23 14:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-23 12:06 [regression?] hang with 2.6.37 on a BTRFS test machine Martin Steigerwald
2011-01-23 13:13 ` Helmut Hullen
2011-01-23 13:44   ` Martin Steigerwald
2011-01-23 14:13     ` Helmut Hullen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.