* possible regression in kernel 3.6: system hangs during nightly tape backup
@ 2012-11-20  0:14 Tilman Schmidt
  2012-11-20 15:49 ` Tilman Schmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Tilman Schmidt @ 2012-11-20  0:14 UTC (permalink / raw)
  To: LKML

[-- Attachment #1: Type: text/plain, Size: 2513 bytes --]

For the fourth time since switching to kernel 3.6, my system has become
unresponsive during the nightly Bacula backup run. It looks as if
all disk accesses are suddenly blocked:
- Desktop apps stop responding one after another, starting with
  Firefox followed by other "heavy" apps, while Konsole windows
  continue being usable for a while.
- "top" shows the load average steadily increasing while no process
  actually consumes a significant amount of CPU.
- I can do "dmesg > /root/dmesg.out" followed by "less /root/dmesg.out"
  in a Konsole window just fine, but after the inevitable hard reset
  the file /root/dmesg.out isn't there.
- The "sync" command hangs indefinitely.
- The "shutdown" command and ctrl/alt/Del emit "system going down"
  broadcast messages but never get anywhere.
- Killing processes manually works for some (bacula-sd even ejects
  the tape before exiting) but most remain in state D or Z.
- Eventually, all text consoles are blocked and a hardware reset is
  the only remaining option.
- After the reboot, a Bacula spool file is left behind in
  /var/spool/bacula, proof that the hang happened during the backup.
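Several of the symptoms above (processes stuck in state D, "sync" never returning) indicate tasks in uninterruptible sleep. The following is a minimal sketch that lists every task currently in state D by reading /proc/<pid>/stat. It assumes the standard Linux /proc layout and that a Python interpreter can still be started from an already-open Konsole window. The helper names are hypothetical, and `ps -eo pid,stat,comm` reports similar information.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the LAST ')' instead of on whitespace.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 2:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs of tasks in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            result.append((pid, comm))
    return sorted(result)
```

Calling `blocked_tasks()` during a hang and saving the result somewhere other than the stuck disks shows which tasks are blocked. The backtraces themselves still have to come from the kernel.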

This does not happen during every backup run, but frequently enough
to be annoying. (About once per week.) It never happened with kernel
3.5. For comparison, I went back to kernel 3.5.7 for a week and it
never happened during that time. Last night I booted 3.6.7 and the
very next backup caused the hang again. The last kernel message that
made it to the syslog on disk was

Nov 19 23:05:04 xenon kernel: [73877.128546] st0: Block limits 256 - 524288 bytes.

triggered by the start of the backup. In dmesg the next message was

[74401.249091] INFO: task flush-253:2:1320 blocked for more than 120 seconds.

followed by a backtrace. I have photos of the remaining dmesg output
which I'll try to upload somewhere accessible tomorrow.
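The hung-task line can be broken into its parts mechanically. The sketch below assumes the message format shown above (`INFO: task <comm>:<pid> blocked for more than <n> seconds`). In kernels of this era the per-device writeback threads are named flush-<major>:<minor>, so flush-253:2 would be the flusher thread for block device 253:2. The function and pattern names are hypothetical.

```python
import re

# The task name (comm) may itself contain ':' (e.g. "flush-253:2"), so the
# pid is taken as the last ':'-separated number right before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# Writeback flusher threads named flush-<major>:<minor>.
FLUSH_RE = re.compile(r"flush-(?P<major>\d+):(?P<minor>\d+)$")


def parse_hung_task(line):
    """Return a dict describing a hung-task message, or None if no match."""
    m = HUNG_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "seconds": int(m.group("secs")),
    }
    f = FLUSH_RE.match(info["comm"])
    if f:
        # The blocked task is a flusher: record which block device it serves.
        info["device"] = (int(f.group("major")), int(f.group("minor")))
    return info
```

Applied to the line above, this yields comm "flush-253:2", pid 1320, a 120-second timeout and device (253, 2). The writeback thread for that device is therefore the first task the watchdog reported as stuck.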

Hardware configuration:
Intel Pentium D, Intel DQ965GF mainboard, 6 GB RAM
onboard S-ATA controller driving two 500 GB S-ATA disks
and a Pioneer DVR-216D DVD-RW drive
Adaptec 29160B Ultra160 SCSI adapter driving a
Tandberg TS400 LTO-2 tape drive

Disk configuration: md RAID1, LVM, ext3 and ext4 volumes
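With md RAID1 underneath LVM, device 253:2 from the hung flush thread is most likely a device-mapper (LVM) volume. Major 253 is commonly, but not always, the dynamically allocated device-mapper major, so /proc/devices should be checked. The following minimal sketch maps the number to the volume name and its underlying devices. It assumes the standard sysfs layout: /sys/dev/block/<major>:<minor>, with dm/name and slaves/ for device-mapper devices. dm_info is a hypothetical helper.

```python
import os


def dm_info(major, minor, sysfs="/sys"):
    """Describe block device major:minor via sysfs; None if it does not exist."""
    base = os.path.join(sysfs, "dev", "block", f"{major}:{minor}")
    if not os.path.isdir(base):
        return None
    # /sys/dev/block/M:N is a symlink to the real device directory (e.g. dm-2).
    info = {"kernel_name": os.path.basename(os.path.realpath(base))}
    name_file = os.path.join(base, "dm", "name")
    if os.path.exists(name_file):
        # Only device-mapper devices have dm/name (the LVM "vg-lv" name).
        with open(name_file) as f:
            info["dm_name"] = f.read().strip()
    slaves = os.path.join(base, "slaves")
    # For an LV on md RAID1 this would list the md device(s) underneath.
    info["slaves"] = sorted(os.listdir(slaves)) if os.path.isdir(slaves) else []
    return info
```

On the affected machine, `dm_info(253, 2)` would name the volume whose writeback stalled, which in turn shows whether an ext3 or an ext4 filesystem was involved.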

Software: openSUSE 11.4 64-bit, vanilla kernels 3.5.7 and 3.6.7,
Bacula 5.2.12

HTH
T.

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
If unopened, best before: (see reverse side)



* Re: possible regression in kernel 3.6: system hangs during nightly tape backup
  2012-11-20  0:14 possible regression in kernel 3.6: system hangs during nightly tape backup Tilman Schmidt
@ 2012-11-20 15:49 ` Tilman Schmidt
  2012-11-24 19:17 ` Tilman Schmidt
  2012-12-05 11:03 ` possible regression in kernel 3.6 and 3.7-rc: " Tilman Schmidt
  2 siblings, 0 replies; 5+ messages in thread
From: Tilman Schmidt @ 2012-11-20 15:49 UTC (permalink / raw)
  To: LKML

[-- Attachment #1: Type: text/plain, Size: 881 bytes --]

On 20.11.2012 01:14, /me wrote:
> For the 4th time now after switching to kernel 3.6, my system became
> unresponsive during the nightly Bacula backup run. It looks as if
> all disk accesses are suddenly blocked [...]
> It never happened with kernel
> 3.5. For comparison, I went back to kernel 3.5.7 for a week and it
> never happened during that time. Last night I booted 3.6.7 and the
> very next backup caused the hang again. [...]
> I have photos of the remaining dmesg output
> which I'll try to upload somewhere accessible tomorrow.

Screenshots are now up on
http://www.phoenixsoftware.de/~ts/kernel36hang/
along with the .config of the kernel running at the time.
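Since the .config is available, it is worth checking whether the running kernel has the usual hang-debugging options, which can save a reproduction cycle. The sketch below assumes the standard .config syntax (`CONFIG_X=y`, `CONFIG_X=m`, `# CONFIG_X is not set`). The option list in WANTED is a suggestion and does not come from this thread.

```python
# Kconfig symbols commonly useful for debugging a hang like this one (suggested).
WANTED = [
    "CONFIG_MAGIC_SYSRQ",       # SysRq 'w' to dump blocked tasks
    "CONFIG_DETECT_HUNG_TASK",  # the "blocked for more than 120 seconds" watchdog
    "CONFIG_NETCONSOLE",        # ship kernel messages off the stuck machine
    "CONFIG_PROVE_LOCKING",     # lockdep: reports lock-ordering problems
]


def config_options(text):
    """Parse .config text into {option: value}; '# X is not set' maps to 'n'."""
    opts = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
    return opts


def missing_debug_options(text, wanted=WANTED):
    """Return the wanted options that are disabled or absent."""
    opts = config_options(text)
    return [o for o in wanted if opts.get(o, "n") == "n"]
```

For example, `missing_debug_options(open(".config").read())` lists which of these options would have to be enabled before the next reproduction attempt.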


* Re: possible regression in kernel 3.6: system hangs during nightly tape backup
  2012-11-20  0:14 possible regression in kernel 3.6: system hangs during nightly tape backup Tilman Schmidt
  2012-11-20 15:49 ` Tilman Schmidt
@ 2012-11-24 19:17 ` Tilman Schmidt
  2012-11-25 21:34   ` Borislav Petkov
  2012-12-05 11:03 ` possible regression in kernel 3.6 and 3.7-rc: " Tilman Schmidt
  2 siblings, 1 reply; 5+ messages in thread
From: Tilman Schmidt @ 2012-11-24 19:17 UTC (permalink / raw)
  To: LKML

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

Any ideas on this? I'm currently avoiding the 3.6 series because
of this problem, but I would be willing to reproduce the hang if I
knew what to do once it happens, i.e. what kind of information to
collect in order to identify the cause of the problem.
I may also try 3.7-rc if there's any interest.
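One way to collect information once the hang happens is the magic SysRq interface. 'w' dumps backtraces of all tasks in uninterruptible (blocked) state, and 'm' dumps memory information to the kernel log. Root can always write to /proc/sysrq-trigger, regardless of the kernel.sysrq setting, which only restricts the keyboard. The following minimal sketch assumes root and CONFIG_MAGIC_SYSRQ, and the function names are hypothetical. Because disk writes are blocked, the output is unlikely to reach the syslog on disk (just as /root/dmesg.out was lost), so capturing it probably requires a serial console or netconsole.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq(command, trigger=SYSRQ_TRIGGER):
    """Send one SysRq command character (root is needed on a real system)."""
    if len(command) != 1:
        raise ValueError("SysRq commands are single characters")
    with open(trigger, "w") as f:
        f.write(command)


def capture_hang_state(trigger=SYSRQ_TRIGGER):
    """Dump blocked-task backtraces ('w') and memory info ('m') to the kernel log."""
    for cmd in ("w", "m"):
        sysrq(cmd, trigger)
```

The `trigger` parameter exists only so the sketch can be exercised against an ordinary file. On the real system the default path is used, and the equivalent shell command is `echo w > /proc/sysrq-trigger`.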


* Re: possible regression in kernel 3.6: system hangs during nightly tape backup
  2012-11-24 19:17 ` Tilman Schmidt
@ 2012-11-25 21:34   ` Borislav Petkov
  0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2012-11-25 21:34 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: LKML, linux-ide

+ linux-ide

On Sat, Nov 24, 2012 at 08:17:48PM +0100, Tilman Schmidt wrote:
> Any ideas on this? I'm currently avoiding the 3.6 series because
> of this problem, but I would be willing to reproduce the hang if I
> knew what to do once it happens, i.e. what kind of information to
> collect in order to identify the cause of the problem.
> I may also try 3.7-rc if there's any interest.



-- 
Regards/Gruss,
Boris.


* Re: possible regression in kernel 3.6 and 3.7-rc: system hangs during nightly tape backup
  2012-11-20  0:14 possible regression in kernel 3.6: system hangs during nightly tape backup Tilman Schmidt
  2012-11-20 15:49 ` Tilman Schmidt
  2012-11-24 19:17 ` Tilman Schmidt
@ 2012-12-05 11:03 ` Tilman Schmidt
  2 siblings, 0 replies; 5+ messages in thread
From: Tilman Schmidt @ 2012-12-05 11:03 UTC (permalink / raw)
  To: LKML, Linux IDE

[-- Attachment #1: Type: text/plain, Size: 3008 bytes --]

Problem still exists in kernel 3.7-rc7.

During the second backup run after I booted kernel 3.7.0-rc7, all
disk activity once again suddenly ceased, dmesg started reporting
tasks "blocked for more than 120 seconds", and everything that
happened after that was lost on reboot.

Can I do anything to help hunt this down?
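Because the hang shows up only about once a week, a few clean nights prove little. Here is a worked estimate. It assumes one backup per night, and that each run hangs independently with probability p of about 1/7. Under those assumptions, the chance of n clean nights despite the bug being present is (1 - p)^n. The helper names are hypothetical.

```python
import math


def p_no_hang(nights, p_hang):
    """Probability of `nights` clean runs when each run hangs with p_hang."""
    return (1.0 - p_hang) ** nights


def nights_needed(p_hang, confidence):
    """Smallest n with P(at least one hang in n nights) >= confidence."""
    # Solve (1 - p)^n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))
```

`p_no_hang(7, 1/7)` is about 0.34, so the clean week on 3.5.7 would have happened roughly one time in three even if 3.5.7 had the same bug. `nights_needed(1/7, 0.95)` returns 20. Under these assumptions, every step of a v3.5..v3.6 bisection that ends in "good" would need about three weeks of nightly runs, unless a faster way to trigger the hang is found.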


end of thread, other threads:[~2012-12-05 11:04 UTC | newest]

Thread overview: 5+ messages
2012-11-20  0:14 possible regression in kernel 3.6: system hangs during nightly tape backup Tilman Schmidt
2012-11-20 15:49 ` Tilman Schmidt
2012-11-24 19:17 ` Tilman Schmidt
2012-11-25 21:34   ` Borislav Petkov
2012-12-05 11:03 ` possible regression in kernel 3.6 and 3.7-rc: " Tilman Schmidt
