All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Jochen Heuer <jogi-kernel@planetzork.ping.de>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	nathans@sgi.com, xfs@oss.sgi.com
Subject: Re: BUG: soft lockup detected on CPU#1!
Date: Mon, 17 Jul 2006 10:30:08 -0400	[thread overview]
Message-ID: <1153146608.1218.9.camel@localhost.localdomain> (raw)
In-Reply-To: <20060717125216.GA15481@planetzork.ping.de>

On Mon, 2006-07-17 at 14:52 +0200, Jochen Heuer wrote:
> Hi,
> 
> I have been running 2.6.17 on my desktop system (Asus A8V + Athlon64 X2 3800)
> and I am having severe problems with lookups. These only show up when surfing
> the net. During compiling or mprime runs --> absolutly no problem.
> 
> At first I thought this was related to the S-ATA driver since I got error
> messages like these on the console once before it locked up hard (no sysrq!):
> 
> ata1: command 0xca timeout, stat 0x50 host_stat 0x4
> ata1: status=0x50 { DriveReady SeekComplete }
> ata1: command 0xea timeout, stat 0x50 host_stat 0x0
> ata1: status=0x50 { DriveReady SeekComplete }
> 
> But switching to an IDE drive did not fix the lockups. So I switched to
> 2.6.18-rc2 and today I got the following reported via dmesg:
> 
> Jul 17 09:23:03 [kernel] BUG: soft lockup detected on CPU#1!
> Jul 17 09:23:03 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
> Jul 17 09:23:03 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
> Jul 17 09:23:03 [kernel]  [<c0143e77>] softlockup_tick+0xa7/0xd0
> Jul 17 09:23:03 [kernel]  [<c0129422>] run_local_timers+0x12/0x20
> Jul 17 09:23:03 [kernel]  [<c012923e>] update_process_times+0x6e/0xa0
> Jul 17 09:23:03 [kernel]  [<c011127d>] smp_apic_timer_interrupt+0x6d/0x80
> Jul 17 09:23:03 [kernel]  [<c0103942>] apic_timer_interrupt+0x2a/0x30
> Jul 17 09:23:03 [kernel]  [<c022df93>] cbc_process_decrypt+0x93/0xf0

I wonder if we are stuck in a loop here:

	do {
		u8 *tmp_dst = *dst_p;

		fn(tfm, tmp_dst, src);
		xor(tmp_dst, iv);
		memcpy(iv, src, bsize);
		if (tmp_dst != dst)
			memcpy(dst, tmp_dst, bsize);

		src += bsize;
		dst += bsize;
	} while ((done += bsize) <= nbytes);

But unfortunately, this is a worker thread so we don't know exactly what fn is.

> Jul 17 09:23:03 [kernel]  [<c022dcbe>] crypt+0xee/0x1e0
> Jul 17 09:23:03 [kernel]  [<c022ddef>] crypt_iv_unaligned+0x3f/0xc0
> Jul 17 09:23:03 [kernel]  [<c022e23d>] cbc_decrypt_iv+0x3d/0x50
> Jul 17 09:23:03 [kernel]  [<c032f6b7>] crypt_convert_scatterlist+0x117/0x170
> Jul 17 09:23:03 [kernel]  [<c032f8b2>] crypt_convert+0x142/0x190
> Jul 17 09:23:03 [kernel]  [<c032fb82>] kcryptd_do_work+0x42/0x60
> Jul 17 09:23:03 [kernel]  [<c012fcff>] run_workqueue+0x6f/0xe0
> Jul 17 09:23:03 [kernel]  [<c012fe98>] worker_thread+0x128/0x150
> Jul 17 09:23:03 [kernel]  [<c0133364>] kthread+0xa4/0xe0
> Jul 17 09:23:03 [kernel]  [<c01010e5>] kernel_thread_helper+0x5/0x10
> Jul 17 09:24:17 [kernel] =============================================
> Jul 17 09:24:17 [kernel] [ INFO: possible recursive locking detected ]
> Jul 17 09:24:17 [kernel] ---------------------------------------------

This looks like a separate issue, and something more about fixing
lockdep not to report it instead of an actual bug (and why I CC'd the
xfs folks and Ingo).

Probably XFS needs to tell lockdep about it's nesting. But maybe there
is a bug that is lying in there somewhere.

> Jul 17 09:24:17 [kernel] mv/12680 is trying to acquire lock:
> Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
> xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] but task is already holding lock:
> Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
> xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] other info that might help us debug this:
> Jul 17 09:24:17 [kernel] 4 locks held by mv/12680:
> Jul 17 09:24:17 [kernel]  #0:  (&s->s_vfs_rename_mutex){--..}, at: [<c03c2931>]
> mutex_lock+0x21/0x30
> Jul 17 09:24:17 [kernel]  #1:  (&inode->i_mutex/1){--..}, at: [<c017506b>]
> lock_rename+0xbb/0xd0
> Jul 17 09:24:17 [kernel]  #2:  (&inode->i_mutex/2){--..}, at: [<c0175052>]
> lock_rename+0xa2/0xd0
> Jul 17 09:24:17 [kernel]  #3:  (&(&ip->i_lock)->mr_lock){----}, at:
> [<c01f63b0>] xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] stack backtrace:
> Jul 17 09:24:17 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
> Jul 17 09:24:17 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
> Jul 17 09:24:17 [kernel]  [<c01385a9>] print_deadlock_bug+0xb9/0xd0
> Jul 17 09:24:17 [kernel]  [<c013862b>] check_deadlock+0x6b/0x80
> Jul 17 09:24:17 [kernel]  [<c0139ed4>] __lock_acquire+0x354/0x990
> Jul 17 09:24:17 [kernel]  [<c013ac35>] lock_acquire+0x75/0xa0
> Jul 17 09:24:17 [kernel]  [<c0136aaf>] down_write+0x3f/0x60
> Jul 17 09:24:17 [kernel]  [<c01f63b0>] xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel]  [<c0217981>] xfs_lock_inodes+0xb1/0x120
> Jul 17 09:24:17 [kernel]  [<c020ca7b>] xfs_rename+0x20b/0x8e0
> Jul 17 09:24:17 [kernel]  [<c022351a>] xfs_vn_rename+0x3a/0x90
> Jul 17 09:24:17 [kernel]  [<c017687d>] vfs_rename_dir+0xbd/0xd0
> Jul 17 09:24:17 [kernel]  [<c0176a4c>] vfs_rename+0xdc/0x230
> Jul 17 09:24:17 [kernel]  [<c0176d02>] do_rename+0x162/0x190
> Jul 17 09:24:17 [kernel]  [<c0176d9c>] sys_renameat+0x6c/0x80
> Jul 17 09:24:17 [kernel]  [<c0176dd8>] sys_rename+0x28/0x30
> Jul 17 09:24:17 [kernel]  [<c0102e15>] sysenter_past_esp+0x56/0x8d
> 
> I am not sure if these infos are enough to isolate the problem. If you need any
> further infos just let me know.

Hmm, Ingo, do you have a lockdep set of patches for straight 2.6.17.
Perhaps Jochen can run it there and see if it picks up the lockup that
he is experiencing.

Jochen, you didn't say whether or not the 2.6.18-rc2 locked up. I'm
assuming it did. But did it?

-- Steve


  reply	other threads:[~2006-07-17 14:30 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-17 12:52 BUG: soft lockup detected on CPU#1! Jochen Heuer
2006-07-17 14:30 ` Steven Rostedt [this message]
2006-07-17 14:48   ` Jochen Heuer
2006-07-21 22:53     ` Jochen Heuer
2006-07-24 13:20       ` Steven Rostedt
2006-07-21 22:58   ` Jochen Heuer
2007-05-02 16:17 brendan powers
2009-02-11  7:16 raksac
2009-02-11  9:21 ` Justin Piszcz
2009-02-11 23:33   ` raksac
2009-02-11 23:36     ` Justin Piszcz
2009-02-12  9:22       ` raksac
2009-02-12 21:55         ` Dave Chinner
2009-02-12 21:59           ` raksac
2009-02-12 22:10             ` Eric Sandeen
2009-02-12 22:16               ` raksac
2009-02-13  4:56                 ` Eric Sandeen
2009-02-19  8:04                   ` raksac
2009-02-13  9:32                 ` Michael Monnerie
2009-02-11 23:34   ` raksac
2009-02-12 21:49 ` Dave Chinner
2009-02-12 21:55   ` raksac
2009-02-11  7:57 Rakesh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1153146608.1218.9.camel@localhost.localdomain \
    --to=rostedt@goodmis.org \
    --cc=jogi-kernel@planetzork.ping.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nathans@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.