* BUG: soft lockup detected on CPU#1!
From: Rakesh @ 2009-02-11  7:57 UTC
  To: xfs


Hello,

I am running the 2.6.28-based XFS kernel driver on a
custom kernel with the following kernel config options
enabled.

CONFIG_PREEMPT
CONFIG_DETECT_SOFTLOCKUP

Running the following xfsqa test causes a soft lockup.
The configuration is an x86 box with Hyper-Threading,
4GB RAM, and an AHCI-connected JBOD. It's 100%
reproducible.

Any suggestions/inputs on where to start debugging the
problem would be much appreciated.

#! /bin/sh
# FS QA Test No. 008
#
# randholes test
#
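
For reference, this is roughly how the run is driven (a
sketch only; the device and mount point names below are
examples, not my exact setup):

cd xfstests
export TEST_DEV=/dev/sdb1       # example test device on the JBOD
export TEST_DIR=/mnt/test       # example mount point
export SCRATCH_DEV=/dev/sdc1    # example scratch device
export SCRATCH_MNT=/mnt/scratch
./check 008                     # FS QA Test No. 008, the randholes test above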

BUG: soft lockup detected on CPU#1!
 [<4013d525>] softlockup_tick+0x9c/0xaf
 [<40123246>] update_process_times+0x3d/0x60
 [<401100ab>] smp_apic_timer_interrupt+0x52/0x58
 [<40103633>] apic_timer_interrupt+0x1f/0x24
 [<402a1557>] _spin_lock_irqsave+0x48/0x61
 [<f8b8fe30>] xfs_iflush_cluster+0x16d/0x31c [xfs]
 [<f8b9018b>] xfs_iflush+0x1ac/0x271 [xfs]
 [<f8ba49a1>] xfs_inode_flush+0xd6/0xfa [xfs]
 [<f8bb13c8>] xfs_fs_write_inode+0x27/0x40 [xfs]
 [<401789d9>] __writeback_single_inode+0x1b0/0x2ff
 [<40101ad5>] __switch_to+0x23/0x1f9
 [<40178f87>] sync_sb_inodes+0x196/0x261
 [<4017920a>] writeback_inodes+0x67/0xb1
 [<401465df>] wb_kupdate+0x7b/0xe0
 [<40146bc3>] pdflush+0x0/0x1b5
 [<40146ce1>] pdflush+0x11e/0x1b5
 [<40146564>] wb_kupdate+0x0/0xe0
 [<4012be6d>] kthread+0xc1/0xec
 [<4012bdac>] kthread+0x0/0xec
 [<401038b3>] kernel_thread_helper+0x7/0x10
 =======================

Thanks,
Rakesh

* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-19  8:04 UTC
  To: Eric Sandeen; +Cc: xfs


I think I am getting closer now. Can you suggest what
to look for in this oops?


Stack traceback for pid 192
0xad12f030      192       11  1    0   R  0xad12f1d0
*xfsdatad/0
esp        eip        Function (args)
0xaff37eec 0x7815007b map_vm_area+0xc3
0xaff37ef0 0x7814007b find_get_pages_contig+0x3d
0xaff37ef8 0x78156c04 free_block+0x41
0xaff37f20 0x78156d9b cache_flusharray+0x63
0xaff37f3c 0x78156b4e kmem_cache_free+0x52
0xaff37f4c 0x7814296f mempool_free_slab+0xb
0xaff37f50 0x78142954 mempool_free+0x60
0xaff37f60 0x781f429f xfs_destroy_ioend+0x4e
0xaff37f6c 0x781f43d0 xfs_end_bio_read+0x5
0xaff37f70 0x78128498 run_workqueue+0x71
0xaff37f74 0x781f43cb xfs_end_bio_read
0xaff37f8c 0x78128634 worker_thread+0xd9
0xaff37fac 0x781164c0 default_wake_function
0xaff37fc8 0x7812855b worker_thread
0xaff37fcc 0x7812ad7a kthread+0xc1
0xaff37fd8 0x7812acb9 kthread
0xaff37fe4 0x781036df kernel_thread_helper+0x7

Thanks,
Rakesh

--- Eric Sandeen <sandeen@sandeen.net> wrote:

> raksac@yahoo.com wrote:
> > Guys,
> > 
> > Thank you for taking the time to write. Having said
> > where I stand, I think we are on the same page. Is
> > there something that would put me on track to nail
> > down the problem? It may be a wild goose chase, but
> > a starting point would be much appreciated.
> 
> Just random debugging thoughts...
> 
> Try stock 2.6.28.4, to see if you have the same
> problem.  If so, and
> esp. if you also see it on 2.6.29, then you'll get a
> lot more attention
> here.  :)
> 
> If not, then it's something with your backport most
> likely.  Figure out
> what you had to backport and see if it's possibly
> causing the error(s).
> 
> If it's locked up, try sysrq-w (echo w >
> /proc/sysrq-trigger) and look
> at dmesg to see if other threads are locked against
> it.  Figure out why.
> 
> On the oops try memory debugging etc, see if you're
> referencing freed
> memory, using corrupt lists, etc.
> 
> Look for other errors in the logs prior to this.
> 
> See if your filesystem is corrupted.
> 
> Bug Red Hat for XFS support, assuming you're
> actually buying RHEL5
> support from them.  :)
> 
> > Unfortunately there is no distro that comes close to
> > where mainline lives today. Reading the changelog,
> > there are several problems that I have already come
> > across, which has convinced me to take on this task.
> 
> well certainly there are distros with kernels newer
> than 2.6.18, but it
> depends on your needs & goals I guess.
> 
> Good luck,
> -Eric

* Re: BUG: soft lockup detected on CPU#1!
From: Michael Monnerie @ 2009-02-13  9:32 UTC
  To: xfs

On Thursday, 12 February 2009, raksac@yahoo.com wrote:
> Unfortunately there is no distro that comes close to
> where mainline lives today.

If I understand correctly, I would say use openSUSE 11.1, which has 
kernel 2.6.27.7-9 as of today (and full XFS support).

And you can still download a newer vanilla kernel and use it on your 
distro. You do not need to use the distro kernel, you just need to 
install kernel updates on your own then.
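
For example, roughly like this (an untested sketch from memory; adjust 
the version, and on some distros you still have to update the 
bootloader configuration yourself afterwards):

wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.28.4.tar.bz2
tar xjf linux-2.6.28.4.tar.bz2 && cd linux-2.6.28.4
cp /boot/config-$(uname -r) .config   # start from the distro config
make oldconfig                        # answer prompts for new options
make
make modules_install install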

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


* Re: BUG: soft lockup detected on CPU#1!
From: Eric Sandeen @ 2009-02-13  4:56 UTC
  To: raksac; +Cc: xfs

raksac@yahoo.com wrote:
> Guys,
> 
> Thank you for taking the time to write. Having said
> where I stand, I think we are on the same page. Is
> there something that would put me on track to nail
> down the problem? It may be a wild goose chase, but
> a starting point would be much appreciated.

Just random debugging thoughts...

Try stock 2.6.28.4, to see if you have the same problem.  If so, and
esp. if you also see it on 2.6.29, then you'll get a lot more attention
here.  :)

If not, then it's something with your backport most likely.  Figure out
what you had to backport and see if it's possibly causing the error(s).
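e.g. diff the backported fs/xfs against stock (just a sketch; the tree
paths are examples):

diff -rup linux-2.6.28.4/fs/xfs my-2.6.18-tree/fs/xfs > xfs-backport.diff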

If it's locked up, try sysrq-w (echo w > /proc/sysrq-trigger) and look
at dmesg to see if other threads are locked against it.  Figure out why.
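Something like this (from memory, untested):

echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
echo w > /proc/sysrq-trigger      # dump blocked task traces
dmesg > /tmp/sysrq-w.log          # keep a copy to post here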

On the oops try memory debugging etc, see if you're referencing freed
memory, using corrupt lists, etc.
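e.g. rebuild with options along these lines (exact option names vary a
bit between kernel versions):

CONFIG_DEBUG_SLAB=y        # poison freed slab memory, catches use-after-free
CONFIG_DEBUG_LIST=y        # catch corrupted list_head manipulation
CONFIG_DEBUG_SPINLOCK=y    # catch bad spinlock usage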

Look for other errors in the logs prior to this.

See if your filesystem is corrupted.
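i.e. something like this (the device name is an example; -n only
reports, it does not modify anything):

umount /mnt/test
xfs_repair -n /dev/sdb1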

Bug Red Hat for XFS support, assuming you're actually buying RHEL5
support from them.  :)

> Unfortunately there is no distro that comes close to
> where mainline lives today. Reading the changelog,
> there are several problems that I have already come
> across, which has convinced me to take on this task.

well certainly there are distros with kernels newer than 2.6.18, but it
depends on your needs & goals I guess.

Good luck,
-Eric


* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-12 22:16 UTC
  To: Eric Sandeen; +Cc: xfs


Guys,

Thank you for taking the time to write. Having said
where I stand, I think we are on the same page. Is
there something that would put me on track to nail
down the problem? It may be a wild goose chase, but
a starting point would be much appreciated.

Unfortunately there is no distro that comes close to
where mainline lives today. Reading the changelog,
there are several problems that I have already come
across, which has convinced me to take on this task.

Thanks,
Rakesh

--- Eric Sandeen <sandeen@sandeen.net> wrote:

> raksac@yahoo.com wrote:
> > Well, the problem is that the older kernel's XFS driver is
> > buggy to such a large extent that there is data loss, even
> > for data at rest, should a power loss occur.
> > 
> > With a backport of a newer version I can keep the kernel
> > version as it is; bumping it has far-reaching effects on
> > the other kernel components and they all have to move, to
> > which ..... there is strong reservation.
> > 
> > Hope this gives you the perspective.
> 
> It's totally understandable why you might want to do
> it.
> 
> It's also totally understandable why upstream
> developers can't spend a
> lot of time on your custom codebase.
> 
> What you need, of course, is a distribution with
> good support for xfs,
> so you can make it Someone Else's Problem.  :)
> 
> -Eric
> 
> > Thanks,
> > Rakesh 

* Re: BUG: soft lockup detected on CPU#1!
From: Eric Sandeen @ 2009-02-12 22:10 UTC
  To: raksac; +Cc: xfs

raksac@yahoo.com wrote:
> Well, the problem is that the older kernel's XFS driver is
> buggy to such a large extent that there is data loss, even
> for data at rest, should a power loss occur.
> 
> With a backport of a newer version I can keep the kernel
> version as it is; bumping it has far-reaching effects on
> the other kernel components and they all have to move, to
> which ..... there is strong reservation.
> 
> Hope this gives you the perspective.

It's totally understandable why you might want to do it.

It's also totally understandable why upstream developers can't spend a
lot of time on your custom codebase.

What you need, of course, is a distribution with good support for xfs,
so you can make it Someone Else's Problem.  :)

-Eric

> Thanks,
> Rakesh

* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-12 21:59 UTC
  To: Dave Chinner; +Cc: xfs


Well, the problem is that the older kernel's XFS driver is
buggy to such a large extent that there is data loss, even
for data at rest, should a power loss occur.

With a backport of a newer version I can keep the kernel
version as it is; bumping it has far-reaching effects on
the other kernel components and they all have to move, to
which ..... there is strong reservation.

Hope this gives you the perspective.

Thanks,
Rakesh 
--- Dave Chinner <david@fromorbit.com> wrote:

> On Thu, Feb 12, 2009 at 01:22:16AM -0800, raksac@yahoo.com wrote:
> > 
> > Hi Justin,
> > 
> > Yes, it is a 2.6.18 RHEL5-based custom kernel, but the
> > XFS driver is a backport from 2.6.28.4.
> 
> Then you get to keep all the broken bits to
> yourself.  If you want
> to throw random versions of XFS at random versions
> of kernels then
> we can't help you - we don't have the time or
> resources to support
> random backports of XFS to older kernels (and
> non-vanilla kernels
> at that).
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-12 21:55 UTC
  To: Dave Chinner; +Cc: xfs


Hi Dave,

No, it does not. It just sits there; any access to the
mount point (e.g. ll) also blocks forever.

I should point out that I brought the 2.6.28.4 XFS
changes into my private tree, and that seems to improve
test 008; however, test 011 now dies with a kernel oops.

Please see my reply posts.

Thanks,
Rakesh
--- Dave Chinner <david@fromorbit.com> wrote:

> On Tue, Feb 10, 2009 at 11:16:25PM -0800, raksac@yahoo.com wrote:
> > 
> > Hello,
> > 
> > I am running the 2.6.28-based XFS kernel driver on a
> > custom kernel with the following kernel config options
> > enabled.
> > 
> > CONFIG_PREEMPT
> > CONFIG_DETECT_SOFTLOCKUP
> > 
> > Running the following xfsqa test causes a soft lockup.
> > The configuration is an x86 box with Hyper-Threading,
> > 4GB RAM, and an AHCI-connected JBOD. It's 100%
> > reproducible.
> 
> Is the system making progress, or has it hung? i.e.
> does the test
> complete?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: BUG: soft lockup detected on CPU#1!
From: Dave Chinner @ 2009-02-12 21:55 UTC
  To: raksac; +Cc: xfs

On Thu, Feb 12, 2009 at 01:22:16AM -0800, raksac@yahoo.com wrote:
> 
> Hi Justin,
> 
> Yes, it is a 2.6.18 RHEL5-based custom kernel, but the
> XFS driver is a backport from 2.6.28.4.

Then you get to keep all the broken bits to yourself.  If you want
to throw random versions of XFS at random versions of kernels then
we can't help you - we don't have the time or resources to support
random backports of XFS to older kernels (and non-vanilla kernels
at that).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: BUG: soft lockup detected on CPU#1!
From: Dave Chinner @ 2009-02-12 21:49 UTC
  To: raksac; +Cc: xfs

On Tue, Feb 10, 2009 at 11:16:25PM -0800, raksac@yahoo.com wrote:
> 
> Hello,
> 
> I am running the 2.6.28-based XFS kernel driver on a
> custom kernel with the following kernel config options
> enabled.
> 
> CONFIG_PREEMPT
> CONFIG_DETECT_SOFTLOCKUP
> 
> Running the following xfsqa test causes a soft lockup.
> The configuration is an x86 box with Hyper-Threading,
> 4GB RAM, and an AHCI-connected JBOD. It's 100%
> reproducible.

Is the system making progress, or has it hung? i.e. does the test
complete?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-12  9:22 UTC
  To: Justin Piszcz; +Cc: xfs


Hi Justin,

Yes, it is a 2.6.18 RHEL5-based custom kernel, but the
XFS driver is a backport from 2.6.28.4.

Thanks,
Rakesh

--- Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

> 
> On Wed, 11 Feb 2009, raksac@yahoo.com wrote:
> 
> >
> > Hi Justin,
> >
> > Thank you for the pointer. Well, I did as suggested, and
> > now the xfsqa run gets up to test 011
> >
> > #! /bin/sh
> > # FS QA Test No. 011
> > #
> > # dirstress
> >
> > but dies with an oops. Any suggestions?
> >
> > Here is the oops trace -
> >
> > BUG: unable to handle kernel NULL pointer dereference
> > at virtual address 00000000
> > printing eip:
> > f8bd02c2
> > *pde = e7167067
> > Oops: 0000 [#1]
> > PREEMPT SMP
> > last sysfs file:
> > /devices/pci0000:00/0000:00:1f.3/i2c-0/0-002e/temp1_input
> > Modules linked in: xfs sg sunrpc m24c02 pca9554
> > pca9555 mcp23016 lm85 hwmon_vid i2c_i801 i2c_core
> > midplane uhci_hcd sk98lin tg3 e1000 mv_sata sd_mod
> > ahci libata
> > CPU:    1
> > EIP:    0060:[<f8bd02c2>]    Not tainted VLI
> > EFLAGS: 00010286   (2.6.18.rhel5 #2)
> > EIP is at xfs_iget_core+0x4d6/0x5e9 [xfs]
> 
> Can you show uname -a output?
> 
> EFLAGS: 00010286   (2.6.18.rhel5 #2)
>                      ^^^^^^^^^^^^

* Re: BUG: soft lockup detected on CPU#1!
From: Justin Piszcz @ 2009-02-11 23:36 UTC
  To: raksac; +Cc: xfs


On Wed, 11 Feb 2009, raksac@yahoo.com wrote:

>
> Hi Justin,
>
> Thank you for the pointer. Well, I did as suggested, and
> now the xfsqa run gets up to test 011
>
> #! /bin/sh
> # FS QA Test No. 011
> #
> # dirstress
>
> but dies with an oops. Any suggestions?
>
> Here is the oops trace -
>
> BUG: unable to handle kernel NULL pointer dereference
> at virtual address 00000000
> printing eip:
> f8bd02c2
> *pde = e7167067
> Oops: 0000 [#1]
> PREEMPT SMP
> last sysfs file:
> /devices/pci0000:00/0000:00:1f.3/i2c-0/0-002e/temp1_input
> Modules linked in: xfs sg sunrpc m24c02 pca9554
> pca9555 mcp23016 lm85 hwmon_vid i2c_i801 i2c_core
> midplane uhci_hcd sk98lin tg3 e1000 mv_sata sd_mod
> ahci libata
> CPU:    1
> EIP:    0060:[<f8bd02c2>]    Not tainted VLI
> EFLAGS: 00010286   (2.6.18.rhel5 #2)
> EIP is at xfs_iget_core+0x4d6/0x5e9 [xfs]

Can you show uname -a output?

EFLAGS: 00010286   (2.6.18.rhel5 #2)
                     ^^^^^^^^^^^^


* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-11 23:34 UTC
  To: Justin Piszcz; +Cc: xfs


With debug enabled, it fails with this:

BUG: unable to handle kernel NULL pointer dereference
at virtual address 00000000
 printing eip:
f8bd02c2
*pde = d658e067
Assertion failed: atomic_read(&ip->i_pincount) > 0,
file: fs/xfs/xfs_inode.c, line: 2703
------------[ cut here ]------------
Kernel BUG at [verbose debug info unavailable]
invalid opcode: 0000 [#1]
PREEMPT SMP 
last sysfs file:
/devices/pci0000:00/0000:00:1f.3/i2c-0/0-002e/temp2_input
Modules linked in: sg xfs sunrpc m24c02 pca9554
pca9555 mcp23016 lm85 hwmon_vid i2c_i801 i2c_core
midplane uhci_hcd sk98lin tg3 e1000 mv_sata sd_mod
ahci libata
CPU:    0
EIP:    0060:[<f8bfde19>]    Not tainted VLI
EFLAGS: 00010296   (2.6.18.rhel5 #2) 
EIP is at assfail+0xd/0x13 [xfs]
eax: 0000005c   ebx: e1fb4980   ecx: e4f46000   edx:
00000000
esi: 00000007   edi: e1fb0a58   ebp: 00000007   esp:
e4f47eb8
ds: 007b   es: 007b   ss: 0068
Process xfslogd/0 (pid: 4370, ti=e4f46000
task=75118aa0 task.ti=e4f46000)
Stack: f8c10ccb f8c090ae f8c08d97 00000a8f f8bd0dca
00000526 f8be8e28 00000526 
       00000007 e9db1d80 e861c008 e9db1da0 e861c000
00000003 e9db1d80 e9db1ca4 
       e9db1ca0 00000000 f8be8f56 00000000 00000000
e9db1ca4 ea449880 ea449800 
Call Trace:
 [<f8bd0dca>] xfs_iunpin+0x21/0x49 [xfs]
 [<f8be8e28>] xfs_trans_chunk_committed+0xc3/0xe6
[xfs]
 [<f8be8f56>] xfs_trans_committed+0x38/0xd1 [xfs]
 [<f8bdba5c>] xlog_state_do_callback+0x1b7/0x329 [xfs]
 [<f8bf66b7>] xfs_buf_iodone_work+0x41/0x63 [xfs]
 [<401294c5>] run_workqueue+0x71/0xae
 [<f8bf6676>] xfs_buf_iodone_work+0x0/0x63 [xfs]
 [<40129666>] worker_thread+0xd9/0x10a
 [<40116ca2>] default_wake_function+0x0/0xc
 [<4012958d>] worker_thread+0x0/0x10a
 [<4012be6d>] kthread+0xc1/0xec
 [<4012bdac>] kthread+0x0/0xec
 [<401038b3>] kernel_thread_helper+0x7/0x10
 =======================
Code: d2 df 51 47 89 ea b8 94 59 c2 f8 e8 a9 37 6a 47
83 c4 0c 85 ff 75 02 0f 0b 5b 5e 5f 5d c3 51 52 50 68
cb 0c c1 f8 e8 ab df 51 47 <0f> 0b 83 c4 10 c3 55 57
56 53 83 ec 14 89 44 24 04 89 d7 89 cd 
EIP: [<f8bfde19>] assfail+0xd/0x13 [xfs] SS:ESP
0068:e4f47eb8
 <0>Kernel panic - not syncing: Fatal exception
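
If I am reading fs/xfs/xfs_inode.c right, the assertion that fires is
the pin count sanity check at the top of xfs_iunpin(); roughly this (a
paraphrased sketch, not the exact source):

/* paraphrased sketch of xfs_iunpin(), fs/xfs/xfs_inode.c */
void
xfs_iunpin(xfs_inode_t *ip)
{
	ASSERT(atomic_read(&ip->i_pincount) > 0);	/* this is what fires */

	if (atomic_dec_and_test(&ip->i_pincount))
		wake_up(&ip->i_ipin_wait);
}

So, if that reading is right, something is unpinning an inode that was
never pinned, or dropping the same pin reference twice.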

--- Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

> 
> On Tue, 10 Feb 2009, raksac@yahoo.com wrote:
> 
> >
> > Hello,
> >
> > I am running the 2.6.28 based xfs kernel driver on
> a
> > custom kernel with following kernel config
> enabled.
> >
> > CONFIG_PREEMPT
> > CONFIG_DETECT_SOFTLOCKUP
> >
> > Running the following xfsqa causes a soft lockup.
> The
> > configuration is a x86 with Hyperthreading, 4GB
> RAM
> > and a AHCI connected JBOD. Its 100% reproducible.
> >
> > Any suggestions/inputs on where to start debugging
> the
> > problem would be much appreciated.
> >
> > #! /bin/sh
> > # FS QA Test No. 008
> > #
> > # randholes test
> > #
> >
> > BUG: soft lockup detected on CPU#1!
> > [<4013d525>] softlockup_tick+0x9c/0xaf
> > [<40123246>] update_process_times+0x3d/0x60
> > [<401100ab>] smp_apic_timer_interrupt+0x52/0x58
> > [<40103633>] apic_timer_interrupt+0x1f/0x24
> > [<402a1557>] _spin_lock_irqsave+0x48/0x61
> > [<f8b8fe30>] xfs_iflush_cluster+0x16d/0x31c [xfs]
> > [<f8b9018b>] xfs_iflush+0x1ac/0x271 [xfs]
> > [<f8ba49a1>] xfs_inode_flush+0xd6/0xfa [xfs]
> > [<f8bb13c8>] xfs_fs_write_inode+0x27/0x40 [xfs]
> > [<401789d9>] __writeback_single_inode+0x1b0/0x2ff
> > [<40101ad5>] __switch_to+0x23/0x1f9
> > [<40178f87>] sync_sb_inodes+0x196/0x261
> > [<4017920a>] writeback_inodes+0x67/0xb1
> > [<401465df>] wb_kupdate+0x7b/0xe0
> > [<40146bc3>] pdflush+0x0/0x1b5
> > [<40146ce1>] pdflush+0x11e/0x1b5
> > [<40146564>] wb_kupdate+0x0/0xe0
> > [<4012be6d>] kthread+0xc1/0xec
> > [<4012bdac>] kthread+0x0/0xec
> > [<401038b3>] kernel_thread_helper+0x7/0x10
> > =======================
> >
> > Thanks,
> > Rakesh
> >
> 
> There were some pretty nasty bugs in 2.6.28 for XFS,
> can you reproduce it on 
> 2.6.28.4?
> 



      

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-11 23:33 UTC
  To: Justin Piszcz; +Cc: xfs


Hi Justin,

Thank you for the pointer. Well, I did as suggested, and
now the xfsqa run gets up to test 011

#! /bin/sh
# FS QA Test No. 011
#
# dirstress

but dies with an oops. Any suggestions?

Here is the oops trace -

BUG: unable to handle kernel NULL pointer dereference
at virtual address 00000000
 printing eip:
f8bd02c2
*pde = e7167067
Oops: 0000 [#1]
PREEMPT SMP 
last sysfs file:
/devices/pci0000:00/0000:00:1f.3/i2c-0/0-002e/temp1_input
Modules linked in: xfs sg sunrpc m24c02 pca9554
pca9555 mcp23016 lm85 hwmon_vid i2c_i801 i2c_core
midplane uhci_hcd sk98lin tg3 e1000 mv_sata sd_mod
ahci libata
CPU:    1
EIP:    0060:[<f8bd02c2>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18.rhel5 #2) 
EIP is at xfs_iget_core+0x4d6/0x5e9 [xfs]
eax: 00000000   ebx: e8ae7850   ecx: e79b2000   edx:
00000000
esi: e8ae7920   edi: ea44d2d0   ebp: ea44d298   esp:
e79b3ce4
ds: 007b   es: 007b   ss: 0068
Process dirstress (pid: 9927, ti=e79b2000
task=e790faa0 task.ti=e79b2000)
Stack: ea7b0034 e9072c00 ea2a6380 0003b84a e8ae7850
ea2a6380 02c9f600 00000004 
       e9072c00 f8bd0478 1003b84a 00000000 00000001
00000004 e79b3d5c 00000000 
       00000000 ea7b0034 e9072c00 ea7b0034 1003b84a
00000000 f8bebbf1 1003b84a 
Call Trace:
 [<f8bd0478>] xfs_iget+0xa3/0x12f [xfs]
 [<f8bebbf1>] xfs_trans_iget+0x1bd/0x249 [xfs]
 [<f8bd4ebe>] xfs_ialloc+0xb9/0x5a9 [xfs]
 [<f8bdc777>] xlog_grant_push_ail+0x105/0x12c [xfs]
 [<f8bec673>] xfs_dir_ialloc+0x7e/0x26b [xfs]
 [<f8be8c81>] xfs_trans_reserve+0x15c/0x240 [xfs]
 [<f8bf12c8>] xfs_symlink+0x34a/0x862 [xfs]
 [<401a62dd>] avc_has_perm_noaudit+0x38f/0x42d
 [<401a6e9d>] avc_has_perm+0x3b/0x46
 [<f8bf9d8e>] xfs_vn_symlink+0x6a/0xad [xfs]
 [<4016a06d>] vfs_symlink+0xb2/0x11a
 [<4016a149>] sys_symlinkat+0x74/0xab
 [<402a14c2>] _spin_lock+0xd/0x5a
 [<402a1593>] _spin_unlock+0xd/0x21
 [<4016f416>] dput+0x83/0x11c
 [<4015c89d>] __fput+0x152/0x175
 [<4016a18f>] sys_symlink+0xf/0x13
 [<40102b73>] syscall_call+0x7/0xb
 =======================
Code: 8b 40 08 a8 08 74 05 e8 6d ff 6c 47 8b 44 24 04
05 38 01 00 00 e8 24 02 6d 47 8b 44 24 04 8b 98 2c 01
00 00 85 db 74 3d 8b 43 04 <39> 18 74 14 b9 14 01 00
00 ba 20 8a c0 f8 b8 d8 8c c0 f8 e8 32 
EIP: [<f8bd02c2>] xfs_iget_core+0x4d6/0x5e9 [xfs]
SS:ESP 0068:e79b3ce4
 <0>Kernel panic - not syncing: Fatal exception

Thanks,
Rakesh

--- Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

> 
> On Tue, 10 Feb 2009, raksac@yahoo.com wrote:
> 
> >
> > Hello,
> >
> > I am running the 2.6.28 based xfs kernel driver on
> a
> > custom kernel with following kernel config
> enabled.
> >
> > CONFIG_PREEMPT
> > CONFIG_DETECT_SOFTLOCKUP
> >
> > Running the following xfsqa causes a soft lockup.
> The
> > configuration is a x86 with Hyperthreading, 4GB
> RAM
> > and a AHCI connected JBOD. Its 100% reproducible.
> >
> > Any suggestions/inputs on where to start debugging
> the
> > problem would be much appreciated.
> >
> > #! /bin/sh
> > # FS QA Test No. 008
> > #
> > # randholes test
> > #
> >
> > BUG: soft lockup detected on CPU#1!
> > [<4013d525>] softlockup_tick+0x9c/0xaf
> > [<40123246>] update_process_times+0x3d/0x60
> > [<401100ab>] smp_apic_timer_interrupt+0x52/0x58
> > [<40103633>] apic_timer_interrupt+0x1f/0x24
> > [<402a1557>] _spin_lock_irqsave+0x48/0x61
> > [<f8b8fe30>] xfs_iflush_cluster+0x16d/0x31c [xfs]
> > [<f8b9018b>] xfs_iflush+0x1ac/0x271 [xfs]
> > [<f8ba49a1>] xfs_inode_flush+0xd6/0xfa [xfs]
> > [<f8bb13c8>] xfs_fs_write_inode+0x27/0x40 [xfs]
> > [<401789d9>] __writeback_single_inode+0x1b0/0x2ff
> > [<40101ad5>] __switch_to+0x23/0x1f9
> > [<40178f87>] sync_sb_inodes+0x196/0x261
> > [<4017920a>] writeback_inodes+0x67/0xb1
> > [<401465df>] wb_kupdate+0x7b/0xe0
> > [<40146bc3>] pdflush+0x0/0x1b5
> > [<40146ce1>] pdflush+0x11e/0x1b5
> > [<40146564>] wb_kupdate+0x0/0xe0
> > [<4012be6d>] kthread+0xc1/0xec
> > [<4012bdac>] kthread+0x0/0xec
> > [<401038b3>] kernel_thread_helper+0x7/0x10
> > =======================
> >
> > Thanks,
> > Rakesh
> >
> 
> There were some pretty nasty bugs in 2.6.28 for XFS,
> can you reproduce it on 
> 2.6.28.4?

* Re: BUG: soft lockup detected on CPU#1!
From: Justin Piszcz @ 2009-02-11  9:21 UTC
  To: raksac; +Cc: xfs


On Tue, 10 Feb 2009, raksac@yahoo.com wrote:

>
> Hello,
>
> I am running the 2.6.28-based XFS kernel driver on a
> custom kernel with the following kernel config options
> enabled.
>
> CONFIG_PREEMPT
> CONFIG_DETECT_SOFTLOCKUP
>
> Running the following xfsqa test causes a soft lockup.
> The configuration is an x86 box with Hyper-Threading,
> 4GB RAM, and an AHCI-connected JBOD. It's 100%
> reproducible.
>
> Any suggestions/inputs on where to start debugging the
> problem would be much appreciated.
>
> #! /bin/sh
> # FS QA Test No. 008
> #
> # randholes test
> #
>
> BUG: soft lockup detected on CPU#1!
> [<4013d525>] softlockup_tick+0x9c/0xaf
> [<40123246>] update_process_times+0x3d/0x60
> [<401100ab>] smp_apic_timer_interrupt+0x52/0x58
> [<40103633>] apic_timer_interrupt+0x1f/0x24
> [<402a1557>] _spin_lock_irqsave+0x48/0x61
> [<f8b8fe30>] xfs_iflush_cluster+0x16d/0x31c [xfs]
> [<f8b9018b>] xfs_iflush+0x1ac/0x271 [xfs]
> [<f8ba49a1>] xfs_inode_flush+0xd6/0xfa [xfs]
> [<f8bb13c8>] xfs_fs_write_inode+0x27/0x40 [xfs]
> [<401789d9>] __writeback_single_inode+0x1b0/0x2ff
> [<40101ad5>] __switch_to+0x23/0x1f9
> [<40178f87>] sync_sb_inodes+0x196/0x261
> [<4017920a>] writeback_inodes+0x67/0xb1
> [<401465df>] wb_kupdate+0x7b/0xe0
> [<40146bc3>] pdflush+0x0/0x1b5
> [<40146ce1>] pdflush+0x11e/0x1b5
> [<40146564>] wb_kupdate+0x0/0xe0
> [<4012be6d>] kthread+0xc1/0xec
> [<4012bdac>] kthread+0x0/0xec
> [<401038b3>] kernel_thread_helper+0x7/0x10
> =======================
>
> Thanks,
> Rakesh
>

There were some pretty nasty bugs in 2.6.28 for XFS, can you reproduce it on 
2.6.28.4?


* BUG: soft lockup detected on CPU#1!
From: raksac @ 2009-02-11  7:16 UTC
  To: xfs


Hello,

I am running the 2.6.28-based XFS kernel driver on a
custom kernel with the following kernel config options
enabled.

CONFIG_PREEMPT
CONFIG_DETECT_SOFTLOCKUP

Running the following xfsqa test causes a soft lockup.
The configuration is an x86 box with Hyper-Threading,
4GB RAM, and an AHCI-connected JBOD. It's 100%
reproducible.

Any suggestions/inputs on where to start debugging the
problem would be much appreciated.

#! /bin/sh
# FS QA Test No. 008
#
# randholes test
#

BUG: soft lockup detected on CPU#1!
 [<4013d525>] softlockup_tick+0x9c/0xaf
 [<40123246>] update_process_times+0x3d/0x60
 [<401100ab>] smp_apic_timer_interrupt+0x52/0x58
 [<40103633>] apic_timer_interrupt+0x1f/0x24
 [<402a1557>] _spin_lock_irqsave+0x48/0x61
 [<f8b8fe30>] xfs_iflush_cluster+0x16d/0x31c [xfs]
 [<f8b9018b>] xfs_iflush+0x1ac/0x271 [xfs]
 [<f8ba49a1>] xfs_inode_flush+0xd6/0xfa [xfs]
 [<f8bb13c8>] xfs_fs_write_inode+0x27/0x40 [xfs]
 [<401789d9>] __writeback_single_inode+0x1b0/0x2ff
 [<40101ad5>] __switch_to+0x23/0x1f9
 [<40178f87>] sync_sb_inodes+0x196/0x261
 [<4017920a>] writeback_inodes+0x67/0xb1
 [<401465df>] wb_kupdate+0x7b/0xe0
 [<40146bc3>] pdflush+0x0/0x1b5
 [<40146ce1>] pdflush+0x11e/0x1b5
 [<40146564>] wb_kupdate+0x0/0xe0
 [<4012be6d>] kthread+0xc1/0xec
 [<4012bdac>] kthread+0x0/0xec
 [<401038b3>] kernel_thread_helper+0x7/0x10
 =======================

Thanks,
Rakesh

* BUG: soft lockup detected on CPU#1!
From: brendan powers @ 2007-05-02 16:17 UTC
  To: linux-kernel

Hello, I'm running Debian sarge (3.1) with kernel 2.6.16.7 and came
across this kernel oops; the machine locked up shortly afterwards. It's
a terminal server, so there are a lot of different things going on, and
I'm not sure exactly what caused this to happen. Anyone have any ideas?

Here is the log of what happened.

CIFS VFS: Error 0xfffffff3 on cifs_get_inode_info in lookup of \.directory
smbfs: Unrecognized mount option domain
BUG: soft lockup detected on CPU#1!

Pid: 8657, comm:             kio_file
EIP: 0060:[get_offset_pmtmr+22/3661] CPU: 1
EIP is at get_offset_pmtmr+0x16/0xe4d
 EFLAGS: 00000246    Not tainted  (2.6.16.7.resara-opteron #1)
EAX: 00906422 EBX: d4d49de8 ECX: 00906416 EDX: 00001008
ESI: 00905639 EDI: 0090641c EBP: 0000000a DS: 007b ES: 007b
CR0: 8005003b CR2: 091d9000 CR3: 36e1dac0 CR4: 000006f0
 [do_gettimeofday+28/164] do_gettimeofday+0x1c/0xa4
 [getnstimeofday+15/39] getnstimeofday+0xf/0x27
 [ktime_get_ts+24/81] ktime_get_ts+0x18/0x51
 [ktime_get+16/58] ktime_get+0x10/0x3a
 [hrtimer_run_queues+45/225] hrtimer_run_queues+0x2d/0xe1
 [run_timer_softirq+34/387] run_timer_softirq+0x22/0x183
 [__do_softirq+91/196] __do_softirq+0x5b/0xc4
 [do_softirq+45/49] do_softirq+0x2d/0x31
 [apic_timer_interrupt+28/36] apic_timer_interrupt+0x1c/0x24
 [generic_fillattr+117/157] generic_fillattr+0x75/0x9d
 [pg0+948724075/1069974528] cifs_getattr+0x1f/0x26 [cifs]
 [vfs_getattr+65/150] vfs_getattr+0x41/0x96
 [vfs_stat_fd+50/69] vfs_stat_fd+0x32/0x45
 [current_fs_time+72/95] current_fs_time+0x48/0x5f
 [dput+27/281] dput+0x1b/0x119
 [mntput_no_expire+20/113] mntput_no_expire+0x14/0x71
 [vfs_stat+15/19] vfs_stat+0xf/0x13
 [sys_stat64+16/39] sys_stat64+0x10/0x27
 [sys_readlink+19/23] sys_readlink+0x13/0x17
 [syscall_call+7/11] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#0!

Pid: 8694, comm:             kio_file
EIP: 0060:[generic_fillattr+114/157] CPU: 0
EIP is at generic_fillattr+0x72/0x9d
 EFLAGS: 00000202    Not tainted  (2.6.16.7.resara-opteron #1)
EAX: 0000002d EBX: defc7f64 ECX: c481fb04 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: e8afc740 DS: 007b ES: 007b
CR0: 8005003b CR2: 091d6a8c CR3: 1a457380 CR4: 000006f0
 [pg0+948724075/1069974528] cifs_getattr+0x1f/0x26 [cifs]
 [vfs_getattr+65/150] vfs_getattr+0x41/0x96
 [vfs_stat_fd+50/69] vfs_stat_fd+0x32/0x45
 [__mark_inode_dirty+38/339] __mark_inode_dirty+0x26/0x153
 [dput+27/281] dput+0x1b/0x119
 [mntput_no_expire+20/113] mntput_no_expire+0x14/0x71
 [vfs_stat+15/19] vfs_stat+0xf/0x13
 [sys_stat64+16/39] sys_stat64+0x10/0x27
 [sys_readlink+19/23] sys_readlink+0x13/0x17
 [syscall_call+7/11] syscall_call+0x7/0xb
 CIFS VFS: Send error in read = -13
May  2 09:44:09 localhost last message repeated 9 times
BUG: soft lockup detected on CPU#3!

Pid: 8684, comm:            konqueror
EIP: 0060:[generic_fillattr+117/157] CPU: 3
EIP is at generic_fillattr+0x75/0x9d
 EFLAGS: 00000202    Not tainted  (2.6.16.7.resara-opteron #1)
EAX: 0000002f EBX: d17d5f64 ECX: c481fb04 EDX: 00000001
ESI: 00000000 EDI: 00000000 EBP: e8afc740 DS: 007b ES: 007b
CR0: 8005003b CR2: ab5f7000 CR3: 1a457840 CR4: 000006f0
 [pg0+948724075/1069974528] cifs_getattr+0x1f/0x26 [cifs]
 [vfs_getattr+65/150] vfs_getattr+0x41/0x96
 [vfs_stat_fd+50/69] vfs_stat_fd+0x32/0x45
 [vfs_stat+15/19] vfs_stat+0xf/0x13
 [sys_stat64+16/39] sys_stat64+0x10/0x27
 [slab_destroy+56/91] slab_destroy+0x38/0x5b
 [syscall_call+7/11] syscall_call+0x7/0xb
 CIFS VFS: Send error in read = -13


* Re: BUG: soft lockup detected on CPU#1!
From: Steven Rostedt @ 2006-07-24 13:20 UTC
  To: Jochen Heuer; +Cc: linux-kernel, Ingo Molnar, nathans, xfs

On Sat, 2006-07-22 at 00:53 +0200, Jochen Heuer wrote:

> 
> Is there anything I can test? Disable irq balancing? Disabling preemption did
> not help. Disabling IO-APIC? What can I do to help isolate the problem because
> it really is annoying and I don't like pushing the reset button. Because if the
> system locks up *really* nothing works. The screen is frozen, no mouse, no
> keyboard, no sys-rq, no network ... nothing.

Jochen, have you tried to enable NMI?  Make sure you have Local APIC
enabled (you should since it's SMP), and on your kernel command line (in
Grub) add "lapic nmi_watchdog=2". (lapic isn't really needed, but I
always add it so I don't forget to when working on UP machines).

Run it again, and if it locks up hard, which probably means it's spinning
somewhere with interrupts disabled, the NMI watchdog will trigger and should
give you another dump of where it's locked up.
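
In grub legacy that means the kernel line in /boot/grub/menu.lst (or
grub.conf) ends up looking something like this (the paths and root
device here are examples):

title  2.6.18-rc2 with NMI watchdog
root   (hd0,0)
kernel /boot/vmlinuz-2.6.18-rc2 root=/dev/sda1 ro lapic nmi_watchdog=2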

-- Steve

* Re: BUG: soft lockup detected on CPU#1!
From: Jochen Heuer @ 2006-07-21 22:58 UTC
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, nathans, xfs

Hi everyone.

> > Jul 17 09:23:03 [kernel]  [<c022dcbe>] crypt+0xee/0x1e0
> > Jul 17 09:23:03 [kernel]  [<c022ddef>] crypt_iv_unaligned+0x3f/0xc0
> > Jul 17 09:23:03 [kernel]  [<c022e23d>] cbc_decrypt_iv+0x3d/0x50
> > Jul 17 09:23:03 [kernel]  [<c032f6b7>] crypt_convert_scatterlist+0x117/0x170
> > Jul 17 09:23:03 [kernel]  [<c032f8b2>] crypt_convert+0x142/0x190
> > Jul 17 09:23:03 [kernel]  [<c032fb82>] kcryptd_do_work+0x42/0x60
> > Jul 17 09:23:03 [kernel]  [<c012fcff>] run_workqueue+0x6f/0xe0
> > Jul 17 09:23:03 [kernel]  [<c012fe98>] worker_thread+0x128/0x150
> > Jul 17 09:23:03 [kernel]  [<c0133364>] kthread+0xa4/0xe0
> > Jul 17 09:23:03 [kernel]  [<c01010e5>] kernel_thread_helper+0x5/0x10
> > Jul 17 09:24:17 [kernel] =============================================
> > Jul 17 09:24:17 [kernel] [ INFO: possible recursive locking detected ]
> > Jul 17 09:24:17 [kernel] ---------------------------------------------
> 
> This looks like a separate issue, and more a matter of fixing lockdep
> not to report it than an actual bug (which is why I CC'd the xfs folks
> and Ingo).
> 
> Probably XFS needs to tell lockdep about its nesting. But maybe there
> is a real bug lying in there somewhere.

I have some more of these. Now they look like this every time I get them:

Jul 19 18:43:15 [kernel] =============================================
Jul 19 18:43:15 [kernel] [ INFO: possible recursive locking detected ]
Jul 19 18:43:15 [kernel] ---------------------------------------------
Jul 19 18:43:15 [kernel] qmail-local/9368 is trying to acquire lock:
Jul 19 18:43:15 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
xfs_ilock+0x60/0xb0
Jul 19 18:43:15 [kernel] but task is already holding lock:
Jul 19 18:43:15 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
xfs_ilock+0x60/0xb0
Jul 19 18:43:15 [kernel] other info that might help us debug this:
Jul 19 18:43:15 [kernel] 2 locks held by qmail-local/9368:
Jul 19 18:43:15 [kernel]  #0:  (&inode->i_mutex){--..}, at: [<c03c2931>]
mutex_lock+0x21/0x30
Jul 19 18:43:15 [kernel]  #1:  (&(&ip->i_lock)->mr_lock){----}, at:
[<c01f63b0>] xfs_ilock+0x60/0xb0
Jul 19 18:43:15 [kernel] stack backtrace:
Jul 19 18:43:15 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
Jul 19 18:43:15 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
Jul 19 18:43:15 [kernel]  [<c01385a9>] print_deadlock_bug+0xb9/0xd0
Jul 19 18:43:15 [kernel]  [<c013862b>] check_deadlock+0x6b/0x80
Jul 19 18:43:15 [kernel]  [<c0139ed4>] __lock_acquire+0x354/0x990
Jul 19 18:43:15 [kernel]  [<c013ac35>] lock_acquire+0x75/0xa0
Jul 19 18:43:15 [kernel]  [<c0136aaf>] down_write+0x3f/0x60
Jul 19 18:43:15 [kernel]  [<c01f63b0>] xfs_ilock+0x60/0xb0
Jul 19 18:43:15 [kernel]  [<c01f5b3a>] xfs_iget_core+0x2aa/0x5b0
Jul 19 18:43:15 [kernel]  [<c01f5f0c>] xfs_iget+0xcc/0x150
Jul 19 18:43:15 [kernel]  [<c0210b38>] xfs_trans_iget+0xa8/0x140
Jul 19 18:43:15 [kernel]  [<c01f80af>] xfs_ialloc+0xaf/0x4c0
Jul 19 18:43:15 [kernel]  [<c021159d>] xfs_dir_ialloc+0x6d/0x280
Jul 19 18:43:15 [kernel]  [<c0217381>] xfs_create+0x241/0x670
Jul 19 18:43:15 [kernel]  [<c022307d>] xfs_vn_mknod+0x1ed/0x2e0
Jul 19 18:43:15 [kernel]  [<c0223182>] xfs_vn_create+0x12/0x20
Jul 19 18:43:15 [kernel]  [<c017514d>] vfs_create+0x7d/0xd0
Jul 19 18:43:15 [kernel]  [<c017542f>] open_namei+0xbf/0x620
Jul 19 18:43:15 [kernel]  [<c016487c>] do_filp_open+0x2c/0x60
Jul 19 18:43:15 [kernel]  [<c0164c00>] do_sys_open+0x50/0xe0
Jul 19 18:43:15 [kernel]  [<c0164cac>] sys_open+0x1c/0x20
Jul 19 18:43:15 [kernel]  [<c0102e15>] sysenter_past_esp+0x56/0x8d

Best regards,

   Jochen


* Re: BUG: soft lockup detected on CPU#1!
From: Jochen Heuer @ 2006-07-21 22:53 UTC
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, nathans, xfs

On Mon, Jul 17, 2006 at 04:48:31PM +0200, Jochen Heuer wrote:
> On Mon, Jul 17, 2006 at 10:30:08AM -0400, Steven Rostedt wrote:
> > 
> > Jochen, you didn't say whether or not the 2.6.18-rc2 locked up. I'm
> > assuming it did. But did it?
> 
> Hi Steven,
> 
> no, it has not locked up yet, but I have not done any "serious" web
> browsing with 2.6.18-rc2 so far.

Hi,

well, it locks up with 2.6.18-rc2 too. Three times today, and always during
web browsing ... What's so special about it? Could it be because it accesses
the network and disk at the same time?

The system has been pretty busy compiling over the last few days because I
set up a new Gentoo system in a chroot environment. No problem whatsoever. I
also ran 2 x mprime for a day, and again no problem.

Is there anything I can test? Disable IRQ balancing? Disabling preemption did
not help. Disabling the IO-APIC? What can I do to help isolate the problem?
It really is annoying, and I don't like pushing the reset button, but when
the system locks up *really* nothing works: the screen is frozen, no mouse,
no keyboard, no sysrq, no network ... nothing.

Thanks for your help and best regards,

   Jochen


* Re: BUG: soft lockup detected on CPU#1!
From: Jochen Heuer @ 2006-07-17 14:48 UTC
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, nathans, xfs

On Mon, Jul 17, 2006 at 10:30:08AM -0400, Steven Rostedt wrote:
> 
> Jochen, you didn't say whether or not the 2.6.18-rc2 locked up. I'm
> assuming it did. But did it?

Hi Steven,

no, it has not locked up yet, but I have not done any "serious" web
browsing with 2.6.18-rc2 so far.

Best regards,

   Jochen


* Re: BUG: soft lockup detected on CPU#1!
From: Steven Rostedt @ 2006-07-17 14:30 UTC
  To: Jochen Heuer; +Cc: linux-kernel, Ingo Molnar, nathans, xfs

On Mon, 2006-07-17 at 14:52 +0200, Jochen Heuer wrote:
> Hi,
> 
> I have been running 2.6.17 on my desktop system (Asus A8V + Athlon64 X2 3800)
> and I am having severe problems with lockups. These only show up when surfing
> the net. During compiling or mprime runs --> absolutely no problem.
> 
> At first I thought this was related to the S-ATA driver since I got error
> messages like these on the console once before it locked up hard (no sysrq!):
> 
> ata1: command 0xca timeout, stat 0x50 host_stat 0x4
> ata1: status=0x50 { DriveReady SeekComplete }
> ata1: command 0xea timeout, stat 0x50 host_stat 0x0
> ata1: status=0x50 { DriveReady SeekComplete }
> 
> But switching to an IDE drive did not fix the lockups. So I switched to
> 2.6.18-rc2 and today I got the following reported via dmesg:
> 
> Jul 17 09:23:03 [kernel] BUG: soft lockup detected on CPU#1!
> Jul 17 09:23:03 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
> Jul 17 09:23:03 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
> Jul 17 09:23:03 [kernel]  [<c0143e77>] softlockup_tick+0xa7/0xd0
> Jul 17 09:23:03 [kernel]  [<c0129422>] run_local_timers+0x12/0x20
> Jul 17 09:23:03 [kernel]  [<c012923e>] update_process_times+0x6e/0xa0
> Jul 17 09:23:03 [kernel]  [<c011127d>] smp_apic_timer_interrupt+0x6d/0x80
> Jul 17 09:23:03 [kernel]  [<c0103942>] apic_timer_interrupt+0x2a/0x30
> Jul 17 09:23:03 [kernel]  [<c022df93>] cbc_process_decrypt+0x93/0xf0

I wonder if we are stuck in a loop here:

	do {
		u8 *tmp_dst = *dst_p;

		fn(tfm, tmp_dst, src);		/* decrypt one cipher block */
		xor(tmp_dst, iv);		/* CBC: xor with the previous ciphertext */
		memcpy(iv, src, bsize);		/* current ciphertext becomes the next IV */
		if (tmp_dst != dst)
			memcpy(dst, tmp_dst, bsize);

		src += bsize;
		dst += bsize;
	} while ((done += bsize) <= nbytes);	/* one pass per block, with no
						   preemption point in between */

But unfortunately, this is a worker thread so we don't know exactly what fn is.

> Jul 17 09:23:03 [kernel]  [<c022dcbe>] crypt+0xee/0x1e0
> Jul 17 09:23:03 [kernel]  [<c022ddef>] crypt_iv_unaligned+0x3f/0xc0
> Jul 17 09:23:03 [kernel]  [<c022e23d>] cbc_decrypt_iv+0x3d/0x50
> Jul 17 09:23:03 [kernel]  [<c032f6b7>] crypt_convert_scatterlist+0x117/0x170
> Jul 17 09:23:03 [kernel]  [<c032f8b2>] crypt_convert+0x142/0x190
> Jul 17 09:23:03 [kernel]  [<c032fb82>] kcryptd_do_work+0x42/0x60
> Jul 17 09:23:03 [kernel]  [<c012fcff>] run_workqueue+0x6f/0xe0
> Jul 17 09:23:03 [kernel]  [<c012fe98>] worker_thread+0x128/0x150
> Jul 17 09:23:03 [kernel]  [<c0133364>] kthread+0xa4/0xe0
> Jul 17 09:23:03 [kernel]  [<c01010e5>] kernel_thread_helper+0x5/0x10
> Jul 17 09:24:17 [kernel] =============================================
> Jul 17 09:24:17 [kernel] [ INFO: possible recursive locking detected ]
> Jul 17 09:24:17 [kernel] ---------------------------------------------

This looks like a separate issue, and more a matter of fixing lockdep
not to report it than an actual bug (which is why I CC'd the xfs folks
and Ingo).

Probably XFS needs to tell lockdep about its nesting. But maybe there
is a real bug lying in there somewhere.
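
The stock way to do that is the _nested() lock variants, so the second
acquisition gets its own lockdep subclass. Something like this in
xfs_lock_inodes() (just a sketch, not a tested patch; XFS_ILOCK_SUBCLASS
is a name I made up here):

/* Annotate the 2nd..nth inode lock with a lockdep subclass so that
 * taking several i_locks in xfs_lock_inodes() is not reported as
 * recursion.  XFS_ILOCK_SUBCLASS is hypothetical. */
down_write(&ips[0]->i_lock.mr_lock);
down_write_nested(&ips[1]->i_lock.mr_lock, XFS_ILOCK_SUBCLASS);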

> Jul 17 09:24:17 [kernel] mv/12680 is trying to acquire lock:
> Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
> xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] but task is already holding lock:
> Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
> xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] other info that might help us debug this:
> Jul 17 09:24:17 [kernel] 4 locks held by mv/12680:
> Jul 17 09:24:17 [kernel]  #0:  (&s->s_vfs_rename_mutex){--..}, at: [<c03c2931>]
> mutex_lock+0x21/0x30
> Jul 17 09:24:17 [kernel]  #1:  (&inode->i_mutex/1){--..}, at: [<c017506b>]
> lock_rename+0xbb/0xd0
> Jul 17 09:24:17 [kernel]  #2:  (&inode->i_mutex/2){--..}, at: [<c0175052>]
> lock_rename+0xa2/0xd0
> Jul 17 09:24:17 [kernel]  #3:  (&(&ip->i_lock)->mr_lock){----}, at:
> [<c01f63b0>] xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel] stack backtrace:
> Jul 17 09:24:17 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
> Jul 17 09:24:17 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
> Jul 17 09:24:17 [kernel]  [<c01385a9>] print_deadlock_bug+0xb9/0xd0
> Jul 17 09:24:17 [kernel]  [<c013862b>] check_deadlock+0x6b/0x80
> Jul 17 09:24:17 [kernel]  [<c0139ed4>] __lock_acquire+0x354/0x990
> Jul 17 09:24:17 [kernel]  [<c013ac35>] lock_acquire+0x75/0xa0
> Jul 17 09:24:17 [kernel]  [<c0136aaf>] down_write+0x3f/0x60
> Jul 17 09:24:17 [kernel]  [<c01f63b0>] xfs_ilock+0x60/0xb0
> Jul 17 09:24:17 [kernel]  [<c0217981>] xfs_lock_inodes+0xb1/0x120
> Jul 17 09:24:17 [kernel]  [<c020ca7b>] xfs_rename+0x20b/0x8e0
> Jul 17 09:24:17 [kernel]  [<c022351a>] xfs_vn_rename+0x3a/0x90
> Jul 17 09:24:17 [kernel]  [<c017687d>] vfs_rename_dir+0xbd/0xd0
> Jul 17 09:24:17 [kernel]  [<c0176a4c>] vfs_rename+0xdc/0x230
> Jul 17 09:24:17 [kernel]  [<c0176d02>] do_rename+0x162/0x190
> Jul 17 09:24:17 [kernel]  [<c0176d9c>] sys_renameat+0x6c/0x80
> Jul 17 09:24:17 [kernel]  [<c0176dd8>] sys_rename+0x28/0x30
> Jul 17 09:24:17 [kernel]  [<c0102e15>] sysenter_past_esp+0x56/0x8d
> 
> I am not sure if this information is enough to isolate the problem. If you
> need anything further, just let me know.

Hmm, Ingo, do you have a lockdep set of patches for straight 2.6.17?
Perhaps Jochen can run it there and see if it picks up the lockup that
he is experiencing.

Jochen, you didn't say whether or not the 2.6.18-rc2 locked up. I'm
assuming it did. But did it?

-- Steve



* BUG: soft lockup detected on CPU#1!
From: Jochen Heuer @ 2006-07-17 12:52 UTC
  To: linux-kernel

Hi,

I have been running 2.6.17 on my desktop system (Asus A8V + Athlon64 X2 3800)
and I am having severe problems with lockups. These only show up when surfing
the net. During compiling or mprime runs --> absolutely no problem.

At first I thought this was related to the S-ATA driver since I got error
messages like these on the console once before it locked up hard (no sysrq!):

ata1: command 0xca timeout, stat 0x50 host_stat 0x4
ata1: status=0x50 { DriveReady SeekComplete }
ata1: command 0xea timeout, stat 0x50 host_stat 0x0
ata1: status=0x50 { DriveReady SeekComplete }

But switching to an IDE drive did not fix the lockups. So I switched to
2.6.18-rc2 and today I got the following reported via dmesg:

Jul 17 09:23:03 [kernel] BUG: soft lockup detected on CPU#1!
Jul 17 09:23:03 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
Jul 17 09:23:03 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
Jul 17 09:23:03 [kernel]  [<c0143e77>] softlockup_tick+0xa7/0xd0
Jul 17 09:23:03 [kernel]  [<c0129422>] run_local_timers+0x12/0x20
Jul 17 09:23:03 [kernel]  [<c012923e>] update_process_times+0x6e/0xa0
Jul 17 09:23:03 [kernel]  [<c011127d>] smp_apic_timer_interrupt+0x6d/0x80
Jul 17 09:23:03 [kernel]  [<c0103942>] apic_timer_interrupt+0x2a/0x30
Jul 17 09:23:03 [kernel]  [<c022df93>] cbc_process_decrypt+0x93/0xf0
Jul 17 09:23:03 [kernel]  [<c022dcbe>] crypt+0xee/0x1e0
Jul 17 09:23:03 [kernel]  [<c022ddef>] crypt_iv_unaligned+0x3f/0xc0
Jul 17 09:23:03 [kernel]  [<c022e23d>] cbc_decrypt_iv+0x3d/0x50
Jul 17 09:23:03 [kernel]  [<c032f6b7>] crypt_convert_scatterlist+0x117/0x170
Jul 17 09:23:03 [kernel]  [<c032f8b2>] crypt_convert+0x142/0x190
Jul 17 09:23:03 [kernel]  [<c032fb82>] kcryptd_do_work+0x42/0x60
Jul 17 09:23:03 [kernel]  [<c012fcff>] run_workqueue+0x6f/0xe0
Jul 17 09:23:03 [kernel]  [<c012fe98>] worker_thread+0x128/0x150
Jul 17 09:23:03 [kernel]  [<c0133364>] kthread+0xa4/0xe0
Jul 17 09:23:03 [kernel]  [<c01010e5>] kernel_thread_helper+0x5/0x10
Jul 17 09:24:17 [kernel] =============================================
Jul 17 09:24:17 [kernel] [ INFO: possible recursive locking detected ]
Jul 17 09:24:17 [kernel] ---------------------------------------------
Jul 17 09:24:17 [kernel] mv/12680 is trying to acquire lock:
Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
xfs_ilock+0x60/0xb0
Jul 17 09:24:17 [kernel] but task is already holding lock:
Jul 17 09:24:17 [kernel]  (&(&ip->i_lock)->mr_lock){----}, at: [<c01f63b0>]
xfs_ilock+0x60/0xb0
Jul 17 09:24:17 [kernel] other info that might help us debug this:
Jul 17 09:24:17 [kernel] 4 locks held by mv/12680:
Jul 17 09:24:17 [kernel]  #0:  (&s->s_vfs_rename_mutex){--..}, at: [<c03c2931>]
mutex_lock+0x21/0x30
Jul 17 09:24:17 [kernel]  #1:  (&inode->i_mutex/1){--..}, at: [<c017506b>]
lock_rename+0xbb/0xd0
Jul 17 09:24:17 [kernel]  #2:  (&inode->i_mutex/2){--..}, at: [<c0175052>]
lock_rename+0xa2/0xd0
Jul 17 09:24:17 [kernel]  #3:  (&(&ip->i_lock)->mr_lock){----}, at:
[<c01f63b0>] xfs_ilock+0x60/0xb0
Jul 17 09:24:17 [kernel] stack backtrace:
Jul 17 09:24:17 [kernel]  [<c0103cd2>] show_trace+0x12/0x20
Jul 17 09:24:17 [kernel]  [<c0103de9>] dump_stack+0x19/0x20
Jul 17 09:24:17 [kernel]  [<c01385a9>] print_deadlock_bug+0xb9/0xd0
Jul 17 09:24:17 [kernel]  [<c013862b>] check_deadlock+0x6b/0x80
Jul 17 09:24:17 [kernel]  [<c0139ed4>] __lock_acquire+0x354/0x990
Jul 17 09:24:17 [kernel]  [<c013ac35>] lock_acquire+0x75/0xa0
Jul 17 09:24:17 [kernel]  [<c0136aaf>] down_write+0x3f/0x60
Jul 17 09:24:17 [kernel]  [<c01f63b0>] xfs_ilock+0x60/0xb0
Jul 17 09:24:17 [kernel]  [<c0217981>] xfs_lock_inodes+0xb1/0x120
Jul 17 09:24:17 [kernel]  [<c020ca7b>] xfs_rename+0x20b/0x8e0
Jul 17 09:24:17 [kernel]  [<c022351a>] xfs_vn_rename+0x3a/0x90
Jul 17 09:24:17 [kernel]  [<c017687d>] vfs_rename_dir+0xbd/0xd0
Jul 17 09:24:17 [kernel]  [<c0176a4c>] vfs_rename+0xdc/0x230
Jul 17 09:24:17 [kernel]  [<c0176d02>] do_rename+0x162/0x190
Jul 17 09:24:17 [kernel]  [<c0176d9c>] sys_renameat+0x6c/0x80
Jul 17 09:24:17 [kernel]  [<c0176dd8>] sys_rename+0x28/0x30
Jul 17 09:24:17 [kernel]  [<c0102e15>] sysenter_past_esp+0x56/0x8d

I am not sure if this information is enough to isolate the problem. If you
need anything further, just let me know.

Best regards,

   Jogi
