* XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
@ 2012-02-29 18:20 Rafał Kupka
  2012-03-02 10:43 ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-02-29 18:20 UTC (permalink / raw)
  To: xfs

Hello,

Spotted on two different machines (one is a KVM guest) during xfsdump
runs combined with some filesystem load.

I can reproduce it by running
while : ; do xfsdump -f /dev/null -l0 -p1 -J -L "test" -M "test" /; done
simultaneously with some postmark benchmarks (any read/write load would
trigger it).
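The reproducer above can be sketched as a script. This is only an illustration: the iteration cap is an addition so the sketch terminates, and it assumes xfsdump is installed and / is an XFS filesystem.

```shell
#!/bin/sh
# Reproducer sketch: loop xfsdump against / while other read/write load
# (e.g. postmark) runs in parallel. The 5-iteration cap is only so this
# sketch terminates; the real test loops forever.
TARGET=/
i=0
while [ "$i" -lt 5 ]; do
    i=$((i + 1))
    echo "xfsdump iteration $i"
    xfsdump -f /dev/null -l0 -p1 -J -L "test" -M "test" "$TARGET" || break
done
echo "completed $i iteration(s)"
```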
 
Bad kernels:
deb 3.2.4-1~bpo60+1 Debian/squeeze with backports
Linus' git tree - 164974a8f2a482f1abcb027c6d1a89dd79b14297 (post v3.3-rc5)

Good kernels:
2.6.32 (deb ver 2.6.32-41) Debian/squeeze (both amd64/i686)
Linus' git tree - v2.6.38

Backtrace (from kernel 164974a8f2a482f1abcb027c6d1a89dd79b14297):
Feb 29 17:01:37 goscie kernel: [  265.409709] XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
Feb 29 17:01:37 goscie kernel: [  265.410412] ------------[ cut here ]------------
Feb 29 17:01:37 goscie kernel: [  265.410749] kernel BUG at fs/xfs/xfs_message.c:101!
Feb 29 17:01:37 goscie kernel: [  265.411124] invalid opcode: 0000 [#1] SMP 
Feb 29 17:01:37 goscie kernel: [  265.411535] CPU 0 
Feb 29 17:01:37 goscie kernel: [  265.411625] Modules linked in: loop processor i2c_piix4 button i2c_core psmouse serio_raw tpm_tis tpm tpm_bios evdev pcspkr thermal_sys xfs sg ata_generic ata_piix libata scsi_mod virtio_net [last unloaded: scsi_wait_scan]
Feb 29 17:01:37 goscie kernel: [  265.412034] 
Feb 29 17:01:37 goscie kernel: [  265.412034] Pid: 1465, comm: xfsdump Not tainted 3.3.0-rc5xfsdump1+ #1 Bochs Bochs
Feb 29 17:01:37 goscie kernel: [  265.412034] RIP: 0010:[<ffffffffa00913aa>]  [<ffffffffa00913aa>] assfail+0x1d/0x23 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034] RSP: 0018:ffff880009ee5938  EFLAGS: 00010296
Feb 29 17:01:37 goscie kernel: [  265.412034] RAX: 000000000000006e RBX: ffff8800063ec980 RCX: 00000000099b099b
Feb 29 17:01:37 goscie kernel: [  265.412034] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
Feb 29 17:01:37 goscie kernel: [  265.412034] RBP: 0000000000028001 R08: 0000000000000000 R09: ffff88001780d000
Feb 29 17:01:37 goscie kernel: [  265.412034] R10: 0000000000000000 R11: ffff88001780d000 R12: 0000000000000000
Feb 29 17:01:37 goscie kernel: [  265.412034] R13: ffffffffa00dff34 R14: ffff880009ee5a10 R15: ffff88001202b980
Feb 29 17:01:37 goscie kernel: [  265.412034] FS:  0000000000000000(0000) GS:ffff880017c00000(0063) knlGS:00000000f7639700
Feb 29 17:01:37 goscie kernel: [  265.412034] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
Feb 29 17:01:37 goscie kernel: [  265.412034] CR2: 00000000f73a5000 CR3: 0000000014724000 CR4: 00000000000006f0
Feb 29 17:01:37 goscie kernel: [  265.412034] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 29 17:01:37 goscie kernel: [  265.412034] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 29 17:01:37 goscie kernel: [  265.412034] Process xfsdump (pid: 1465, threadinfo ffff880009ee4000, task ffff88000329f120)
Feb 29 17:01:37 goscie kernel: [  265.412034] Stack:
Feb 29 17:01:37 goscie kernel: [  265.412034]  0000000000028001 ffffffffa0085a82 0000000000008000 ffff8800063ec980
Feb 29 17:01:37 goscie kernel: [  265.412034]  0000000000028001 ffffffffa00860d6 0000025c00000000 0000000000000000
Feb 29 17:01:37 goscie kernel: [  265.412034]  ffff880012084800 0000000000000000 0000000000008000 ffffffffa00dff34
Feb 29 17:01:37 goscie kernel: [  265.412034] Call Trace:
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa0085a82>] ? _xfs_buf_read+0x4b/0x86 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00860d6>] ? xfs_buf_read+0x8a/0xc0 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00dff34>] ? xfs_trans_read_buf+0x4b/0x537 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00cab65>] ? xfs_imap_to_bp+0x47/0x209 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00ccdf3>] ? xfs_iread+0x59/0x194 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa008ba2a>] ? xfs_iget+0x408/0x5ea [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa0099015>] ? kmem_zone_zalloc+0x13/0x2e [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa0090c2b>] ? xfs_bulkstat_one_int+0x9a/0x319 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00eafec>] ? xfs_bstime_store_compat+0x4d/0x4d [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00eb790>] ? xfs_bulkstat_one_compat+0x17/0x1c [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00909e0>] ? xfs_bulkstat+0x58c/0x73d [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00eb779>] ? xfs_compat_fssetdm_by_handle+0xc7/0xc7 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00ebc99>] ? xfs_compat_ioc_bulkstat+0x162/0x1cf [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00ec0b2>] ? xfs_file_compat_ioctl+0x3ac/0x486 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa00a7981>] ? xfs_bmapi_read+0x1ad/0x31a [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa0088d5f>] ? xfs_file_aio_read+0x24c/0x263 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa008b323>] ? xfs_iunlock+0xea/0x146 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffffa009600b>] ? xfs_free_eofblocks+0xa5/0x217 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffff81142259>] ? compat_sys_ioctl+0x1a5/0xe28
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffff81071d1e>] ? timekeeping_get_ns+0xd/0x2a
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffff81072986>] ? getnstimeofday+0x4d/0x80
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffff81072a06>] ? do_gettimeofday+0x10/0x2f
Feb 29 17:01:37 goscie kernel: [  265.412034]  [<ffffffff813615b6>] ? cstar_dispatch+0x7/0x21
Feb 29 17:01:37 goscie kernel: [  265.412034] Code: 89 64 24 28 e8 86 fc ff ff 48 83 c4 68 c3 48 89 f1 41 89 d0 48 83 ec 08 48 89 fa 48 c7 c6 8d 7e 0f a0 31 ff 31 c0 e8 8d ff ff ff <0f> 0b eb fe 90 90 45 31 c0 31 c0 48 83 7f 38 00 75 67 e9 bd 00 
Feb 29 17:01:37 goscie kernel: [  265.412034] RIP  [<ffffffffa00913aa>] assfail+0x1d/0x23 [xfs]
Feb 29 17:01:37 goscie kernel: [  265.412034]  RSP <ffff880009ee5938>
Feb 29 17:01:37 goscie kernel: [  265.445134] ---[ end trace 1e67152b46e37c22 ]---

I'm trying to bisect, but it will take a while.

root@goscie:~# xfs_info /
meta-data=/dev/disk/by-uuid/07910403-cd4b-4388-bab2-df7372aa7227 isize=256    agcount=9, agsize=114816 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=983552, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@goscie:~# grep xfs /proc/mounts 
/dev/disk/by-uuid/07910403-cd4b-4388-bab2-df7372aa7227 / xfs rw,noatime,attr2,noquota 0 0

Regards,
Kupson
PS: Please do CC: me. I'm not subscribed.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-02-29 18:20 XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598 Rafał Kupka
@ 2012-03-02 10:43 ` Christoph Hellwig
  2012-03-03 11:37   ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2012-03-02 10:43 UTC (permalink / raw)
  To: Rafał Kupka; +Cc: xfs

From the trace this looks like we got incorrect results from xfs_imap,
which got some fairly significant changes between the two kernels
you tested.  Can you check if 2.6.36 shows this issue?  It's the first
kernel that had these big changes in inode lookup.

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-02 10:43 ` Christoph Hellwig
@ 2012-03-03 11:37   ` Rafał Kupka
  2012-03-03 11:44     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-03-03 11:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote:
Hello,

> From the trace this looks like we got incorrect results from xfs_imap,
> which got some fairly significant changes between the two kernels
> you tested.  Can you check if 2.6.36 shows this issue?  It's the first
> kernel that had these big changes in inode lookup.

I've tested more kernels; here are the results (Linus' git tree, no
additional patches):
- v2.6.36 -- good
- v2.6.38 -- good
- v2.6.39 -- bad, I/O error log entry and locked processes:

Mar  2 20:13:58 goscie kernel: [  431.392301] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0xe6700       ("xfs_trans_read_buf") error 5 buf count 8192
Mar  2 20:13:58 goscie kernel: [  431.427181] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Mar  2 20:13:59 goscie kernel: [  433.086049] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0xe6700       ("xfs_trans_read_buf") error 5 buf count 8192
Mar  2 20:13:59 goscie kernel: [  433.088638] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Mar  2 20:16:47 goscie kernel: [  600.584162] INFO: task cron:718 blocked for more than 120 seconds.
Mar  2 20:16:47 goscie kernel: [  600.585520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 20:16:47 goscie kernel: [  600.587651] cron            D ffff8800174d2180     0   718      1 0x00020000
Mar  2 20:16:47 goscie kernel: [  600.589632]  ffff8800174d2180 0000000000000082 ffffffff8100e0d1 ffff88001822c7a0
Mar  2 20:16:47 goscie kernel: [  600.592651]  0000000000013d40 ffff880013831fd8 ffff880013831fd8 0000000000013d40
Mar  2 20:16:47 goscie kernel: [  600.595718]  ffff8800174d2180 ffff880013830010 ffff88001fc13d40 ffff88001821ed00
Mar  2 20:16:47 goscie kernel: [  600.598314] Call Trace:
Mar  2 20:16:47 goscie kernel: [  600.598640]  [<ffffffff8100e0d1>] ? sched_clock+0x5/0x8
Mar  2 20:16:47 goscie kernel: [  600.599099]  [<ffffffff8139166c>] ? schedule_timeout+0x2d/0xd7
Mar  2 20:16:47 goscie kernel: [  600.599503]  [<ffffffff81391fab>] ? __down_common+0x92/0xe3
Mar  2 20:16:47 goscie kernel: [  600.599935]  [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.600383]  [<ffffffff8105f3bc>] ? down+0x27/0x37
Mar  2 20:16:47 goscie kernel: [  600.600782]  [<ffffffff811d6365>] ? xfs_buf_lock+0x64/0xb0
Mar  2 20:16:47 goscie kernel: [  600.601179]  [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.601581]  [<ffffffff811d7cd6>] ? xfs_buf_get+0x5b/0x14f
Mar  2 20:16:47 goscie kernel: [  600.601986]  [<ffffffff811d7de0>] ? xfs_buf_read+0x16/0xbf
Mar  2 20:16:47 goscie kernel: [  600.602360]  [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.602750]  [<ffffffff811b65ee>] ? xfs_imap_to_bp+0x47/0x206
Mar  2 20:16:47 goscie kernel: [  600.603171]  [<ffffffff811d6c56>] ? xfs_buf_rele+0xfa/0x17a
Mar  2 20:16:47 goscie kernel: [  600.603567]  [<ffffffff811b9531>] ? xfs_iread+0x59/0x1cc
Mar  2 20:16:47 goscie kernel: [  600.603983]  [<ffffffff811b3727>] ? xfs_iget+0x3e4/0x611
Mar  2 20:16:47 goscie kernel: [  600.612400]  [<ffffffff811d0d80>] ? xfs_lookup+0xaa/0xd9
Mar  2 20:16:47 goscie kernel: [  600.612757]  [<ffffffff811dc6f3>] ? xfs_vn_lookup+0x3d/0x7c
Mar  2 20:16:47 goscie kernel: [  600.613142]  [<ffffffff810fed3f>] ? d_alloc_and_lookup+0x43/0x60
Mar  2 20:16:47 goscie kernel: [  600.613611]  [<ffffffff8110061c>] ? walk_component+0x161/0x30c
Mar  2 20:16:47 goscie kernel: [  600.613967]  [<ffffffff81101683>] ? path_lookupat+0xad/0x33f
Mar  2 20:16:47 goscie kernel: [  600.614398]  [<ffffffff81033b5a>] ? update_cfs_shares+0x7c/0x149
Mar  2 20:16:47 goscie kernel: [  600.614779]  [<ffffffff81101933>] ? do_path_lookup+0x1e/0x9a
Mar  2 20:16:47 goscie kernel: [  600.615125]  [<ffffffff810ff44d>] ? getname_flags+0x14f/0x1c6
Mar  2 20:16:47 goscie kernel: [  600.615496]  [<ffffffff811023df>] ? user_path_at+0x48/0x7a
Mar  2 20:16:47 goscie kernel: [  600.615857]  [<ffffffff8105e7f5>] ? hrtimer_cancel+0xc/0x16
Mar  2 20:16:47 goscie kernel: [  600.616220]  [<ffffffff81391de2>] ? do_nanosleep+0x77/0xae
Mar  2 20:16:47 goscie kernel: [  600.616583]  [<ffffffff811078e7>] ? dput+0x84/0xe8
Mar  2 20:16:47 goscie kernel: [  600.616909]  [<ffffffff810fa00b>] ? vfs_fstatat+0x43/0x70
Mar  2 20:16:47 goscie kernel: [  600.617247]  [<ffffffff8102dbc0>] ? sys32_stat64+0x11/0x29
Mar  2 20:16:47 goscie kernel: [  600.617621]  [<ffffffff81399180>] ? cstar_dispatch+0x7/0x2e
Mar  2 20:16:47 goscie kernel: [  600.617984] INFO: task xfsdump:1849 blocked for more than 120 seconds.
Mar  2 20:16:47 goscie kernel: [  600.618346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 20:16:47 goscie kernel: [  600.618964] xfsdump         D ffff8800178936e0     0  1849   1826 0x00020000
Mar  2 20:16:47 goscie kernel: [  600.619448]  ffff8800178936e0 0000000000000086 ffff880000000000 ffffffff8160b020
Mar  2 20:16:47 goscie kernel: [  600.620276]  0000000000013d40 ffff88000d841fd8 ffff88000d841fd8 0000000000013d40
Mar  2 20:16:47 goscie kernel: [  600.621068]  ffff8800178936e0 ffff88000d840010 ffffea000053c950 0000000000000008
Mar  2 20:16:47 goscie kernel: [  600.621814] Call Trace:
Mar  2 20:16:47 goscie kernel: [  600.622155]  [<ffffffff8139166c>] ? schedule_timeout+0x2d/0xd7
Mar  2 20:16:47 goscie kernel: [  600.622509]  [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.622888]  [<ffffffff8119a361>] ? xfs_btree_check_sblock+0x64/0x15e
Mar  2 20:16:47 goscie kernel: [  600.623308]  [<ffffffff81391fab>] ? __down_common+0x92/0xe3
Mar  2 20:16:47 goscie kernel: [  600.623650]  [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.624065]  [<ffffffff8105f3bc>] ? down+0x27/0x37
Mar  2 20:16:47 goscie kernel: [  600.624398]  [<ffffffff811d6365>] ? xfs_buf_lock+0x64/0xb0
Mar  2 20:16:47 goscie kernel: [  600.624739]  [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  2 20:16:47 goscie kernel: [  600.625127]  [<ffffffff811d7cd6>] ? xfs_buf_get+0x5b/0x14f
Mar  2 20:16:47 goscie kernel: [  600.625475]  [<ffffffff811d7de0>] ? xfs_buf_read+0x16/0xbf
Mar  2 20:16:47 goscie kernel: [  600.625820]  [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573
Mar  2 20:16:47 goscie kernel: [  600.626198]  [<ffffffff811b65ee>] ? xfs_imap_to_bp+0x47/0x206
Mar  2 20:16:47 goscie kernel: [  600.626550]  [<ffffffff811b9531>] ? xfs_iread+0x59/0x1cc
Mar  2 20:16:47 goscie kernel: [  600.626888]  [<ffffffff811b3727>] ? xfs_iget+0x3e4/0x611
Mar  2 20:16:47 goscie kernel: [  600.627277]  [<ffffffff811d3715>] ? kmem_alloc+0x28/0x9e
Mar  2 20:16:47 goscie kernel: [  600.627641]  [<ffffffff811bc59b>] ? xfs_bulkstat_one_int+0x9a/0x318
Mar  2 20:16:47 goscie kernel: [  600.628000]  [<ffffffff8117f3b4>] ? xfs_bstime_store_compat+0x4d/0x4d
Mar  2 20:16:47 goscie kernel: [  600.628433]  [<ffffffff8117fb58>] ? xfs_bulkstat_one_compat+0x17/0x1c
Mar  2 20:16:47 goscie kernel: [  600.628857]  [<ffffffff811bc350>] ? xfs_bulkstat+0x58c/0x73d
Mar  2 20:16:47 goscie kernel: [  600.629220]  [<ffffffff8117fb41>] ? xfs_compat_fssetdm_by_handle+0xc7/0xc7
Mar  2 20:16:47 goscie kernel: [  600.629635]  [<ffffffff8118006a>] ? xfs_compat_ioc_bulkstat+0x162/0x1cf
Mar  2 20:16:47 goscie kernel: [  600.630080]  [<ffffffff8103680c>] ? task_rq_lock+0x47/0x7a
Mar  2 20:16:47 goscie kernel: [  600.630463]  [<ffffffff81180462>] ? xfs_file_compat_ioctl+0x38b/0x465
Mar  2 20:16:47 goscie kernel: [  600.630829]  [<ffffffff81030962>] ? __wake_up_common+0x41/0x78
Mar  2 20:16:47 goscie kernel: [  600.631177]  [<ffffffff810366c2>] ? __wake_up+0x35/0x46
Mar  2 20:16:47 goscie kernel: [  600.631537]  [<ffffffff812915c0>] ? pty_write+0x48/0x53
Mar  2 20:16:47 goscie kernel: [  600.631895]  [<ffffffff8103e770>] ? perf_event_task_sched_out+0x48/0x54
Mar  2 20:16:47 goscie kernel: [  600.632285]  [<ffffffff8128cc83>] ? n_tty_write+0x322/0x360
Mar  2 20:16:47 goscie kernel: [  600.632635]  [<ffffffff8112dec3>] ? compat_sys_ioctl+0x1bb/0xfb4
Mar  2 20:16:47 goscie kernel: [  600.632993]  [<ffffffff810086b8>] ? __switch_to+0x1c9/0x288
Mar  2 20:16:47 goscie kernel: [  600.633340]  [<ffffffff810379f3>] ? pick_next_task_fair+0xf9/0x10a
Mar  2 20:16:47 goscie kernel: [  600.633707]  [<ffffffff8102602a>] ? pvclock_clocksource_read+0x46/0xb4
Mar  2 20:16:47 goscie kernel: [  600.634079]  [<ffffffff810622ff>] ? timekeeping_get_ns+0xe/0x2e
Mar  2 20:16:47 goscie kernel: [  600.634430]  [<ffffffff81062975>] ? getnstimeofday+0x4d/0x80
Mar  2 20:16:47 goscie kernel: [  600.634778]  [<ffffffff810629f5>] ? do_gettimeofday+0x10/0x2e
Mar  2 20:16:47 goscie kernel: [  600.635136]  [<ffffffff81399180>] ? cstar_dispatch+0x7/0x2e

- git revision 243b422af9ea9af4ead07a8ad54c90d4f9b6081a -- good
It's one patch before "xfs: preallocation transactions do not need to be synchronous"

- git revision c5850150d0b9ae16840c5d9846f9d5a759996a15:
    Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
    
    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
      xfs: stop using the page cache to back the buffer cache
      xfs: register the inode cache shrinker before quotachecks
      xfs: xfs_trans_read_buf() should return an error on failure
      xfs: introduce inode cluster buffer trylocks for xfs_iflush
      vmap: flush vmap aliases when mapping fails
      xfs: preallocation transactions do not need to be synchronous
    
xfsdump hung (only xfsdump?) without any additional log entries
(CONFIG_XFS_DEBUG is set):

Mar  3 12:05:57 goscie kernel: [  960.816216] INFO: task xfsdump:1519 blocked for more than 120 seconds.
Mar  3 12:05:57 goscie kernel: [  960.818113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  3 12:05:57 goscie kernel: [  960.820223] xfsdump         D ffff88001e196950     0  1519   1490 0x00020000
Mar  3 12:05:57 goscie kernel: [  960.821572]  ffff88001e196950 0000000000000086 ffff880000000000 ffffffff8160b020
Mar  3 12:05:57 goscie kernel: [  960.823618]  0000000000013d00 ffff880009903fd8 ffff880009903fd8 0000000000013d00
Mar  3 12:05:57 goscie kernel: [  960.825707]  ffff88001e196950 ffff880009902010 0000000100000000 0000000000000008
Mar  3 12:05:57 goscie kernel: [  960.827746] Call Trace:
Mar  3 12:05:57 goscie kernel: [  960.828672]  [<ffffffff81391493>] ? schedule_timeout+0x2d/0xd7
Mar  3 12:05:57 goscie kernel: [  960.829937]  [<ffffffff811ce4d2>] ? xfs_trans_read_buf+0x52/0x573
Mar  3 12:05:57 goscie kernel: [  960.831118]  [<ffffffff8119a8a5>] ? xfs_btree_check_sblock+0x64/0x15e
Mar  3 12:05:57 goscie kernel: [  960.831459]  [<ffffffff81391dd2>] ? __down_common+0x92/0xe3
Mar  3 12:05:57 goscie kernel: [  960.831780]  [<ffffffff811d7fab>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  3 12:05:57 goscie kernel: [  960.832125]  [<ffffffff8105f2fc>] ? down+0x27/0x37
Mar  3 12:05:57 goscie kernel: [  960.832424]  [<ffffffff811d6776>] ? xfs_buf_lock+0x7c/0xc8
Mar  3 12:05:57 goscie kernel: [  960.832739]  [<ffffffff811d7fab>] ? _xfs_buf_find+0x1ed/0x2c8
Mar  3 12:05:57 goscie kernel: [  960.833068]  [<ffffffff811d80e1>] ? xfs_buf_get+0x5b/0x14f
Mar  3 12:05:57 goscie kernel: [  960.833412]  [<ffffffff811d81eb>] ? xfs_buf_read+0x16/0xbf
Mar  3 12:05:57 goscie kernel: [  960.833737]  [<ffffffff811ce4d2>] ? xfs_trans_read_buf+0x52/0x573
Mar  3 12:05:57 goscie kernel: [  960.834071]  [<ffffffff811b6b2a>] ? xfs_imap_to_bp+0x47/0x206
Mar  3 12:05:57 goscie kernel: [  960.834396]  [<ffffffff811b9a6d>] ? xfs_iread+0x59/0x1cc
Mar  3 12:05:57 goscie kernel: [  960.834706]  [<ffffffff811b3c63>] ? xfs_iget+0x3e4/0x611
Mar  3 12:05:57 goscie kernel: [  960.835022]  [<ffffffff811d3b15>] ? kmem_alloc+0x60/0x9e
Mar  3 12:05:57 goscie kernel: [  960.835338]  [<ffffffff811bca6b>] ? xfs_bulkstat_one_int+0x9a/0x318
Mar  3 12:05:57 goscie kernel: [  960.835677]  [<ffffffff8117f8d4>] ? xfs_bstime_store_compat+0x4d/0x4d
Mar  3 12:05:57 goscie kernel: [  960.836029]  [<ffffffff81180078>] ? xfs_bulkstat_one_compat+0x17/0x1c
Mar  3 12:05:57 goscie kernel: [  960.836377]  [<ffffffff811bc820>] ? xfs_bulkstat+0x58c/0x73d
Mar  3 12:05:57 goscie kernel: [  960.836697]  [<ffffffff81180061>] ? xfs_compat_fssetdm_by_handle+0xc7/0xc7
Mar  3 12:05:57 goscie kernel: [  960.837049]  [<ffffffff8118058a>] ? xfs_compat_ioc_bulkstat+0x162/0x1cf
Mar  3 12:05:57 goscie kernel: [  960.837439]  [<ffffffff810b1d7a>] ? generic_file_aio_read+0x58d/0x5a2
Mar  3 12:05:57 goscie kernel: [  960.837790]  [<ffffffff81180982>] ? xfs_file_compat_ioctl+0x38b/0x465
Mar  3 12:05:57 goscie kernel: [  960.838132]  [<ffffffff811b3219>] ? xfs_iunlock+0xea/0x147
Mar  3 12:05:57 goscie kernel: [  960.838467]  [<ffffffff811d1221>] ? xfs_free_eofblocks+0xaa/0x222
Mar  3 12:05:57 goscie kernel: [  960.838810]  [<ffffffff8112e413>] ? compat_sys_ioctl+0x1bb/0xfb4
Mar  3 12:05:57 goscie kernel: [  960.846656]  [<ffffffff8102602a>] ? pvclock_clocksource_read+0x46/0xb4
Mar  3 12:05:57 goscie kernel: [  960.847015]  [<ffffffff8106222f>] ? timekeeping_get_ns+0xe/0x2e
Mar  3 12:05:57 goscie kernel: [  960.847343]  [<ffffffff810628a5>] ? getnstimeofday+0x4d/0x80
Mar  3 12:05:57 goscie kernel: [  960.847664]  [<ffffffff81062925>] ? do_gettimeofday+0x10/0x2e
Mar  3 12:05:57 goscie kernel: [  960.847994]  [<ffffffff81398fc0>] ? cstar_dispatch+0x7/0x2e

I don't know if it's the same bug - there is no assertion failure nor an
I/O error log entry...

Regards,
Rafał Kupka

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-03 11:37   ` Rafał Kupka
@ 2012-03-03 11:44     ` Christoph Hellwig
  2012-03-05 21:40       ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2012-03-03 11:44 UTC (permalink / raw)
  To: Rafał Kupka; +Cc: Christoph Hellwig, xfs

On Sat, Mar 03, 2012 at 12:37:07PM +0100, Rafał Kupka wrote:
> On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote:
> Hello,
> 
> > From the trace this looks like we got incorrect results from xfs_imap,
> > which got some fairly significant changes between the two kernels
> > you tested.  Can you check if 2.6.36 shows this issue?  It's the first
> > kernel that had these big changes in inode lookup.
> 
> I've tested more kernels; here are the results (Linus' git tree, no
> additional patches):
> - v2.6.36 -- good
> - v2.6.38 -- good
> - v2.6.39 -- bad, I/O error log entry and locked processes:

> - git revision 243b422af9ea9af4ead07a8ad54c90d4f9b6081a -- good
> It's one patch before "xfs: preallocation transactions do not need to be synchronous"
> 
> - git revision c5850150d0b9ae16840c5d9846f9d5a759996a15:
>     Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
>     
>     * 'for-linus' of git://oss.sgi.com/xfs/xfs:
>       xfs: stop using the page cache to back the buffer cache
>       xfs: register the inode cache shrinker before quotachecks
>       xfs: xfs_trans_read_buf() should return an error on failure

This is the only patch in that area.  Give me a bit more time to
figure out how it could be related to your issue.

> xfsdump hung (only xfsdump?) without any additional log entries
> (CONFIG_XFS_DEBUG is set):

> I don't know if it's the same bug - there is no assertion failure nor an
> I/O error log entry...

It's the same code path, at least.

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-03 11:44     ` Christoph Hellwig
@ 2012-03-05 21:40       ` Rafał Kupka
  2012-03-06 20:30         ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-03-05 21:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, 2012-03-03 at 06:44 -0500, Christoph Hellwig wrote:
> On Sat, Mar 03, 2012 at 12:37:07PM +0100, Rafał Kupka wrote:
> > On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote:
Hi,

I've completed the bisection process. It looks like the kernel up to

704b2907 "xfs: register the inode cache shrinker before quotachecks"

is good and the first bad commit is:

0e6e847f "xfs: stop using the page cache to back the buffer cache"

I double tested it - 704b2907 can do 60+ loops of xfsdump without any
trouble and 0e6e847f breaks in <10 iterations.
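The bisection workflow used here can be illustrated on a throwaway repository. The real run was against Linus' tree between v2.6.38 and v2.6.39 with the xfsdump loop as the test; the toy "file value >= 5 is bad" predicate below is a stand-in for booting each kernel and running the reproducer.

```shell
#!/bin/sh
# Toy git-bisect demonstration: 8 commits, where commit 5 "introduces
# the bug". The predicate passed to `git bisect run` exits 0 for good
# commits and non-zero for bad ones.
set -e
R=$(mktemp -d)
cd "$R"
git init -q
git config user.email tester@example.com
git config user.name tester
for i in 1 2 3 4 5 6 7 8; do
    echo "$i" > file
    git add file
    git commit -qm "commit $i"
done
FIRST=$(git log --reverse --format=%H | head -n1)
git bisect start HEAD "$FIRST"    # HEAD is bad, first commit is good
git bisect run sh -c 'test "$(cat file)" -lt 5' >/dev/null
RESULT=$(git bisect log | tail -n1)
echo "$RESULT"    # the last log line names the first bad commit
```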

> > - git revision c5850150d0b9ae16840c5d9846f9d5a759996a15:
> >     Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
> >     
> >     * 'for-linus' of git://oss.sgi.com/xfs/xfs:
> >       xfs: stop using the page cache to back the buffer cache
> >       xfs: register the inode cache shrinker before quotachecks
> >       xfs: xfs_trans_read_buf() should return an error on failure
> 
> This is the only patch in that area.  Give me a bit more time to
> figure out how it could related to your issues.

Can I do something more to help?

Regards,
Rafał Kupka

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-05 21:40       ` Rafał Kupka
@ 2012-03-06 20:30         ` Dave Chinner
  2012-03-08 19:10           ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-03-06 20:30 UTC (permalink / raw)
  To: Rafał Kupka; +Cc: Christoph Hellwig, xfs

On Mon, Mar 05, 2012 at 10:40:12PM +0100, Rafał Kupka wrote:
> On Sat, 2012-03-03 at 06:44 -0500, Christoph Hellwig wrote:
> > On Sat, Mar 03, 2012 at 12:37:07PM +0100, Rafał Kupka wrote:
> > > On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote:
> Hi,
> 
> I've completed the bisection process. It looks like the kernel up to
> 
> 704b2907 "xfs: register the inode cache shrinker before quotachecks"
> 
> is good and the first bad commit is:
> 
> 0e6e847f "xfs: stop using the page cache to back the buffer cache"

Which changed the internals of the buffer cache, but not the external
interfaces.

> I double tested it - 704b2907 can do 60+ loops of xfsdump without any
> trouble and 0e6e847f breaks in <10 iterations.

What is your test case?

Also, it might be worthwhile to capture an event trace (e.g. all the
xfs_buf events) to see what occurs just before the error is
triggered. That might tell us exactly what sequence has occurred
leading up to the error.
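Such a trace can be captured with ftrace along these lines (a sketch only; it assumes root access, a tracepoint-enabled kernel, and the older /sys/kernel/debug/tracing mount point; newer kernels also expose /sys/kernel/tracing):

```shell
#!/bin/sh
# Enable all xfs:* tracepoints (including the xfs_buf_* events) through
# ftrace, then save the ring buffer after reproducing the bug. Paths
# and the output filename are assumptions.
T=/sys/kernel/debug/tracing
if [ -w "$T/events/xfs/enable" ]; then
    echo 1 > "$T/events/xfs/enable"
    STATUS=enabled
    echo "xfs tracepoints enabled; reproduce the bug, then save the log:"
    echo "  cat $T/trace > /tmp/xfs-trace.txt"
else
    STATUS=unavailable
    echo "tracefs not writable at $T (needs root and a mounted debugfs)"
fi
```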

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-06 20:30         ` Dave Chinner
@ 2012-03-08 19:10           ` Rafał Kupka
  2012-03-26 16:05             ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-03-08 19:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, 2012-03-07 at 07:30 +1100, Dave Chinner wrote:
> On Mon, Mar 05, 2012 at 10:40:12PM +0100, Rafał Kupka wrote:
Hi,

> > is good and first bad commit is:
> > 
> > 0e6e847f "xfs: stop using the page cache to back the buffer cache"
> 
> Which changed internals of the buffer cache, but not the external
> interfaces.
> 
> > I double tested it - 704b2907 can do 60+ loops of xfsdump without any
> > trouble and 0e6e847f breaks in <10 iterations.
> 
> What is your test case?

I run xfsdump (to /dev/null) in a loop and execute a postmark benchmark
at the same time. It's a minimal Debian/Squeeze installation in a KVM
guest, but it happened on another system too: a Xen server with 512 MB
of memory in Dom0.

xfsdump -f /dev/null -l0 -p1 -J -L "test" -M "test" /

and postmark with this config file (also in a loop):
set size 500 1000000
set number 1000
set transactions 1000
run
quit
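Wrapped in a script, the config above can be written out and fed to postmark repeatedly (a sketch; the "postmark" binary name, the config-file-as-argument usage, and the /tmp path are assumptions):

```shell
#!/bin/sh
# Write the postmark config quoted above and run it in a loop, mirroring
# the test case. Bounded to 3 runs here; the real test loops forever.
CFG="${TMPDIR:-/tmp}/postmark.cfg"
cat > "$CFG" <<'EOF'
set size 500 1000000
set number 1000
set transactions 1000
run
quit
EOF
echo "wrote config to $CFG"
if command -v postmark >/dev/null 2>&1; then
    for n in 1 2 3; do postmark "$CFG"; done
else
    echo "postmark not installed; config prepared only"
fi
```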

Typically after a few iterations xfsdump gets stuck with the log messages
mentioned in previous e-mails.

An important thing I noticed recently: I'm able to reproduce this bug only
on small-memory KVM guests (384 MB). The very same system started with
1 GB of memory runs fine for 100+ xfsdump executions.

> Also, it might be worthwhile to capture an event trace (e.g. all the
> xfs_buf events) to see what occurs just before the error is
> triggered. That might tell us exactly what sequence has occurred
> leading up to the error.

I will try to provide this information in a few days.

Best Regards,
Rafal Kupka

* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-08 19:10           ` Rafał Kupka
@ 2012-03-26 16:05             ` Rafał Kupka
  2012-03-27  3:28               ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-03-26 16:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, 2012-03-08 at 20:10 +0100, Rafał Kupka wrote: 
> On Wed, 2012-03-07 at 07:30 +1100, Dave Chinner wrote:

Hi,

Finally I've captured this event trace. I hope it will help.

> > Also, it might be worthwhile to capture an event trace (e.g. all the
> > xfs_buf events) to see what occurs just before the error is
> > triggered. That might tell us exactly what sequence has occurred
> > leading up to the error.
> 
> I will try to provide this information in a few days.

Kernel version - Linus tree up to:
0e6e847f "xfs: stop using the page cache to back the buffer cache"

It's uniprocessor virtual machine (KVM) with 384 MB RAM.

Full event trace is available at https://gist.github.com/2206109
(sorry for its *.perl extension, my mistake).

Kernel log entry:
[  495.197571] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
[  495.198283] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
[  501.540477] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
[  501.543047] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.

Is it possible that this "I/O error" is caused by memory pressure?

Related trace line:
xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply

Events related to "bno 0x13d80":
xfsdump-2498  [000] 24195.782266: xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
xfsdump-2498  [000] 24195.782266: xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
xfsdump-2498  [000] 24195.782267: xfs_buf_iorequest: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller _xfs_buf_read
xfsdump-2498  [000] 24195.782267: xfs_buf_hold: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
xfsdump-2498  [000] 24195.782270: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
xfsdump-2498  [000] 24195.782985: xfs_buf_unlock: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_buf_relse
xfsdump-2498  [000] 24195.782986: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_trans_read_buf

Does this data shed some light on this issue?

Regards,
Rafał Kupka

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-26 16:05             ` Rafał Kupka
@ 2012-03-27  3:28               ` Dave Chinner
  2012-03-27 10:17                 ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-03-27  3:28 UTC (permalink / raw)
  To: Rafał Kupka; +Cc: Christoph Hellwig, xfs

On Mon, Mar 26, 2012 at 06:05:25PM +0200, Rafał Kupka wrote:
> On Thu, 2012-03-08 at 20:10 +0100, Rafał Kupka wrote: 
> > On Wed, 2012-03-07 at 07:30 +1100, Dave Chinner wrote:
> 
> Hi,
> 
> Finally I've captured this event trace. Hope it will help.
> 
> > > Also, it might be worthwhile to capture an event trace (e.g. all the
> > > xfs_buf events) to see what occurs just before the error is
> > > triggered. That might tell us exactly what sequence has occurred
> > > leading up to the error.
> > 
> > I will try to provide this information in a few days.
> 
> Kernel version - Linus tree up to:
> 0e6e847f "xfs: stop using the page cache to back the buffer cache"
> 
> It's a uniprocessor virtual machine (KVM) with 384 MB RAM.

Ok, so limited memory and no parallelism. Can you send me your
.config so I can have a look at the various other config options?

> Full event trace is available at https://gist.github.com/2206109
> (sorry for its *.perl extension, my mistake).
> 
> Kernel log entry:
> [  495.197571] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
> [  495.198283] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> [  501.540477] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
> [  501.543047] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> 
> Is it possible that this "I/O error" is caused by memory pressure?
> 
> Related trace line:
> xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
> 
> Events related to "bno 0x13d80":
> xfsdump-2498  [000] 24195.782266: xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
> xfsdump-2498  [000] 24195.782266: xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
> xfsdump-2498  [000] 24195.782267: xfs_buf_iorequest: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller _xfs_buf_read
> xfsdump-2498  [000] 24195.782267: xfs_buf_hold: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
> xfsdump-2498  [000] 24195.782270: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> xfsdump-2498  [000] 24195.782985: xfs_buf_unlock: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_buf_relse
> xfsdump-2498  [000] 24195.782986: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_trans_read_buf
> 
> Does this data shed some light on this issue?

Yes: it was errored out because the bio that was built from the
buffer was zero length. That means the buffer was not set up
properly in the first place. Hmmm, this is unusual - the two lines
above your traces:

xfs_buf_cond_lock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|TRYLOCK|PAGES caller _xfs_buf_find
xfs_buf_find: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf

The lookup found an uninitialised buffer in the cache. That's
exceedingly curious - the match is done on bp->b_file_offset, and
only after the buffer has been inserted is the bno filled out.

Oh, wow, I think I see the bug. The key is the initial state of that
uninitialised buffer, READ|ASYNC|TRYLOCK|PAGES, versus its state in
the xfs_buf_get() trace, READ|LOCK|DONT_BLOCK. Those two
sets of flags do not match up - the initial set indicates the buffer
has been inserted into the cache for readahead, the second set
indicates a blocking read.

You've only got a small amount of memory, so I suspect that a memory
allocation has failed in a specific place to cause this. I'm not
exactly sure how the second thread trips over it and initialises it
somewhat (there's 3 initialisation steps to a buffer) but the
initial failure is probably this:

	async readahead started
	_xfs_buf_find fails
	allocate new buffer A
	partially initialise A
	_xfs_buf_find
	   fail to find match
	   insert buffer A into rbtree
	   returns buffer A
	A == A
	   fails allocate pages to buffer
	unlock buffer
	release buffer

	<uninitialised buffer now sitting in cache. THIS IS WRONG!>


That would lead to a second lookup finding it and using it, but I'm
still not quite sure how the buffer bno gets initialised. I can
construct valid scenarios that will cause it to happen, but none
match the event trace you've posted.

And just looking at the way we clean up this error, this race
condition could lead to a use-after-free if we immediately free
the buffer when we fail to allocate its pages. Messy. I'll have to
think on this a bit - maybe a "broken" state is necessary.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-27  3:28               ` Dave Chinner
@ 2012-03-27 10:17                 ` Dave Chinner
  2012-03-30 16:02                   ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-03-27 10:17 UTC (permalink / raw)
  To: Rafał Kupka; +Cc: Christoph Hellwig, xfs

On Tue, Mar 27, 2012 at 02:28:43PM +1100, Dave Chinner wrote:
> On Mon, Mar 26, 2012 at 06:05:25PM +0200, Rafał Kupka wrote:
> > On Thu, 2012-03-08 at 20:10 +0100, Rafał Kupka wrote: 
> > > On Wed, 2012-03-07 at 07:30 +1100, Dave Chinner wrote:
> > 
> > Hi,
> > 
> > Finally I've captured this event trace. Hope it will help.
> > 
> > > > Also, it might be worthwhile to capture an event trace (e.g. all the
> > > > xfs_buf events) to see what occurs just before the error is
> > > > triggered. That might tell us exactly what sequence has occurred
> > > > leading up to the error.
> > > 
> > > I will try to provide this information in a few days.
> > 
> > Kernel version - Linus tree up to:
> > 0e6e847f "xfs: stop using the page cache to back the buffer cache"
> > 
> > It's a uniprocessor virtual machine (KVM) with 384 MB RAM.
> 
> Ok, so limited memory and no parallelism. Can you send me your
> .config so I can have a look at the various other config options?
> 
> > Full event trace is available at https://gist.github.com/2206109
> > (sorry for its *.perl extension, my mistake).
> > 
> > Kernel log entry:
> > [  495.197571] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
> > [  495.198283] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> > [  501.540477] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0x13d80       ("xfs_trans_read_buf") error 5 buf count 8192
> > [  501.543047] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> > 
> > Is it possible that this "I/O error" is caused by memory pressure?
> > 
> > Related trace line:
> > xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
> > 
> > Events related to "bno 0x13d80":
> > xfsdump-2498  [000] 24195.782266: xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
> > xfsdump-2498  [000] 24195.782266: xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
> > xfsdump-2498  [000] 24195.782267: xfs_buf_iorequest: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller _xfs_buf_read
> > xfsdump-2498  [000] 24195.782267: xfs_buf_hold: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> > xfsdump-2498  [000] 24195.782269: xfs_buf_ioerror: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 error 5 flags READ|TRYLOCK|PAGES caller _xfs_buf_ioapply
> > xfsdump-2498  [000] 24195.782270: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 3 pincount 0 lock 0 flags READ|TRYLOCK|PAGES caller xfs_buf_iorequest
> > xfsdump-2498  [000] 24195.782985: xfs_buf_unlock: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_buf_relse
> > xfsdump-2498  [000] 24195.782986: xfs_buf_rele: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 1 flags READ|TRYLOCK|PAGES caller xfs_trans_read_buf
> > 
> > Does this data shed some light on this issue?
> 
> Yes: it was errored out because the bio that was built from the
> buffer was zero length. That means the buffer was not set up
> properly in the first place. Hmmm, this is unusual - the two lines
> above your traces:
> 
> xfs_buf_cond_lock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|TRYLOCK|PAGES caller _xfs_buf_find
> xfs_buf_find: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
> xfs_buf_get: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_read
> xfs_buf_read: dev 254:2 bno 0x13d80 len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_trans_read_buf
> 
> The lookup found an uninitialised buffer in the cache. That's
> exceedingly curious - the match is done on bp->b_file_offset, and
> only after the buffer has been inserted is the bno filled out.
> 
> Oh, wow, I think I see the bug. The key is the initial state of that
> uninitialised buffer, READ|ASYNC|TRYLOCK|PAGES, versus its state in
> the xfs_buf_get() trace, READ|LOCK|DONT_BLOCK. Those two
> sets of flags do not match up - the initial set indicates the buffer
> has been inserted into the cache for readahead, the second set
> indicates a blocking read.
> 
> You've only got a small amount of memory, so I suspect that a memory
> allocation has failed in a specific place to cause this. I'm not
> exactly sure how the second thread trips over it and initialises it
> somewhat (there's 3 initialisation steps to a buffer) but the
> initial failure is probably this:
> 
> 	async readahead started
> 	_xfs_buf_find fails
> 	allocate new buffer A
> 	partially initialise A
> 	_xfs_buf_find
> 	   fail to find match
> 	   insert buffer A into rbtree
> 	   returns buffer A
> 	A == A
> 	   fails allocate pages to buffer
> 	unlock buffer
> 	release buffer
> 
> 	<uninitialised buffer now sitting in cache. THIS IS WRONG!>
> 
> 
> That would lead to a second lookup finding it and using it, but I'm
> still not quite sure how the buffer bno gets initialised. I can
> construct valid scenarios that will cause it to happen, but none
> match the event trace you've posted.
> 
> And just looking at the way we clean up this error, this race
> condition could lead to a use-after-free if we immediately free
> the buffer when we fail to allocate its pages. Messy. I'll have to
> think on this a bit - maybe a "broken" state is necessary.

So here's a patch that implements a "broken" state. It's not pretty,
but I think it will fix the race you appear to be hitting.

Can you run your test with this patch while running a trace for the
new xfs_buf_*broken events to verify you are hitting that code and
that it doesn't trigger EIO problems anymore?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: fix buffer lookup race on allocation failure

From: Dave Chinner <dchinner@redhat.com>

When memory allocation fails to add the page array or the pages to
a buffer during xfs_buf_get(), the buffer is left in the cache in a
partially initialised state. There is enough state left for the next
lookup on that buffer to find the buffer, and for the buffer to then
be used without finishing the initialisation.  As a result, when an
attempt to do IO on the buffer occurs, it fails with EIO because
there are no pages attached to the buffer.

We cannot remove the buffer from the cache immediately and free it,
because there may already be a racing lookup that is blocked on the
buffer lock. Hence, the moment we unlock the buffer to then free it,
the other user is woken and we have a use-after-free situation.

Hence we have to mark the buffer as "broken" and check that after we
have gained the buffer lock on a cache hit lookup. This enables
racing lookups to avoid the broken buffer and drop their references,
allowing the buffer to be freed.

This, however, doesn't solve the problem completely - there may be a
delay in the buffer getting freed (e.g. due to pre-emption). So when we
try the lookup a second time with a new buffer to insert into the tree
and find the broken buffer again, we drop the buffer lock, sleep for
a short while, and try the lookup again. When the broken buffer is
finally removed from the cache we will make forward progress.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c   |   33 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_buf.h   |    2 ++
 fs/xfs/xfs_trace.h |    2 ++
 3 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 6819b51..3588460 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -441,6 +441,7 @@ _xfs_buf_find(
 	ASSERT(!(range_base & (xfs_off_t)btp->bt_smask));
 
 	/* get tree root */
+lookup_again:
 	pag = xfs_perag_get(btp->bt_mount,
 				xfs_daddr_to_agno(btp->bt_mount, ioff));
 
@@ -505,6 +506,24 @@ found:
 	}
 
 	/*
+	 * If the buffer initialisation was not completed correctly after
+	 * insertion into the cache we can't use this buffer and we are pinning
+	 * it preventing it from being removed from the cache. Hence just avoid
+	 * the buffer if it is broken, allowing it to be reclaimed and removed
+	 * from the cache. If we are here with a new buffer to insert into the
+	 * cache, we want to try the lookup again so that the expected "insert
+	 * on lookup failure" semantics are preserved for this case.
+	 */
+	if (bp->b_flags & XBF_BROKEN) {
+		trace_xfs_buf_find_broken(bp, flags, _RET_IP_);
+		xfs_buf_relse(bp);
+		if (!new_bp)
+			return NULL;
+		delay(1);
+		goto lookup_again;
+	}
+
+	/*
 	 * if the buffer is stale, clear all the external state associated with
 	 * it. We need to keep flags such as how we allocated the buffer memory
 	 * intact here.
@@ -552,8 +571,20 @@ xfs_buf_get(
 
 	if (bp == new_bp) {
 		error = xfs_buf_allocate_memory(bp, flags);
-		if (error)
+		if (error) {
+			/*
+			 * The buffer has been inserted into the cache now, so
+			 * will be visible to other callers. Once we unlock the
+			 * buffer, someone else can grab it out of the tree.
+			 * It's a broken buffer, so we have to ensure that it is
+			 * noticed - just freeing the buffer here can result in
+			 * use-after-free if someone is already waiting on the
+			 * buffer lock.
+			 */
+			bp->b_flags |= XBF_BROKEN;
+			trace_xfs_buf_get_broken(bp, flags, _RET_IP_);
 			goto no_buffer;
+		}
 	} else
 		kmem_zone_free(xfs_buf_zone, new_bp);
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index df7ffb0..b4545b9 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -52,6 +52,7 @@ typedef enum {
 #define XBF_DONE	(1 << 5) /* all pages in the buffer uptodate */
 #define XBF_DELWRI	(1 << 6) /* buffer has dirty pages */
 #define XBF_STALE	(1 << 7) /* buffer has been staled, do not find it */
+#define XBF_BROKEN	(1 << 8) /* buffer is broken, do not find it */
 
 /* I/O hints for the BIO layer */
 #define XBF_SYNCIO	(1 << 10)/* treat this buffer as synchronous I/O */
@@ -79,6 +80,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_DELWRI,		"DELWRI" }, \
 	{ XBF_STALE,		"STALE" }, \
+	{ XBF_BROKEN,		"BROKEN" }, \
 	{ XBF_SYNCIO,		"SYNCIO" }, \
 	{ XBF_FUA,		"FUA" }, \
 	{ XBF_FLUSH,		"FLUSH" }, \
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 270775f..e6f7d75 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -386,7 +386,9 @@ DEFINE_EVENT(xfs_buf_flags_class, name, \
 	TP_PROTO(struct xfs_buf *bp, unsigned flags, unsigned long caller_ip), \
 	TP_ARGS(bp, flags, caller_ip))
 DEFINE_BUF_FLAGS_EVENT(xfs_buf_find);
+DEFINE_BUF_FLAGS_EVENT(xfs_buf_find_broken);
 DEFINE_BUF_FLAGS_EVENT(xfs_buf_get);
+DEFINE_BUF_FLAGS_EVENT(xfs_buf_get_broken);
 DEFINE_BUF_FLAGS_EVENT(xfs_buf_read);
 
 TRACE_EVENT(xfs_buf_ioerror,


* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-27 10:17                 ` Dave Chinner
@ 2012-03-30 16:02                   ` Rafał Kupka
  2012-04-04  9:19                     ` Rafał Kupka
  0 siblings, 1 reply; 12+ messages in thread
From: Rafał Kupka @ 2012-03-30 16:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, 2012-03-27 at 21:17 +1100, Dave Chinner wrote:
Hello,

My kernel config - http://pastebin.com/QvqFVWUw

Mount options (/proc/mounts):
/dev/vda2 / xfs rw,noatime,attr2,noquota 0 0

> Can you run your test with this patch while running a trace for the
> new xfs_buf_*broken events to verify you are hitting that code and
> that it doesn't trigger EIO problems anymore?

I've applied this patch to the linus/v3.3 kernel. There are no more I/O
errors, but xfsdump still hangs.

Trace recorded just after the hang:
# tracer: nop
#
# entries-in-buffer/entries-written: 78/78   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         xfsdump-1534  [000] .... 28544.651904: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651908: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651910: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651911: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651912: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651914: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651915: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651916: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651918: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651919: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651921: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651923: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651924: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651926: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651927: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651928: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651930: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651931: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651932: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651934: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651935: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651937: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651939: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651940: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651941: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651942: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651943: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651945: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651946: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651947: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651949: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651950: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651951: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651953: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651955: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651956: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651958: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651960: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651962: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651963: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651964: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651965: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.651966: xfs_buf_get_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 1 pincount 0 lock 0 flags READ|READ_AHEAD|ASYNC|TRYLOCK|DONT_BLOCK caller xfs_buf_read
         xfsdump-1534  [000] .... 28544.661785: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.661788: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.663330: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.667218: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.671297: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.675416: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.682188: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.683811: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.687418: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.693069: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.695414: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.699315: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.703417: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.707416: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.711416: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.715338: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.719413: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.723412: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.727410: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 28544.731408: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get

Later, these lines keep repeating; there are no more xfs_buf_get_broken: entries in the trace file:

    xfsaild/vda2-156   [000] .... 29791.119322: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 29791.119332: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 29791.123296: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
    xfsaild/vda2-156   [000] .... 29791.123306: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|DONT_BLOCK caller xfs_buf_get
    xfsaild/vda2-156   [000] .... 29791.127341: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|TRYLOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1534  [000] .... 29791.127351: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get

Regards,
Rafal Kupka



* Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598
  2012-03-30 16:02                   ` Rafał Kupka
@ 2012-04-04  9:19                     ` Rafał Kupka
  0 siblings, 0 replies; 12+ messages in thread
From: Rafał Kupka @ 2012-04-04  9:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, 2012-03-30 at 18:02 +0200, Rafał Kupka wrote:
> On Tue, 2012-03-27 at 21:17 +1100, Dave Chinner wrote:
Hi,

I've collected more data.

> I've applied this patch to linus/v3.3 kernel. There is no more I/O
> errors but xfsdump still hangs.
[other details in my previous e-mail]

This new code is executed, but the broken struct xfs_buf is never reclaimed
from the cache:

         xfsdump-1455  [000] .... 40888.509350: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509355: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509356: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509357: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.509358: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.509359: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513340: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513345: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513346: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513347: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.513348: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.513349: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517336: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517341: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517342: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517343: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.517344: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.517345: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find

From lurking on the XFS mailing list, I see that the better fix is to never
put a half-initialized buf into the rbtree:
http://article.gmane.org/gmane.comp.file-systems.xfs.general/44356

I've tried another simple patch. If these broken XBF_READ_AHEAD buffers
can be picked up by non-read-ahead code (for as long as they sit in the
rbtree), maybe it's not reasonable to treat them specially in
xfs_buf_allocate_memory().

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e24b320..d18c070 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -341,11 +341,13 @@ use_alloc_page:
 retry:
                page = alloc_page(gfp_mask);
                if (unlikely(page == NULL)) {
+                       /*
                        if (flags & XBF_READ_AHEAD) {
                                bp->b_page_count = i;
                                error = ENOMEM;
                                goto out_free_pages;
                        }
+                        */
 
                        /*
                         * This could deadlock.

The patch is probably whitespace-mangled by Evolution, but you can see what I've done.

XFS patched this way survived my test for a long(er) time without hangs.

The retries counter in /proc/fs/xfs/stat rises slowly, as expected:
"buf 13869637 753868 13946555 17174 8848657 92215 555 146160 90802"

There is one "XFS: possible memory allocation deadlock" message, as you
can see in the kernel logs below.

Yet there is another (really?) assertion failure:
Apr  4 02:58:22 goscie kernel: [    5.617480] sshd (892): /proc/892/oom_adj is deprecated, please use /proc/892/oom_score_adj instead.
Apr  4 04:35:14 goscie kernel: [ 5817.096240] XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x1200)
Apr  4 04:41:24 goscie kernel: [ 6187.984118] XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) || (map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c, line: 5612
Apr  4 04:41:24 goscie kernel: [ 6187.986603] ------------[ cut here ]------------
Apr  4 04:41:24 goscie kernel: [ 6187.987608] kernel BUG at fs/xfs/xfs_message.c:101!
Apr  4 04:41:24 goscie kernel: [ 6187.988050] invalid opcode: 0000 [#1] SMP
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CPU 0
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Modules linked in: loop tpm_tis psmouse i2c_piix4 tpm i2c_core evdev pcspkr tpm_bios serio_raw processor thermal_sys button xfs sg ata_generic ata_piix libata scsi_mod virtio_net [last unloaded: scsi_wait_scan]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Pid: 3019, comm: xfsdump Not tainted 3.3.0xfsdump1-00001-g26e1ccc-dirty #4 Bochs Bochs
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RIP: 0010:[<ffffffffa009043e>]  [<ffffffffa009043e>] assfail+0x1d/0x23 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RSP: 0000:ffff880011c71af8  EFLAGS: 00010296
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RAX: 00000000000000a0 RBX: ffff88000317b6a0 RCX: 0000000007b707b7
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RBP: ffff880011c71bb8 R08: 0000000000000000 R09: ffffffff8167d50a
Apr  4 04:41:24 goscie kernel: [ 6187.988050] R10: 0000000000000000 R11: ffff88001780d000 R12: 0000000000000000
Apr  4 04:41:24 goscie kernel: [ 6187.988050] R13: ffff8800153080f0 R14: ffff880003130800 R15: ffff8800031cd840
Apr  4 04:41:24 goscie kernel: [ 6187.988050] FS:  0000000000000000(0000) GS:ffff880017c00000(0063) knlGS:00000000f765c700
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CR2: 00000000f586541c CR3: 0000000012155000 CR4: 00000000000006f0
Apr  4 04:41:24 goscie kernel: [ 6187.988050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  4 04:41:24 goscie kernel: [ 6187.988050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Process xfsdump (pid: 3019, threadinfo ffff880011c70000, task ffff8800120dd750)
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Stack:
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  ffff880011c71bb8 ffffffffa00a6fdc ffff8800031cd988 00000005115338c0
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  0000001000000005 0000000000000008 ffff88000317b600 000000000000c000
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  ffff880015308000 0000000800000000 ffffffffa008b2d2 000000008115ecf6
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Call Trace:
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa00a6fdc>] ? xfs_getbmap+0x45b/0x5ae [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b2d2>] ? xfs_open_by_handle+0x182/0x182 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b3f0>] ? xfs_ioc_getbmapx+0x4e/0x85 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008c8f6>] ? xfs_file_ioctl+0x43f/0x751 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff810be2f2>] ? __lock_page_killable+0x5d/0x63
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81106cc7>] ? __dentry_open+0x1d6/0x2c8
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81105ca6>] ? fd_install+0x27/0x4f
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b2ad>] ? xfs_open_by_handle+0x15d/0x182 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa00eb0f7>] ? xfs_file_compat_ioctl+0x355/0x486 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008a3b7>] ? xfs_iunlock+0xea/0x146 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa009509f>] ? xfs_free_eofblocks+0xa5/0x217 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81142365>] ? compat_sys_ioctl+0x1a5/0xe28
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81071d92>] ? timekeeping_get_ns+0xd/0x2a
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff810729fa>] ? getnstimeofday+0x4d/0x80
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81072a7a>] ? do_gettimeofday+0x10/0x2f
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81361a36>] ? cstar_dispatch+0x7/0x21
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Code: 89 64 24 28 e8 86 fc ff ff 48 83 c4 68 c3 48 89 f1 41 89 d0 48 83 ec 08 48 89 fa 48 c7 c6 7b 6f 0f a0 31 ff 31 c0 e8 8d ff ff ff <0f> 0b eb fe 90 90 45 31 c0 31 c0 48 83 7f 38 00 75 67 e9 bd 00
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RIP  [<ffffffffa009043e>] assfail+0x1d/0x23 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  RSP <ffff880011c71af8>
Apr  4 04:41:24 goscie kernel: [ 6188.005126] ---[ end trace 00a3f5114a46a7b5 ]---

Should I investigate further, or is it related to my (broken) patch?

Best Regards,
Rafal Kupka



end of thread, other threads:[~2012-04-04  9:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-29 18:20 XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598 Rafał Kupka
2012-03-02 10:43 ` Christoph Hellwig
2012-03-03 11:37   ` Rafał Kupka
2012-03-03 11:44     ` Christoph Hellwig
2012-03-05 21:40       ` Rafał Kupka
2012-03-06 20:30         ` Dave Chinner
2012-03-08 19:10           ` Rafał Kupka
2012-03-26 16:05             ` Rafał Kupka
2012-03-27  3:28               ` Dave Chinner
2012-03-27 10:17                 ` Dave Chinner
2012-03-30 16:02                   ` Rafał Kupka
2012-04-04  9:19                     ` Rafał Kupka
