* [ASSERT failure] transaction reservations changes bad?
@ 2013-03-12  6:20 Dave Chinner
  2013-03-12  6:25 ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-03-12  6:20 UTC (permalink / raw)
  To: xfs

Folks,

I just got this ASSERT failure from xfstests test 297, running 3.1.8
xfsprogs and a 3.9-rc1 kernel:

[ 2593.733005] XFS: Assertion failed: BTOBB(need_bytes) < log->l_logBBsize, file: fs/xfs/xfs_log.c, line: 1451
[ 2593.736988] ------------[ cut here ]------------
[ 2593.738177] kernel BUG at fs/xfs/xfs_message.c:100!
[ 2593.739192] invalid opcode: 0000 [#1] SMP
[ 2593.740081] Modules linked in:
[ 2593.740786] CPU 0
[ 2593.741221] Pid: 1385, comm: mkdir Not tainted 3.8.0-dgc+ #472 Bochs Bochs
[ 2593.742630] RIP: 0010:[<ffffffff8146a1f2>]  [<ffffffff8146a1f2>] assfail+0x22/0x30
[ 2593.744173] RSP: 0018:ffff880038c23c98  EFLAGS: 00010292
[ 2593.745234] RAX: 000000000000005f RBX: ffff88003cfec800 RCX: 0000000000005251
[ 2593.746635] RDX: 0000000000005151 RSI: 0000000000000096 RDI: 0000000000000246
[ 2593.748048] RBP: ffff880038c23c98 R08: 000000000000000a R09: 0000000000000889
[ 2593.749471] R10: 0000000000000000 R11: 0000000000000888 R12: 0000000000000dcb
[ 2593.750862] R13: ffff88003cfec800 R14: 0000000000000003 R15: 0000000000000001
[ 2593.752276] FS:  00007f0e64cb97a0(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 2593.753895] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2593.755039] CR2: 000000000066c004 CR3: 000000003cf26000 CR4: 00000000000006f0
[ 2593.756472] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2593.757877] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2593.759285] Process mkdir (pid: 1385, threadinfo ffff880038c22000, task ffff88003cb483c0)
[ 2593.760914] Stack:
[ 2593.761345]  ffff880038c23cb8 ffffffff814c9a11 ffff88002fa170a8 ffff880007ac7420
[ 2593.762881]  ffff880038c23d18 ffffffff814cb608 0000000000000000 000111b838c23fd8
[ 2593.764429]  ffff880038c23fd8 ffff8800055733d0 ffff880005573000 0000000000000000
[ 2593.765964] Call Trace:
[ 2593.766449]  [<ffffffff814c9a11>] xlog_grant_push_ail+0x41/0xf0
[ 2593.767577]  [<ffffffff814cb608>] xfs_log_reserve+0xe8/0x270
[ 2593.768676]  [<ffffffff814c740e>] xfs_trans_reserve+0x2ee/0x300
[ 2593.769832]  [<ffffffff814c6ec7>] ? _xfs_trans_alloc+0x37/0xa0
[ 2593.770950]  [<ffffffff8146ed05>] xfs_create+0x185/0x660
[ 2593.771975]  [<ffffffff8118cadd>] ? lookup_real+0x1d/0x60
[ 2593.773028]  [<ffffffff814678c6>] xfs_vn_mknod+0xa6/0x1b0
[ 2593.774093]  [<ffffffff814679e6>] xfs_vn_mkdir+0x16/0x20
[ 2593.775112]  [<ffffffff81190ece>] vfs_mkdir+0x9e/0xf0
[ 2593.776092]  [<ffffffff81192862>] sys_mkdirat+0x62/0xe0
[ 2593.777111]  [<ffffffff811928f9>] sys_mkdir+0x19/0x20
[ 2593.778084]  [<ffffffff81be0f99>] system_call_fastpath+0x16/0x1b
[ 2593.779233] Code: e8 f4 fb ff ff 0f 0b 66 90 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 40 f9 f2 81 31 ff 31 c0 e8 1e fc ff ff <0f> 0b 66 66 66
[ 2593.784186] RIP  [<ffffffff8146a1f2>] assfail+0x22/0x30
[ 2593.785224]  RSP <ffff880038c23c98>
[ 2593.786366] ---[ end trace bf0cb9ec29b256a6 ]---

This implies that the permanent transaction reservation ended up
larger than the log itself:

$ sudo xfs_info /mnt/scratch/
[sudo] password for dave: 
meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
         =                       sectsz=512   attr=2
data     =                       bsize=512    blocks=23068672, imaxpct=25
         =                       sunit=512    swidth=6144 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=512    blocks=2560, version=2
         =                       sectsz=512   sunit=512 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
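For reference, the failed assert compares the reservation size (converted
to 512-byte basic blocks) against the log size in basic blocks. Here is a
small illustrative model of that check with the numbers plugged in from
the xfs_info output above (BTOBB and l_logBBsize are the kernel names
from the assert message; the model itself is just a sketch):

```python
# Sketch of the failing check:
#     ASSERT(BTOBB(need_bytes) < log->l_logBBsize)
# Basic blocks (BBs) are 512 bytes in XFS, i.e. BBSHIFT == 9.

BBSHIFT = 9
BBSIZE = 1 << BBSHIFT  # 512

def btobb(nbytes):
    """Bytes to basic blocks, rounding up (the kernel's BTOBB macro)."""
    return (nbytes + BBSIZE - 1) >> BBSHIFT

# Log geometry from the xfs_info output: 2560 log blocks, bsize=512.
log_bytes = 2560 * 512             # 1,310,720 bytes (1.25 MiB)
l_logBBsize = log_bytes // BBSIZE  # 2560 basic blocks

def assert_would_fire(need_bytes):
    # The assert fires when the reservation needs at least as many
    # basic blocks as the whole log contains.
    return not (btobb(need_bytes) < l_logBBsize)
```

So on this filesystem any ticket reservation that needs 2560 or more
basic blocks (i.e. anything over about 1.31 MB) trips the assert.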

Can someone please check that the before/after mkdir transaction
reservation sizes are unchanged for such a configuration?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12  6:20 [ASSERT failure] transaction reservations changes bad? Dave Chinner
@ 2013-03-12  6:25 ` Dave Chinner
  2013-03-12  8:08   ` Jeff Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-03-12  6:25 UTC (permalink / raw)
  To: xfs

On Tue, Mar 12, 2013 at 05:20:02PM +1100, Dave Chinner wrote:
> Folks,
> 
> I just got this ASSERT failure running xfstests on a 3.1.8 xfsprogs
> and a 3.9-rc1 kernel running test 297:

FYI, it's 100% reproducible here with:

# sudo MKFS_OPTIONS="-b size=512" ./check 297

(reproduced on multiple VMs now with the same command line)

Cheers,

Dave.

> 
> [ 2593.733005] XFS: Assertion failed: BTOBB(need_bytes) < log->l_logBBsize, file: fs/xfs/xfs_log.c, line: 1451
> [ 2593.736988] ------------[ cut here ]------------
> [ 2593.738177] kernel BUG at fs/xfs/xfs_message.c:100!
> [ 2593.739192] invalid opcode: 0000 [#1] SMP
> [ 2593.740081] Modules linked in:
> [ 2593.740786] CPU 0
> [ 2593.741221] Pid: 1385, comm: mkdir Not tainted 3.8.0-dgc+ #472 Bochs Bochs
> [ 2593.742630] RIP: 0010:[<ffffffff8146a1f2>]  [<ffffffff8146a1f2>] assfail+0x22/0x30
> [ 2593.744173] RSP: 0018:ffff880038c23c98  EFLAGS: 00010292
> [ 2593.745234] RAX: 000000000000005f RBX: ffff88003cfec800 RCX: 0000000000005251
> [ 2593.746635] RDX: 0000000000005151 RSI: 0000000000000096 RDI: 0000000000000246
> [ 2593.748048] RBP: ffff880038c23c98 R08: 000000000000000a R09: 0000000000000889
> [ 2593.749471] R10: 0000000000000000 R11: 0000000000000888 R12: 0000000000000dcb
> [ 2593.750862] R13: ffff88003cfec800 R14: 0000000000000003 R15: 0000000000000001
> [ 2593.752276] FS:  00007f0e64cb97a0(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [ 2593.753895] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2593.755039] CR2: 000000000066c004 CR3: 000000003cf26000 CR4: 00000000000006f0
> [ 2593.756472] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2593.757877] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 2593.759285] Process mkdir (pid: 1385, threadinfo ffff880038c22000, task ffff88003cb483c0)
> [ 2593.760914] Stack:
> [ 2593.761345]  ffff880038c23cb8 ffffffff814c9a11 ffff88002fa170a8 ffff880007ac7420
> [ 2593.762881]  ffff880038c23d18 ffffffff814cb608 0000000000000000 000111b838c23fd8
> [ 2593.764429]  ffff880038c23fd8 ffff8800055733d0 ffff880005573000 0000000000000000
> [ 2593.765964] Call Trace:
> [ 2593.766449]  [<ffffffff814c9a11>] xlog_grant_push_ail+0x41/0xf0
> [ 2593.767577]  [<ffffffff814cb608>] xfs_log_reserve+0xe8/0x270
> [ 2593.768676]  [<ffffffff814c740e>] xfs_trans_reserve+0x2ee/0x300
> [ 2593.769832]  [<ffffffff814c6ec7>] ? _xfs_trans_alloc+0x37/0xa0
> [ 2593.770950]  [<ffffffff8146ed05>] xfs_create+0x185/0x660
> [ 2593.771975]  [<ffffffff8118cadd>] ? lookup_real+0x1d/0x60
> [ 2593.773028]  [<ffffffff814678c6>] xfs_vn_mknod+0xa6/0x1b0
> [ 2593.774093]  [<ffffffff814679e6>] xfs_vn_mkdir+0x16/0x20
> [ 2593.775112]  [<ffffffff81190ece>] vfs_mkdir+0x9e/0xf0
> [ 2593.776092]  [<ffffffff81192862>] sys_mkdirat+0x62/0xe0
> [ 2593.777111]  [<ffffffff811928f9>] sys_mkdir+0x19/0x20
> [ 2593.778084]  [<ffffffff81be0f99>] system_call_fastpath+0x16/0x1b
> [ 2593.779233] Code: e8 f4 fb ff ff 0f 0b 66 90 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 40 f9 f2 81 31 ff 31 c0 e8 1e fc ff ff <0f> 0b 66 66 66
> [ 2593.784186] RIP  [<ffffffff8146a1f2>] assfail+0x22/0x30
> [ 2593.785224]  RSP <ffff880038c23c98>
> [ 2593.786366] ---[ end trace bf0cb9ec29b256a6 ]---
> 
> This implies that the permanent transaction reservation ended up
> larger than the log itself:
> 
> $ sudo xfs_info /mnt/scratch/
> [sudo] password for dave: 
> meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=512    blocks=23068672, imaxpct=25
>          =                       sunit=512    swidth=6144 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=512    blocks=2560, version=2
>          =                       sectsz=512   sunit=512 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Can someone please check that the before/after mkdir transaction
> reservation sizes are unchanged for such a configuration?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
Dave Chinner
david@fromorbit.com



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12  6:25 ` Dave Chinner
@ 2013-03-12  8:08   ` Jeff Liu
  2013-03-12 10:31     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Liu @ 2013-03-12  8:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On 03/12/2013 02:25 PM, Dave Chinner wrote:
> On Tue, Mar 12, 2013 at 05:20:02PM +1100, Dave Chinner wrote:
>> Folks,
>>
>> I just got this ASSERT failure running xfstests on a 3.1.8 xfsprogs
>> and a 3.9-rc1 kernel running test 297:
> 
> FYI, it's 100% reproducable here with:
> 
> # sudo MKFS_OPTIONS="-b size=512" ./check 297
> 
> (reproduced on multiple VMs now with the same command line)
> 
> Cheers,
> 
> Dave.
> 
>>
>> [ 2593.733005] XFS: Assertion failed: BTOBB(need_bytes) < log->l_logBBsize, file: fs/xfs/xfs_log.c, line: 1451
>> [ 2593.736988] ------------[ cut here ]------------
>> [ 2593.738177] kernel BUG at fs/xfs/xfs_message.c:100!
>> [ 2593.739192] invalid opcode: 0000 [#1] SMP
>> [ 2593.740081] Modules linked in:
>> [ 2593.740786] CPU 0
>> [ 2593.741221] Pid: 1385, comm: mkdir Not tainted 3.8.0-dgc+ #472 Bochs Bochs
>> [ 2593.742630] RIP: 0010:[<ffffffff8146a1f2>]  [<ffffffff8146a1f2>] assfail+0x22/0x30
>> [ 2593.744173] RSP: 0018:ffff880038c23c98  EFLAGS: 00010292
>> [ 2593.745234] RAX: 000000000000005f RBX: ffff88003cfec800 RCX: 0000000000005251
>> [ 2593.746635] RDX: 0000000000005151 RSI: 0000000000000096 RDI: 0000000000000246
>> [ 2593.748048] RBP: ffff880038c23c98 R08: 000000000000000a R09: 0000000000000889
>> [ 2593.749471] R10: 0000000000000000 R11: 0000000000000888 R12: 0000000000000dcb
>> [ 2593.750862] R13: ffff88003cfec800 R14: 0000000000000003 R15: 0000000000000001
>> [ 2593.752276] FS:  00007f0e64cb97a0(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>> [ 2593.753895] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 2593.755039] CR2: 000000000066c004 CR3: 000000003cf26000 CR4: 00000000000006f0
>> [ 2593.756472] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 2593.757877] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [ 2593.759285] Process mkdir (pid: 1385, threadinfo ffff880038c22000, task ffff88003cb483c0)
>> [ 2593.760914] Stack:
>> [ 2593.761345]  ffff880038c23cb8 ffffffff814c9a11 ffff88002fa170a8 ffff880007ac7420
>> [ 2593.762881]  ffff880038c23d18 ffffffff814cb608 0000000000000000 000111b838c23fd8
>> [ 2593.764429]  ffff880038c23fd8 ffff8800055733d0 ffff880005573000 0000000000000000
>> [ 2593.765964] Call Trace:
>> [ 2593.766449]  [<ffffffff814c9a11>] xlog_grant_push_ail+0x41/0xf0
>> [ 2593.767577]  [<ffffffff814cb608>] xfs_log_reserve+0xe8/0x270
>> [ 2593.768676]  [<ffffffff814c740e>] xfs_trans_reserve+0x2ee/0x300
>> [ 2593.769832]  [<ffffffff814c6ec7>] ? _xfs_trans_alloc+0x37/0xa0
>> [ 2593.770950]  [<ffffffff8146ed05>] xfs_create+0x185/0x660
>> [ 2593.771975]  [<ffffffff8118cadd>] ? lookup_real+0x1d/0x60
>> [ 2593.773028]  [<ffffffff814678c6>] xfs_vn_mknod+0xa6/0x1b0
>> [ 2593.774093]  [<ffffffff814679e6>] xfs_vn_mkdir+0x16/0x20
>> [ 2593.775112]  [<ffffffff81190ece>] vfs_mkdir+0x9e/0xf0
>> [ 2593.776092]  [<ffffffff81192862>] sys_mkdirat+0x62/0xe0
>> [ 2593.777111]  [<ffffffff811928f9>] sys_mkdir+0x19/0x20
>> [ 2593.778084]  [<ffffffff81be0f99>] system_call_fastpath+0x16/0x1b
>> [ 2593.779233] Code: e8 f4 fb ff ff 0f 0b 66 90 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 40 f9 f2 81 31 ff 31 c0 e8 1e fc ff ff <0f> 0b 66 66 66
>> [ 2593.784186] RIP  [<ffffffff8146a1f2>] assfail+0x22/0x30
>> [ 2593.785224]  RSP <ffff880038c23c98>
>> [ 2593.786366] ---[ end trace bf0cb9ec29b256a6 ]---
>>
>> This implies that the permanent transaction reservation ended up
>> larger than the log itself:
>>
>> $ sudo xfs_info /mnt/scratch/
>> [sudo] password for dave: 
>> meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
>>          =                       sectsz=512   attr=2
>> data     =                       bsize=512    blocks=23068672, imaxpct=25
>>          =                       sunit=512    swidth=6144 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal               bsize=512    blocks=2560, version=2
>>          =                       sectsz=512   sunit=512 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> Can someone please check that the before/after mkdir transaction
>> reservation sizes are unchanged for such a configuration?
I just did a quick verification. 

# mkfs.xfs -V
mkfs.xfs version 3.1.8

# uname -a
Linux koala 3.9.0-rc1 #80 SMP Tue Mar 12 15:06:39 CST 2013 x86_64 x86_64 x86_64 GNU/Linux

# mkfs.xfs -f -b size=512 /dev/sda6
meta-data=/dev/sda6              isize=256    agcount=4, agsize=5242880 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=512    blocks=20971520, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=512    blocks=20480, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


The reservation size has not changed; both are 70072 bytes:

[  230.905092] xfs_calc_mkdir_reservation: res=70072 bytes.

However, I can always reproduce this issue with 'MKFS_OPTIONS="-b size=512" ./check 297' as well.


Thanks,
-Jeff



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12  8:08   ` Jeff Liu
@ 2013-03-12 10:31     ` Dave Chinner
  2013-03-12 11:05       ` Jeff Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-03-12 10:31 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Tue, Mar 12, 2013 at 04:08:20PM +0800, Jeff Liu wrote:
> Hi Dave,
> 
> On 03/12/2013 02:25 PM, Dave Chinner wrote:
> > On Tue, Mar 12, 2013 at 05:20:02PM +1100, Dave Chinner wrote:
> >> Folks,
> >>
> >> I just got this ASSERT failure running xfstests on a 3.1.8 xfsprogs
> >> and a 3.9-rc1 kernel running test 297:
> > 
> > FYI, it's 100% reproducable here with:
> > 
> > # sudo MKFS_OPTIONS="-b size=512" ./check 297
> > 
> > (reproduced on multiple VMs now with the same command line)
....
> >> This implies that the permanent transaction reservation ended up
> >> larger than the log itself:
> >>
> >> $ sudo xfs_info /mnt/scratch/
> >> [sudo] password for dave: 
> >> meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
> >>          =                       sectsz=512   attr=2
> >> data     =                       bsize=512    blocks=23068672, imaxpct=25
> >>          =                       sunit=512    swidth=6144 blks
> >> naming   =version 2              bsize=4096   ascii-ci=0
> >> log      =internal               bsize=512    blocks=2560, version=2
> >>          =                       sectsz=512   sunit=512 blks, lazy-count=1
> >> realtime =none                   extsz=4096   blocks=0, rtextents=0
> >>
> >> Can someone please check that the before/after mkdir transaction
> >> reservation sizes are unchanged for such a configuration?
> I just did a quick verification. 
> 
> # mkfs.xfs -V
> mkfs.xfs version 3.1.8
> 
> # uname -a
> Linux koala 3.9.0-rc1 #80 SMP Tue Mar 12 15:06:39 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> # mkfs.xfs -f -b size=512 /dev/sda6
> meta-data=/dev/sda6              isize=256    agcount=4, agsize=5242880 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=512    blocks=20971520, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=512    blocks=20480, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

That's a different mkfs.xfs config to what test 297 is using.
Different log size, different AG count, no log stripe unit, etc.
297 is using:

scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>

And when I add the extra MKFS_OPTIONS, what actually runs is:

# mkfs.xfs -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>
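To put the config difference in numbers (taken straight from the two mkfs
outputs quoted in this thread; this is just a back-of-the-envelope
comparison):

```python
bsize = 512  # both configurations use -b size=512

# Log sizes from the two mkfs outputs in this thread.
log_297 = 2560 * bsize       # test 297: -l su=256k,size=2560b -> 1.25 MiB
log_default = 20480 * bsize  # plain "mkfs.xfs -b size=512"    -> 10 MiB, no log su

# The default-config log is 8x larger (and has no stripe unit), so an
# inflated reservation that overflows the tiny 297 log can still fit here.
ratio = log_default // log_297
```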

> The reservation size does not changed, both are 70072 bytes:
> 
> [  230.905092] xfs_calc_mkdir_reservation: res=70072 bytes.

And it's not just the calculation that I'm worried about here - it's
the actual reservation that ends up in the ticket that matters as
that is fed into the code that has triggered the assert. The value
in the ticket takes into account log stripe units and other
roundings, so it's typically much larger than just the reservation
calculation itself...
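As a rough illustration of that rounding effect (the padding terms below
are assumptions for the sketch, not the exact xlog_ticket_alloc()
arithmetic): with a 256 KiB log stripe unit, per-write roundoff padding
alone can inflate the ~70 KB calculated reservation well past the
1.25 MiB log once it is multiplied by the permanent transaction's log
count:

```python
def roundup(x, y):
    # Round x up to the next multiple of y (like the kernel's roundup()).
    return ((x + y - 1) // y) * y

calc_res = 70072      # calculated mkdir reservation (bytes), quoted above
lsunit = 256 * 1024   # log stripe unit: -l su=256k
log_count = 3         # assumed permanent-reservation log count for mkdir
hdr_bytes = 8 * 1024  # assumed iclog header overhead (illustrative guess)

# Assumed model: each of the log_count writes carries the calculated
# reservation plus header overhead, padded for stripe-unit roundoff at
# both ends of the write.
unit_bytes = calc_res + hdr_bytes + 2 * lsunit
need_bytes = unit_bytes * log_count

log_bytes = 2560 * 512  # the 1,310,720-byte log from test 297

# need_bytes comes out around 1.8 MB -- larger than the whole log,
# which is exactly the condition the assert catches.
```

Even if the constants here are off, the shape of the problem is clear:
stripe-unit roundoff is charged per log write, so on a log this small the
rounded-up permanent reservation can exceed the log size while the raw
calculation looks harmless.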

> However, I can always reproducing this issue with
> '"MKFS_OPTIONS=-b size=512" ./check 297' as well.

Can you check that it also fails on kernels prior to the reservation
changes? That will rule out it being a recent regression...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12 10:31     ` Dave Chinner
@ 2013-03-12 11:05       ` Jeff Liu
  2013-03-12 11:56         ` Jeff Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Liu @ 2013-03-12 11:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 03/12/2013 06:31 PM, Dave Chinner wrote:
> On Tue, Mar 12, 2013 at 04:08:20PM +0800, Jeff Liu wrote:
>> Hi Dave,
>>
>> On 03/12/2013 02:25 PM, Dave Chinner wrote:
>>> On Tue, Mar 12, 2013 at 05:20:02PM +1100, Dave Chinner wrote:
>>>> Folks,
>>>>
>>>> I just got this ASSERT failure running xfstests on a 3.1.8 xfsprogs
>>>> and a 3.9-rc1 kernel running test 297:
>>>
>>> FYI, it's 100% reproducable here with:
>>>
>>> # sudo MKFS_OPTIONS="-b size=512" ./check 297
>>>
>>> (reproduced on multiple VMs now with the same command line)
> ....
>>>> This implies that the permanent transaction reservation ended up
>>>> larger than the log itself:
>>>>
>>>> $ sudo xfs_info /mnt/scratch/
>>>> [sudo] password for dave: 
>>>> meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
>>>>          =                       sectsz=512   attr=2
>>>> data     =                       bsize=512    blocks=23068672, imaxpct=25
>>>>          =                       sunit=512    swidth=6144 blks
>>>> naming   =version 2              bsize=4096   ascii-ci=0
>>>> log      =internal               bsize=512    blocks=2560, version=2
>>>>          =                       sectsz=512   sunit=512 blks, lazy-count=1
>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>
>>>> Can someone please check that the before/after mkdir transaction
>>>> reservation sizes are unchanged for such a configuration?
>> I just did a quick verification. 
>>
>> # mkfs.xfs -V
>> mkfs.xfs version 3.1.8
>>
>> # uname -a
>> Linux koala 3.9.0-rc1 #80 SMP Tue Mar 12 15:06:39 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
>>
>> # mkfs.xfs -f -b size=512 /dev/sda6
>> meta-data=/dev/sda6              isize=256    agcount=4, agsize=5242880 blks
>>          =                       sectsz=512   attr=2, projid32bit=0
>> data     =                       bsize=512    blocks=20971520, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal log           bsize=512    blocks=20480, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> That's a different mkfs.xfs config to what test 297 is using.
> Different log size, different AG count, no log stripe unit, etc.
> 297 is using:
> 
> scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>
> 
> And when I add the extra MKFS_OPTIONS in it actually is:
> 
> # mkfs.xfs -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>
> 
>> The reservation size does not changed, both are 70072 bytes:
>>
>> [  230.905092] xfs_calc_mkdir_reservation: res=70072 bytes.
> 
> And it's not just the calculation that I'm worried about here - it's
> the actual reservation that ends up in the ticket that matters as
> that is fed into the code that has triggered the assert. The value
> in the ticket takes into account log stripe units and other
> roundings, so it's typically much larger than just the reservation
> calculation itself...
> 
>> However, I can always reproducing this issue with
>> '"MKFS_OPTIONS=-b size=512" ./check 297' as well.
> 
> Can you check that it also fails on kernels prior to the reservation
> changes? That will rule out it being a recent regression...
Sure — I verified against a freshly built 3.8.0 kernel and can reproduce
the same failure there, so this does not appear to be a recent regression.
$ uname -a
Linux koala 3.8.0 #81 SMP Tue Mar 12 18:03:44 CST 2013 x86_64 x86_64
x86_64 GNU/Linux

commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Feb 18 15:58:34 2013 -0800

Thanks,
-Jeff



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12 11:05       ` Jeff Liu
@ 2013-03-12 11:56         ` Jeff Liu
  2013-03-12 12:05           ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Liu @ 2013-03-12 11:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 03/12/2013 07:05 PM, Jeff Liu wrote:
> On 03/12/2013 06:31 PM, Dave Chinner wrote:
>> On Tue, Mar 12, 2013 at 04:08:20PM +0800, Jeff Liu wrote:
>>> Hi Dave,
>>>
>>> On 03/12/2013 02:25 PM, Dave Chinner wrote:
>>>> On Tue, Mar 12, 2013 at 05:20:02PM +1100, Dave Chinner wrote:
>>>>> Folks,
>>>>>
>>>>> I just got this ASSERT failure running xfstests on a 3.1.8 xfsprogs
>>>>> and a 3.9-rc1 kernel running test 297:
>>>>
>>>> FYI, it's 100% reproducable here with:
>>>>
>>>> # sudo MKFS_OPTIONS="-b size=512" ./check 297
>>>>
>>>> (reproduced on multiple VMs now with the same command line)
>> ....
>>>>> This implies that the permanent transaction reservation ended up
>>>>> larger than the log itself:
>>>>>
>>>>> $ sudo xfs_info /mnt/scratch/
>>>>> [sudo] password for dave: 
>>>>> meta-data=/dev/vdb               isize=256    agcount=16, agsize=1441792 blks
>>>>>          =                       sectsz=512   attr=2
>>>>> data     =                       bsize=512    blocks=23068672, imaxpct=25
>>>>>          =                       sunit=512    swidth=6144 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0
>>>>> log      =internal               bsize=512    blocks=2560, version=2
>>>>>          =                       sectsz=512   sunit=512 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>>
>>>>> Can someone please check that the before/after mkdir transaction
>>>>> reservation sizes are unchanged for such a configuration?
>>> I just did a quick verification. 
>>>
>>> # mkfs.xfs -V
>>> mkfs.xfs version 3.1.8
>>>
>>> # uname -a
>>> Linux koala 3.9.0-rc1 #80 SMP Tue Mar 12 15:06:39 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> # mkfs.xfs -f -b size=512 /dev/sda6
>>> meta-data=/dev/sda6              isize=256    agcount=4, agsize=5242880 blks
>>>          =                       sectsz=512   attr=2, projid32bit=0
>>> data     =                       bsize=512    blocks=20971520, imaxpct=25
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0
>>> log      =internal log           bsize=512    blocks=20480, version=2
>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> That's a different mkfs.xfs config to what test 297 is using.
>> Different log size, different AG count, no log stripe unit, etc.
>> 297 is using:
>>
>> scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>
>>
>> And when I add the extra MKFS_OPTIONS in it actually is:
>>
>> # mkfs.xfs -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b <dev>
>>
>>> The reservation size does not changed, both are 70072 bytes:
>>>
>>> [  230.905092] xfs_calc_mkdir_reservation: res=70072 bytes.
>>
>> And it's not just the calculation that I'm worried about here - it's
>> the actual reservation that ends up in the ticket that matters as
>> that is fed into the code that has triggered the assert. The value
>> in the ticket takes into account log stripe units and other
>> roundings, so it's typically much larger than just the reservation
>> calculation itself...
>>
>>> However, I can always reproducing this issue with
>>> '"MKFS_OPTIONS=-b size=512" ./check 297' as well.
>>
>> Can you check that it also fails on kernels prior to the reservation
>> changes? That will rule out it being a recent regression...
> Sure, verified against a new built 3.8.0 kernel, so this should be a
> regression issue.
> $ uname -a
> Linux koala 3.8.0 #81 SMP Tue Mar 12 18:03:44 CST 2013 x86_64 x86_64
> x86_64 GNU/Linux
> 
> commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Mon Feb 18 15:58:34 2013 -0800
More info: 3.7.0 is the oldest kernel in my environment, and I ran into
the same problem there as well.

Thanks,
-Jeff



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12 11:56         ` Jeff Liu
@ 2013-03-12 12:05           ` Dave Chinner
  2013-03-26 10:14             ` Jeff Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-03-12 12:05 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
> More info, 3.7.0 is the oldest kernel on my environment, I ran into the
> same problem.

Thanks for following up so quickly, Jeff. So the problem is that a
new test is tripping over a bug that has been around for a while,
not that it is a new regression.

OK, so I'll expunge that from my testing for the moment as I don't
have time to dig in and find the cause right now. If anyone else
wants to.... :)

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com



* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-12 12:05           ` Dave Chinner
@ 2013-03-26 10:14             ` Jeff Liu
  2013-03-26 16:44               ` Mark Tinguely
  2013-03-27  2:03               ` Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Liu @ 2013-03-26 10:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 03/12/2013 08:05 PM, Dave Chinner wrote:
> On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
>> More info, 3.7.0 is the oldest kernel on my environment, I ran into the
>> same problem.
> 
> Thanks for following up so quickly, Jeff. So the problem is that a
> new test is tripping over a bug that has been around for a while,
> not that it is a new regression.
> 
> OK, so I'll expunge that from my testing for the moment as I don't
> ahve time to dig in and find out what the cause is right now. If
> anyone else wants to.... :)

I did some further tests to nail down this issue; I'm posting the analysis here,
as it might be of some use when we revisit it.

The disk is formatted as per Dave's previous comments, i.e.
mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx

First of all, it looks like this bug has been hiding for years: I can reproduce it on
upstream kernels from 3.0 to 3.9.0-rc3, and the oldest kernel I have tried, 2.6.39, has the same problem.

The following tests were run against a kernel with CONFIG_XFS_DEBUG enabled.

1) Creating a directory causes a kernel panic; I observed the same info as in Dave's previous posts.

[ 3846.705416] XFS: Assertion failed: BTOBB(need_bytes) < log->l_logBBsize, file: fs/xfs/xfs_log.c, line:
[ 3846.705444] ------------[ cut here ]------------
[ 3846.705447] kernel BUG at fs/xfs/xfs_message.c:100!
[ 3846.705450] invalid opcode: 0000 [#1] SMP
[ 3846.705454] Modules linked in: xfs(O) libcrc32c ........
[ 3846.705487] CPU 0
[ 3846.705493] Pid: 2188, comm: mkdir Tainted: G           O 3.9.0-rc3 #1 innotek GmbH VirtualBox
[ 3846.705496] RIP: 0010:[<ffffffffa021d632>]  [<ffffffffa021d632>] assfail+0x22/0x30 [xfs]
[ 3846.705534] RSP: 0018:ffff880079d61c78  EFLAGS: 00010296
[ 3846.705537] RAX: 000000000000005f RBX: ffff880079c1b800 RCX: 0000000000000006
[ 3846.705539] RDX: 0000000000000007 RSI: 0000000000000007 RDI: 0000000000000246
[ 3846.705542] RBP: ffff880079d61c78 R08: 000000000000000a R09: 0000000000000000
[ 3846.705544] R10: 0000000000000245 R11: 0000000000000244 R12: ffff880079c1b9d8
[ 3846.705547] R13: 0000000000000dcb R14: 0000000000000003 R15: 00000000000111b8
[ 3846.705550] FS:  00007f4da7f317c0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 3846.705553] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3846.705556] CR2: 00000000004044d0 CR3: 00000000793fd000 CR4: 00000000000006f0
[ 3846.705570] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3846.705573] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3846.705576] Process mkdir (pid: 2188, threadinfo ffff880079d60000, task ffff880030270000)
[ 3846.705578] Stack:
[ 3846.705581]  ffff880079d61cb8 ffffffffa0275ecb ffffffffa0225cd7 ffffffffa0298210
[ 3846.705586]  0000000000000007 ffff880079bdb730 ffff880079c1b800 0000000000000001
[ 3846.705591]  ffff880079d61d18 ffffffffa0277f48 ffff880079d61d18 ffff880079dd3040
[ 3846.705595] Call Trace:
[ 3846.705634]  [<ffffffffa0275ecb>] xlog_grant_push_ail+0x8b/0x1c0 [xfs]
[ 3846.705663]  [<ffffffffa0225cd7>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[ 3846.705698]  [<ffffffffa0277f48>] xfs_log_reserve+0x208/0x3e0 [xfs]
[ 3846.705731]  [<ffffffffa0273747>] xfs_trans_reserve+0x327/0x360 [xfs]
[ 3846.705762]  [<ffffffffa02731c7>] ? _xfs_trans_alloc+0x37/0xa0 [xfs]
[ 3846.705790]  [<ffffffffa0222c34>] xfs_create+0x184/0x6f0 [xfs]
[ 3846.705816]  [<ffffffffa021ae4b>] xfs_vn_mknod+0xcb/0x1e0 [xfs]
[ 3846.705841]  [<ffffffffa021af76>] xfs_vn_mkdir+0x16/0x20 [xfs]
[ 3846.705850]  [<ffffffff811a3f77>] vfs_mkdir+0xa7/0x120
[ 3846.705856]  [<ffffffff811a497b>] sys_mkdirat+0xab/0xf0
[ 3846.705861]  [<ffffffff811a49d9>] sys_mkdir+0x19/0x20
[ 3846.705869]  [<ffffffff816a5dd9>] system_call_fastpath+0x16/0x1b
[ 3846.705871] Code: e8 f4 fb ff ff 0f 0b 66 90 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 c7 c6 0
[ 3846.705918] RIP  [<ffffffffa021d632>] assfail+0x22/0x30 [xfs]
[ 3846.705943]  RSP <ffff880079d61c78>
[ 3846.705948] ---[ end trace 7b3a9ee81a9920f1 ]---


2) On a freshly formatted XFS partition, creating a regular file via `touch /xfs/test_file` makes the
process hang, and the following syslog messages were observed:

Mar 26 14:15:35 linux64bit kernel: [  841.804353] INFO: task kworker/0:2:2118 blocked for more than 120 seconds.
Mar 26 14:15:35 linux64bit kernel: [  841.804361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 14:15:35 linux64bit kernel: [  841.804364] kworker/0:2     D ffffffff8180cfa0     0  2118      2 0x00000000
Mar 26 14:15:35 linux64bit kernel: [  841.804372]  ffff880079bfdc18 0000000000000046 ffff880076bdc5c0 0000000000014e40
Mar 26 14:15:35 linux64bit kernel: [  841.804378]  ffff880062b31720 ffff880079bfdfd8 ffff880079bfdfd8 ffff880079bfdfd8
Mar 26 14:15:35 linux64bit kernel: [  841.804382]  ffffffff81c13440 ffff880062b31720 ffff880079bfdc38 ffff880079165cf0
Mar 26 14:15:35 linux64bit kernel: [  841.804387] Call Trace:
Mar 26 14:15:35 linux64bit kernel: [  841.804400]  [<ffffffff8169c399>] schedule+0x29/0x70
Mar 26 14:15:35 linux64bit kernel: [  841.804444]  [<ffffffffa037e15a>] xlog_grant_head_wait+0x15a/0x2b0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804466]  [<ffffffff81692bf3>] ? printk+0x61/0x63
Mar 26 14:15:35 linux64bit kernel: [  841.804500]  [<ffffffffa037e414>] xlog_grant_head_check+0x164/0x1d0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804530]  [<ffffffffa037ff8e>] xfs_log_reserve+0x24e/0x3e0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804561]  [<ffffffffa037b747>] xfs_trans_reserve+0x327/0x360 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804589]  [<ffffffffa032dd84>] ? kmem_zone_zalloc+0x34/0x50 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804613]  [<ffffffffa031bbb0>] xfs_fs_log_dummy+0x80/0x130 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804643]  [<ffffffffa0381524>] xfs_log_worker+0x54/0x120 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804650]  [<ffffffff8107805b>] process_one_work+0x16b/0x400
Mar 26 14:15:35 linux64bit kernel: [  841.804655]  [<ffffffff81078ce9>] worker_thread+0x119/0x360
Mar 26 14:15:35 linux64bit kernel: [  841.804660]  [<ffffffff81078bd0>] ? manage_workers+0x350/0x350
Mar 26 14:15:35 linux64bit kernel: [  841.804665]  [<ffffffff8107e200>] kthread+0xc0/0xd0
Mar 26 14:15:35 linux64bit kernel: [  841.804669]  [<ffffffff8107e140>] ? flush_kthread_worker+0xb0/0xb0
Mar 26 14:15:35 linux64bit kernel: [  841.804675]  [<ffffffff816a5d2c>] ret_from_fork+0x7c/0xb0
Mar 26 14:15:35 linux64bit kernel: [  841.804679]  [<ffffffff8107e140>] ? flush_kthread_worker+0xb0/0xb0
Mar 26 14:15:35 linux64bit kernel: [  841.804685] INFO: task touch:2323 blocked for more than 120 seconds.
Mar 26 14:15:35 linux64bit kernel: [  841.804687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 14:15:35 linux64bit kernel: [  841.804689] touch           D ffffffff8180cfa0     0  2323   2128 0x00000000
Mar 26 14:15:35 linux64bit kernel: [  841.804694]  ffff8800791419b8 0000000000000086 ffff880076bdc5c0 0000000000014e40
Mar 26 14:15:35 linux64bit kernel: [  841.804699]  ffff88003032ae40 ffff880079141fd8 ffff880079141fd8 ffff880079141fd8
Mar 26 14:15:35 linux64bit kernel: [  841.804703]  ffffffff81c13440 ffff88003032ae40 ffff8800791419d8 ffff880079993000
Mar 26 14:15:35 linux64bit kernel: [  841.804707] Call Trace:
Mar 26 14:15:35 linux64bit kernel: [  841.804713]  [<ffffffff8169c399>] schedule+0x29/0x70
Mar 26 14:15:35 linux64bit kernel: [  841.804740]  [<ffffffffa037e15a>] xlog_grant_head_wait+0x15a/0x2b0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804745]  [<ffffffff81692bf3>] ? printk+0x61/0x63
Mar 26 14:15:35 linux64bit kernel: [  841.804772]  [<ffffffffa037e461>] xlog_grant_head_check+0x1b1/0x1d0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804798]  [<ffffffffa037ff8e>] xfs_log_reserve+0x24e/0x3e0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804826]  [<ffffffffa037b747>] xfs_trans_reserve+0x327/0x360 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804854]  [<ffffffffa032ac34>] xfs_create+0x184/0x6f0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804880]  [<ffffffffa0322e4b>] xfs_vn_mknod+0xcb/0x1e0 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804904]  [<ffffffffa0322f93>] xfs_vn_create+0x13/0x20 [xfs]
Mar 26 14:15:35 linux64bit kernel: [  841.804913]  [<ffffffff811a29c5>] vfs_create+0xb5/0x130
Mar 26 14:15:35 linux64bit kernel: [  841.804918]  [<ffffffff811a37ca>] do_last+0xd8a/0xec0
Mar 26 14:15:35 linux64bit kernel: [  841.804926]  [<ffffffff812fe01c>] ? apparmor_file_alloc_security+0x2c/0x60
Mar 26 14:15:35 linux64bit kernel: [  841.804932]  [<ffffffff811a39b8>] path_openat+0xb8/0x4c0
Mar 26 14:15:35 linux64bit kernel: [  841.804937]  [<ffffffff81158741>] ? handle_mm_fault+0x271/0x390
Mar 26 14:15:35 linux64bit kernel: [  841.804943]  [<ffffffff811a4552>] do_filp_open+0x42/0xa0
Mar 26 14:15:35 linux64bit kernel: [  841.804949]  [<ffffffff811b1675>] ? __alloc_fd+0xe5/0x170
Mar 26 14:15:35 linux64bit kernel: [  841.804955]  [<ffffffff811941de>] do_sys_open+0xfe/0x1e0
Mar 26 14:15:35 linux64bit kernel: [  841.804960]  [<ffffffff811942e1>] sys_open+0x21/0x30
Mar 26 14:15:35 linux64bit kernel: [  841.804966]  [<ffffffff816a5dd9>] system_call_fastpath+0x16/0x1b

According to the trace above, this is most likely a deadlock or an endless loop.
I then added some debugging code to investigate what happens to the transaction log space
reservations between mounting a freshly formatted XFS partition and creating a new file.

With the current disk format, the total log space is 2560 * 1024 = 2621440 bytes.
By default, l_reserve_head.grant and l_write_head.grant each start with 1310720 bytes of free space.

First, I observed a non-permanent CIL ticket allocation in xlog_cil_init_post_recovery() during
xfs_log_mount(); it has no effect on our situation.

Second, xfs_vn_update_time() is invoked when the partition is mounted, so an XFS_TRANS_FSYNC_TS
transaction takes place.  It reserves 532916 bytes, then gives back 208 bytes in
xlog_ungrant_log_space() after xfs_trans_commit().  The transaction therefore holds
(532916 - 208) bytes, leaving 1310720 - (532916 - 208) = 778012 bytes in l_reserve_head.grant.

Finally, creating a file triggers XFS_TRANS_CREATE, which needs to reserve 1205208 bytes.  Only
778012 bytes are available, so the process hangs in the xlog_grant_head_wait() loop, calling
schedule() to let other processes run.

At this point xfs_log_worker() wakes up and runs.  Since this is the first time it is issued, the
call chain is as follows:
........
......
xlog_log_force()
  __xfs_log_force() -> xfs_sync()
    xlog_cil_force_lsn()
      xlog_cil_push()
        xlog_log_done()
          xlog_ungrant_log_space()
            xlog_log_space_wake()
              xlog_grant_head_wake()...

At this point l_reserve_head.grant should be left with 1306396 bytes, since I observed this
CHECKPOINT transaction give back 528384 bytes in xlog_ungrant_log_space():
528384 + 778012 = 1306396 bytes.  That is not quite right, though, because 257820 bytes were
reserved in xfs_sync(), so the remaining space is 1306396 - 257820 = 1048576 bytes.

Once xfs_log_worker() completes, the earlier XFS_TRANS_CREATE resumes at the point after
schedule(), but the free space (1048576) is still less than the requirement (1205208), so it
blocks again.

Periodically, xfs_log_worker() wakes up again, but this time it goes through XFS_TRANS_DUMMY
instead.  The current free space is enough for that reservation, since it requires only 533172
bytes.  However, it too hangs in xlog_grant_head_wait() before schedule(), because the only
log-transaction process that can be woken is the one doing the file creation, and that process
immediately blocks again for lack of log space.

The point is that even if XFS_TRANS_DUMMY were not involved, XFS_TRANS_CREATE would still be
blocked, since its log space reservation can never be satisfied.  IMHO the major cause is related
to the 'sunit' parameter, since it affects the log space unit calculation via
'2*log->l_mp->m_sb.sb_logsunit' in xlog_ticket_alloc().  However, we do not take this factor into
account at the mkfs or mount stage; should we?


Thanks,
-Jeff

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-26 10:14             ` Jeff Liu
@ 2013-03-26 16:44               ` Mark Tinguely
  2013-03-28 12:58                 ` Jeff Liu
  2013-03-27  2:03               ` Dave Chinner
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Tinguely @ 2013-03-26 16:44 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On 03/26/13 05:14, Jeff Liu wrote:
> On 03/12/2013 08:05 PM, Dave Chinner wrote:
>> On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
>>> More info, 3.7.0 is the oldest kernel on my environment, I ran into the
>>> same problem.
>>
>> Thanks for following up so quickly, Jeff. So the problem is that a
>> new test is tripping over a bug that has been around for a while,
>> not that it is a new regression.
>>
>> OK, so I'll expunge that from my testing for the moment as I don't
>> ahve time to dig in and find out what the cause is right now. If
>> anyone else wants to.... :)
>
> I did some further tests to nail down this issue, just posting the analysis result here,
> it might be of some use when we revising it again.
>
> The disk is formated with Dave's previous comments, i.e.
> mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx
>
> First of all, looks this bug stayed in hiding for years since I can reproduce it between upstream
> 3.0 to 3.9.0-rc3, the oldest kernel I have tried is 2.6.39 which has the same problem.
>
> The following tests are running against kernel with CONFIG_XFS_DEBUG is enabled.
>
> 1) Create directory will cause kernel panic, observed same info to Dave's previous posts.
>
> [ 3846.705416] XFS: Assertion failed: BTOBB(need_bytes)<  log->l_logBBsize, file: fs/xfs/xfs_log.c, line:

Thanks Jeff for looking at this. I did not look beyond printing the 
BTOBB(need_bytes) and log->l_logBBsize.

Okay, so the reservation is larger than the defined log space.

  Is the create directory reserving too much?
  Is the minimum log size too small?

I did not dig too deeply into all the possible directory conversions that 
could happen when one more entry is put in the directory.

>
> 2) For a refreshed formated XFS partition, create a regular file via `touch /xfs/test_file`, the process
> will hang and the following syslog were observed:
>
> [hung task stack traces snipped; quoted in full above]
>
> It most likely a deadlock or dead loop was occurred according to above tracing info.
> Then I did some investigation to check what had happened in transaction log space reservations
> between mount a refreshed formated XFS partition and create a new file by adding some debugging code.
>
> As per the current disk formating, the totally log space is 2560 * 1024 = 2621440 bytes.
> By default, there are 1310720 bytes free space on both l_reserve_head.grant and l_write_head.grant separately.

The reserve and write grant use the whole space. Look at the xlog and 
CIL CTX values when hung.

>
> Firstly, I observed a non-permanent CIL ticket allocation at xlog_cil_init_post_recovery() for xfs_log_mount(),
> and it has no affect to our situation.
>
> Secondly, xfs_vn_update_time() is invoked per mount the partition, and hence the XFS_TRANS_FSYNC_TS transaction
> was happened, it need 532916 bytes for the space reservation, then give back 208 bytes at xlog_ungrant_log_space()
> after xfs_trans_commit().  Therefore, it only reserved (532916 - 208) bytes in this transaction, now there has
> 1310720 - (532916 - 208) = 778012 bytes left at l_reserve_head.grant.
>
> Finally, creating a file will make XFS_TRANS_CREATE happened, and it need to reserve 1205208 bytes, however, there
> only has 778012 bytes can be reserved, so that the process hang at the loop of xlog_grant_head_wait() and then call
> schedule() to let other processes wake up and running.
>
> In this situation, xfs_log_worker() will up and running, as this is the 1st time it was issued, the call chains was
> shown as following:
> ........
> ......
> xlog_log_force()
>    __xfs_log_force() ->  xfs_sync()
>      xlog_cil_force_lsn()
>        xlog_cil_push()
>          xlog_log_done()
>            xlog_ungrant_log_space()
> 	    xlog_log_space_wake()
> 	      xlog_grant_head_wake()...
>
> Up to now, there should left 1306396 bytes in l_reserve_head.grant since I observed this CHECKPOINT transaction
> gave back 528384 bytes at xlog_ungrant_log_space(), so that 528384 + 778012 = 1306396 bytes, but that's is not
> exactly, because it reserved 257820 bytes at xfs_sync(), so the left space is 1306396 - 257820 = 1048576 bytes.
>
> Once xfs_log_worker() run completed, the previous XFS_TRANS_CREATE is revived at the point after schedule(),
> but the free space(1048576) is less than the required(1205208), as a result, it got blocked again.
>
> Periodicity, xfs_log_worker() comes up, but this time it gone through XFS_TRANS_DUMMY instead.  In this case, the
> current free space is enough for this reservations because it requires 533172 bytes only, however, it also hang at
> the point of xlog_grant_head_wait() before schedule() because there only has one log transaction related process
> belong to the file creation would be scheduled, but this process will be blocked again due to the lack of log space.

Yes, it will get blocked behind the previous allocations.

>
> The question is that even if XFS_TRANS_DUMMY was not got involved, XFS_TRANS_CREATE would still be blocked as the log
> space reservation can not be satisfied.  IMHO, looks the major cause is related to the 'sunit' parameter, since it would
> affect the log space unit calculations by '2*log->l_mp->m_sb.sb_logsunit' at xlog_ticket_alloc().  However, we don't include
> this factor into consideration at mkfs or mount stage, should we take it into account?
>

There is at least one other (controversial) log (over-) allocation, but 
this is not involved here.

It is easy to hang small logs. I classify them as soft and hard hangs.

Soft hangs can be restarted by a new transaction (touch/mkdir) that 
requests grant space and forces a push of the AIL up to the last_sync value.

This looks like a hard hang. Most of the time, the CIL CTX ticket and 
any permanent tickets can hold enough space that a simple AIL push 
cannot get things going again. Tickets have to reserve for the worst 
case; most of the time they give back this worst-case space when the 
CIL is pushed.

Two log block sizes, or the log stripe unit, are added to every 
allocation for rounding, so that is already counted in the "bytes_needed".

>
> Thanks,
> -Jeff

Like I said before, I have not looked at this beyond the blocks needed 
versus the blocks in the log. I am willing to help.

--Mark.


* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-26 10:14             ` Jeff Liu
  2013-03-26 16:44               ` Mark Tinguely
@ 2013-03-27  2:03               ` Dave Chinner
  2013-03-28 15:16                 ` Jeff Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2013-03-27  2:03 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Tue, Mar 26, 2013 at 06:14:30PM +0800, Jeff Liu wrote:
> On 03/12/2013 08:05 PM, Dave Chinner wrote:
> > On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
> >> More info, 3.7.0 is the oldest kernel on my environment, I ran into the
> >> same problem.
> > 
> > Thanks for following up so quickly, Jeff. So the problem is that a
> > new test is tripping over a bug that has been around for a while,
> > not that it is a new regression.
> > 
> > OK, so I'll expunge that from my testing for the moment as I don't
> > ahve time to dig in and find out what the cause is right now. If
> > anyone else wants to.... :)
> 
> I did some further tests to nail down this issue, just posting the analysis result here,
> it might be of some use when we revising it again.
> 
> The disk is formated with Dave's previous comments, i.e.
> mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx
> 
> First of all, looks this bug stayed in hiding for years since I can reproduce it between upstream
> 3.0 to 3.9.0-rc3, the oldest kernel I have tried is 2.6.39 which has the same problem.

If you mount 2.6.39 with "-o nodelaylog", does the problem go away?

> IMHO, looks the major cause is related to the 'sunit' parameter,
> since it would affect the log space unit calculations by
> '2*log->l_mp->m_sb.sb_logsunit' at xlog_ticket_alloc().  However,
> we don't include this factor into consideration at mkfs or mount
> stage, should we take it into account?

That's what I suspected was the problem. i.e. that the log was too
small for the given configuration.

The question is this: how much space do we need to reserve?  I'm
thinking a minimum of 4*lsu: 2*lsu for the existing CIL context, and
another 2*lsu for any queued ticket waiting for space to become
available.

I haven't thought a lot about it, though, and I have a little demon
sitting on my shoulder nagging me about whether specific thresholds
need to play a part in this, e.g. no single transaction can be larger
than half the log; AIL push thresholds of 25% of log space; the
background CIL commit threshold of 12.5% of the log...

So it's not immediately clear to me how much bigger the log needs to
be...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-26 16:44               ` Mark Tinguely
@ 2013-03-28 12:58                 ` Jeff Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Liu @ 2013-03-28 12:58 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On 03/27/2013 12:44 AM, Mark Tinguely wrote:
> On 03/26/13 05:14, Jeff Liu wrote:
>> On 03/12/2013 08:05 PM, Dave Chinner wrote:
>>> On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
>>>> More info, 3.7.0 is the oldest kernel on my environment, I ran into the
>>>> same problem.
>>>
>>> Thanks for following up so quickly, Jeff. So the problem is that a
>>> new test is tripping over a bug that has been around for a while,
>>> not that it is a new regression.
>>>
>>> OK, so I'll expunge that from my testing for the moment as I don't
>>> ahve time to dig in and find out what the cause is right now. If
>>> anyone else wants to.... :)
>>
>> I did some further tests to nail down this issue, just posting the analysis result here,
>> it might be of some use when we revising it again.
>>
>> The disk is formated with Dave's previous comments, i.e.
>> mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx
>>
>> First of all, looks this bug stayed in hiding for years since I can reproduce it between upstream
>> 3.0 to 3.9.0-rc3, the oldest kernel I have tried is 2.6.39 which has the same problem.
>>
>> The following tests are running against kernel with CONFIG_XFS_DEBUG is enabled.
>>
>> 1) Create directory will cause kernel panic, observed same info to Dave's previous posts.
>>
>> [ 3846.705416] XFS: Assertion failed: BTOBB(need_bytes)<  log->l_logBBsize, file: fs/xfs/xfs_log.c, line:
> 
> Thanks Jeff for looking at this. I did not look beyond printing the 
> BTOBB(need_bytes) and log->l_logBBsize.
Sorry for my late response.
> 
> okay the reservation is larger than the logspace defined.
> 
>   Is the create directory reserving too much?
Yes, it needs 3531 blocks, but the log has only 2560 blocks.
>   Is the minimum log size too small?
The minimum is 927 blocks (i.e. max_tr_res * XFS_MIN_LOG_FACTOR), which can be detected at the mkfs stage.
> 
> I did not dig too deep into all the possible directory conversion that 
> could happen when one more entry was put in the directory.
> 
>>
>> 2) For a refreshed formated XFS partition, create a regular file via `touch /xfs/test_file`, the process
>> will hang and the following syslog were observed:
>>
>> [hung task stack traces snipped; quoted in full above]
>>
>> It most likely a deadlock or dead loop was occurred according to above tracing info.
>> Then I did some investigation to check what had happened in transaction log space reservations
>> between mount a refreshed formated XFS partition and create a new file by adding some debugging code.
>>
>> With the current disk format, the total log space is 2560 * 1024 = 2621440 bytes.
>> By default, 1310720 bytes of free space are available on each of l_reserve_head.grant and l_write_head.grant.
> 
> The reserve and write grant use the whole space. Look at the xlog and 
> CIL CTX values when hung.
But the maximum reservation size cannot be larger than half the log.
>>
>> Firstly, I observed a non-permanent CIL ticket allocation at xlog_cil_init_post_recovery() for xfs_log_mount(),
>> but it has no effect on our situation.
>>
>> Secondly, xfs_vn_update_time() is invoked when the partition is mounted, and hence an XFS_TRANS_FSYNC_TS transaction
>> takes place.  It needs 532916 bytes for the space reservation, then gives back 208 bytes at xlog_ungrant_log_space()
>> after xfs_trans_commit().  Therefore it effectively reserved (532916 - 208) bytes in this transaction, leaving
>> 1310720 - (532916 - 208) = 778012 bytes at l_reserve_head.grant.
>>
>> Finally, creating a file makes an XFS_TRANS_CREATE happen, which needs to reserve 1205208 bytes; however, only
>> 778012 bytes are available, so the process hangs in the loop of xlog_grant_head_wait() and then calls
>> schedule() to let other processes wake up and run.
>>
>> In this situation, xfs_log_worker() wakes up and runs.  As this is the first time it is issued, the call chain is
>> as follows:
>> ...
>> xlog_log_force()
>>   __xfs_log_force() -> xfs_sync()
>>     xlog_cil_force_lsn()
>>       xlog_cil_push()
>>         xlog_log_done()
>>           xlog_ungrant_log_space()
>>           xlog_log_space_wake()
>>             xlog_grant_head_wake()...
>>
>> Up to now, 1306396 bytes should be left in l_reserve_head.grant, since I observed this CHECKPOINT transaction
>> give back 528384 bytes at xlog_ungrant_log_space(): 528384 + 778012 = 1306396 bytes.  But that is not
>> exact, because 257820 bytes were reserved at xfs_sync(), so the space left is 1306396 - 257820 = 1048576 bytes.
>>
>> Once xfs_log_worker() has run to completion, the previous XFS_TRANS_CREATE is revived at the point after schedule(),
>> but the free space (1048576) is less than the required amount (1205208); as a result, it gets blocked again.
>>
>> Periodically, xfs_log_worker() comes up, but this time it goes through XFS_TRANS_DUMMY instead.  In this case, the
>> current free space is enough for this reservation, because it requires only 533172 bytes; however, it also hangs at
>> xlog_grant_head_wait() before schedule(), because the only runnable log-transaction process is the one belonging
>> to the file creation, and that process will block again due to the lack of log space.
> 
> Yes, it will get blocked behind the previous allocations.
> 
>>
>> The question is that even if XFS_TRANS_DUMMY were not involved, XFS_TRANS_CREATE would still be blocked, as the log
>> space reservation cannot be satisfied.  IMHO, the major cause looks related to the 'sunit' parameter, since it
>> affects the log space unit calculation via '2*log->l_mp->m_sb.sb_logsunit' at xlog_ticket_alloc().  However, we don't
>> take this factor into consideration at the mkfs or mount stage; should we take it into account?
>>
> 
> There is at least one other (controversial) log (over-) allocation, but 
> this is not involved here.
> 
> It is easy to hang small logs. I classify them as soft and hard hangs.
> 
> Soft hangs can be restarted with new transactions (touch/mkdir) that 
> request grant space and force a push of the AIL up to the last_sync value.
> 
> This looks like a hard hang. Most of the time, the CIL CTX ticket and 
> any permanent tickets can hold enough space that a simple AIL push 
> cannot get it going again. Tickets have to allocate for the worst case. 
> Most of the time they give back this worst-case space when the CIL is pushed.
> 
> Two log block sizes or log stripe units of rounding are added to every 
> allocation, so that is already counted in the "bytes_needed".
Thanks for the clarification. Yes, particular additional bytes are padded
onto every allocation for rounding; however, IMHO that belongs to the
runtime space reservation requests, whereas we ran into this issue because
the specified log space is too small for the current configuration.

Regards,
-Jeff

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-27  2:03               ` Dave Chinner
@ 2013-03-28 15:16                 ` Jeff Liu
  2013-03-29  3:00                   ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Liu @ 2013-03-28 15:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 03/27/2013 10:03 AM, Dave Chinner wrote:
> On Tue, Mar 26, 2013 at 06:14:30PM +0800, Jeff Liu wrote:
>> On 03/12/2013 08:05 PM, Dave Chinner wrote:
>>> On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
>>>> More info: 3.7.0 is the oldest kernel in my environment; I ran into the
>>>> same problem.
>>>
>>> Thanks for following up so quickly, Jeff. So the problem is that a
>>> new test is tripping over a bug that has been around for a while,
>>> not that it is a new regression.
>>>
>>> OK, so I'll expunge that from my testing for the moment as I don't
>>> have time to dig in and find out what the cause is right now. If
>>> anyone else wants to.... :)
>>
>> I did some further tests to nail down this issue; I'm just posting the analysis results here,
>> as they might be of some use when we revisit this again.
>>
>> The disk is formatted per Dave's previous comments, i.e.
>> mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx
>>
>> First of all, it looks like this bug has been hiding for years, since I can reproduce it on upstream
>> 3.0 through 3.9.0-rc3; the oldest kernel I have tried is 2.6.39, which has the same problem.
> 
> If you mount 2.6.39 with "-o nodelaylog", does the problem go away?
Touching a file is OK, but creating a directory still causes the assertion failure.
> 
>> IMHO, the major cause looks related to the 'sunit' parameter,
>> since it affects the log space unit calculation via
>> '2*log->l_mp->m_sb.sb_logsunit' at xlog_ticket_alloc().  However,
>> we don't take this factor into consideration at the mkfs or mount
>> stage; should we take it into account?
> 
> That's what I suspected was the problem. i.e. that the log was too
> small for the given configuration.
> 
> The question is this: how much space do we need to reserve? I'm
> thinking a minimum of 4*lsu: 2*lsu for the existing CIL context, and
> another 2*lsu for any queued ticket waiting for space to become
> available.
> 
> I haven't thought a lot about it, though, and I have a little demon
> sitting on my shoulder nagging me about specific thresholds and whether
> they need to play a part in this. e.g. no single transaction can be
> larger than half the log; AIL push thresholds of 25% of log space;
> background CIL commit threshold of 12.5% of the log...
> 
> So it's not immediately clear to me how much bigger the log needs to
> be...
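Plugging the log geometry from this thread into those thresholds gives concrete numbers (plain arithmetic on figures already quoted; the percentages are the ones listed above):

```python
log_bytes = 2560 * 1024          # log size quoted earlier in the thread
lsu = 256 * 1024                 # log stripe unit from the mkfs line

print(log_bytes // 2)            # 1310720: max size of a single transaction
print(log_bytes // 4)            # 655360:  25% AIL push threshold
print(log_bytes // 8)            # 327680:  12.5% background CIL threshold
print(4 * lsu)                   # 1048576: the suggested minimum lsu reserve

# The reported XFS_TRANS_CREATE reservation (1205208 bytes) is close to
# the half-log limit, leaving almost no headroom:
print(round(1205208 / (log_bytes // 2), 2))  # ~0.92 of the half-log maximum
```

With a 512 KiB reservation slack per ticket (2*lsu) on top of that, it is easy to see how this log runs out of grant space.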
I still need some time to understand the space reservation strategy before
I can figure this out. :(

Thanks,
-Jeff


* Re: [ASSERT failure] transaction reservations changes bad?
  2013-03-28 15:16                 ` Jeff Liu
@ 2013-03-29  3:00                   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2013-03-29  3:00 UTC (permalink / raw)
  To: Jeff Liu; +Cc: xfs

On Thu, Mar 28, 2013 at 11:16:17PM +0800, Jeff Liu wrote:
> On 03/27/2013 10:03 AM, Dave Chinner wrote:
> > On Tue, Mar 26, 2013 at 06:14:30PM +0800, Jeff Liu wrote:
> >> On 03/12/2013 08:05 PM, Dave Chinner wrote:
> >>> On Tue, Mar 12, 2013 at 07:56:35PM +0800, Jeff Liu wrote:
> >>>> More info: 3.7.0 is the oldest kernel in my environment; I ran into the
> >>>> same problem.
> >>>
> >>> Thanks for following up so quickly, Jeff. So the problem is that a
> >>> new test is tripping over a bug that has been around for a while,
> >>> not that it is a new regression.
> >>>
> >>> OK, so I'll expunge that from my testing for the moment as I don't
> >>> have time to dig in and find out what the cause is right now. If
> >>> anyone else wants to.... :)
> >>
> >> I did some further tests to nail down this issue; I'm just posting the analysis results here,
> >> as they might be of some use when we revisit this again.
> >>
> >> The disk is formatted per Dave's previous comments, i.e.
> >> mkfs.xfs -f -b size=512 -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b /dev/xxx
> >>
> >> First of all, it looks like this bug has been hiding for years, since I can reproduce it on upstream
> >> 3.0 through 3.9.0-rc3; the oldest kernel I have tried is 2.6.39, which has the same problem.
> > 
> > If you mount 2.6.39 with "-o nodelaylog", does the problem go away?
> Touching a file is OK, but creating a directory still causes the assertion failure.

So it is not related to the way that delayed logging steals
reservations for the CIL context checkpoint as the problem still
occurs when delayed logging is disabled. That means it is likely to
be related only to the log stripe unit being set....

> >> IMHO, the major cause looks related to the 'sunit' parameter,
> >> since it affects the log space unit calculation via
> >> '2*log->l_mp->m_sb.sb_logsunit' at xlog_ticket_alloc().  However,
> >> we don't take this factor into consideration at the mkfs or mount
> >> stage; should we take it into account?
> > 
> > That's what I suspected was the problem. i.e. that the log was too
> > small for the given configuration.
> > 
> > The question is this: how much space do we need to reserve? I'm
> > thinking a minimum of 4*lsu: 2*lsu for the existing CIL context, and
> > another 2*lsu for any queued ticket waiting for space to become
> > available.
> > 
> > I haven't thought a lot about it, though, and I have a little demon
> > sitting on my shoulder nagging me about specific thresholds and whether
> > they need to play a part in this. e.g. no single transaction can be
> > larger than half the log; AIL push thresholds of 25% of log space;
> > background CIL commit threshold of 12.5% of the log...
> > 
> > So it's not immediately clear to me how much bigger the log needs to
> > be...
> I still need some time to understand the space reservation strategy before
> I can figure this out. :(

OK. Thanks for digging into this, Jeff.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2013-03-29  3:00 UTC | newest]

Thread overview: 13+ messages
2013-03-12  6:20 [ASSERT failure] transaction reservations changes bad? Dave Chinner
2013-03-12  6:25 ` Dave Chinner
2013-03-12  8:08   ` Jeff Liu
2013-03-12 10:31     ` Dave Chinner
2013-03-12 11:05       ` Jeff Liu
2013-03-12 11:56         ` Jeff Liu
2013-03-12 12:05           ` Dave Chinner
2013-03-26 10:14             ` Jeff Liu
2013-03-26 16:44               ` Mark Tinguely
2013-03-28 12:58                 ` Jeff Liu
2013-03-27  2:03               ` Dave Chinner
2013-03-28 15:16                 ` Jeff Liu
2013-03-29  3:00                   ` Dave Chinner
