All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
@ 2019-01-08 13:32 Mark Syms
  2019-01-09 12:05 ` Tim Smith
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Syms @ 2019-01-08 13:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

We've seen this in testing with 4.19.

Full trace is at the bottom.

Looking at the code though it looks like it will assert if the value of change is equal to the number of blocks currently allocated to the inode. Is this expected or should the assert be using >= instead of > ?

Thanks,

	Mark

--------------

2018-12-24 00:07:03 UTC] [ 3366.209273] kernel BUG at fs/gfs2/inode.h:64!

[2018-12-24 00:07:03 UTC] [ 3366.209305] invalid opcode: 0000 [#1] SMP NOPTI

[2018-12-24 00:07:03 UTC] [ 3366.209327] CPU: 10 PID: 30664 Comm: kworker/10:1 Tainted: G           O      4.19.0+0 #1

[2018-12-24 00:07:03 UTC] [ 3366.209352] Hardware name: HP ProLiant BL420c Gen8, BIOS I30 11/03/2014

[2018-12-24 00:07:03 UTC] [ 3366.209397] Workqueue: delete_workqueue delete_work_func [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209437] RIP: e030:gfs2_add_inode_blocks.part.33+0x10/0x12 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209460] Code: 00 89 c2 48 c7 c7 80 bc 4c c0 31 c0 e8 ec 44 c1 c0 e9 68 fc ff ff 0f 0b 0f 0b 48 8b 47 28 48 8b b8 08 04 00 00 e8 bf dd ff ff <0f> 0b 0f 0b 0f 0b 66 66 66 66 90 0f 0b 48 8b 47 28 48 8b b8 08 04

[2018-12-24 00:07:03 UTC] [ 3366.209512] RSP: e02b:ffffc9004a53fbf0 EFLAGS: 00010246

[2018-12-24 00:07:03 UTC] [ 3366.209532] RAX: 0000000000000043 RBX: ffff8881709a4000 RCX: 0000000000000000

[2018-12-24 00:07:03 UTC] [ 3366.209556] RDX: 0000000000000000 RSI: ffff88818d496428 RDI: ffff88818d496428

[2018-12-24 00:07:03 UTC] [ 3366.209581] RBP: ffffc9004a53fdb0 R08: 0000000000000000 R09: 0000000000000553

[2018-12-24 00:07:03 UTC] [ 3366.209605] R10: 0000000000000000 R11: ffffc9004a53f968 R12: ffff888156a303d8

[2018-12-24 00:07:03 UTC] [ 3366.209629] R13: 0000000000f47524 R14: ffff8881709a3ac0 R15: ffff888188d31110

[2018-12-24 00:07:03 UTC] [ 3366.209683] FS:  00007fa73976e700(0000) GS:ffff88818d480000(0000) knlGS:0000000000000000

[2018-12-24 00:07:03 UTC] [ 3366.209709] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033

[2018-12-24 00:07:03 UTC] [ 3366.209730] CR2: 00007fca0def3000 CR3: 0000000187fee000 CR4: 0000000000042660

[2018-12-24 00:07:03 UTC] [ 3366.209802] Call Trace:

[2018-12-24 00:07:03 UTC] [ 3366.209835]  punch_hole+0xf4f/0x10d0 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209866]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209907]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209924]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209949]  ? punch_hole+0xb3e/0x10d0 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209983]  gfs2_evict_inode+0x57b/0x680 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210036]  ? __inode_wait_for_writeback+0x75/0xe0

[2018-12-24 00:07:03 UTC] [ 3366.210066]  ? gfs2_evict_inode+0x1c7/0x680 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210090]  evict+0xc6/0x1a0

[2018-12-24 00:07:03 UTC] [ 3366.210151]  delete_work_func+0x60/0x70 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210177]  process_one_work+0x165/0x370

[2018-12-24 00:07:03 UTC] [ 3366.210197]  worker_thread+0x49/0x3e0

[2018-12-24 00:07:03 UTC] [ 3366.210216]  kthread+0xf8/0x130

[2018-12-24 00:07:03 UTC] [ 3366.210235]  ? rescuer_thread+0x310/0x310

[2018-12-24 00:07:03 UTC] [ 3366.210284]  ? kthread_bind+0x10/0x10

[2018-12-24 00:07:04 UTC] [ 3366.210301]  ret_from_fork+0x35/0x40

[2018-12-24 00:07:04 UTC] [ 3366.210319] Modules linked in: tun nfsv3 nfs_acl nfs lockd grace fscache gfs2 dlm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter dm_multipath sunrpc sb_edac intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper dm_mod sg hpilo lpc_ich intel_rapl_perf psmouse ipmi_si ipmi_devintf ipmi_msghandler nls_utf8 isofs acpi_power_meter loop xen_wdt ip_tables x_tables sd_mod uhci_hcd serio_raw ehci_pci ehci_hcd hpsa igb(O) scsi_transport_sas scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6 crc_ccitt

[2018-12-24 00:07:04 UTC] [ 3366.210663] ---[ end trace d36fcbebd28b36e9 ]---



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-08 13:32 [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64 Mark Syms
@ 2019-01-09 12:05 ` Tim Smith
  2019-01-09 13:31   ` Andreas Gruenbacher
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Smith @ 2019-01-09 12:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> Hi,
> 
> We've seen this in testing with 4.19.
> 
> Full trace is at the bottom.
> 
> Looking at the code though it looks like it will assert if the value of
> change is equal to the number of blocks currently allocated to the inode.
> Is this expected or should the assert be using >= instead of > ?

It's weirder. Looking at

static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
{
	gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change));
	change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
	inode->i_blocks += change;
}

where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and
"change" is in units of whatever-the-superblock-said which will probably be
counting 4096b blocks, so the comparison makes no sense.

It should probably read

static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
{
	change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
	gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= -change));
	inode->i_blocks += change;
}

so I'll make a patch for that unless someone wants to correct me.

As for why it's triggering, given the current scale difference I can only
assume it's an attempt to remove blocks from something which has none left.
I'd suspect a boundary case in sweep_bh_for_rgrps().

(It *is* possible currently to remove 1 4096b block from something which has
8 512b blocks so you can get to the state of "inode->i_blocks == 0)" It would
only blow up all the time if you did a "mkfs.gfs2 -b 512" which I don't
imagine anyone does that often...)

> Thanks,
> 
> 	Mark
> 
> --------------
> 
> 2018-12-24 00:07:03 UTC] [ 3366.209273] kernel BUG at fs/gfs2/inode.h:64!

[snip]

> [2018-12-24 00:07:03 UTC] [ 3366.209802] Call Trace:
> [2018-12-24 00:07:03 UTC] [ 3366.209835]  punch_hole+0xf4f/0x10d0 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.209866]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209907]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209924]  ? __switch_to_asm+0x40/0x70
> [2018-12-24 00:07:03 UTC] [ 3366.209949]  ? punch_hole+0xb3e/0x10d0 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.209983]  gfs2_evict_inode+0x57b/0x680 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210036]  ? __inode_wait_for_writeback+0x75/0xe0
> [2018-12-24 00:07:03 UTC] [ 3366.210066]  ? gfs2_evict_inode+0x1c7/0x680 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210090]  evict+0xc6/0x1a0
> [2018-12-24 00:07:03 UTC] [ 3366.210151]  delete_work_func+0x60/0x70 [gfs2]
> [2018-12-24 00:07:03 UTC] [ 3366.210177]  process_one_work+0x165/0x370
> [2018-12-24 00:07:03 UTC] [ 3366.210197]  worker_thread+0x49/0x3e0
> [2018-12-24 00:07:03 UTC] [ 3366.210216]  kthread+0xf8/0x130
> [2018-12-24 00:07:03 UTC] [ 3366.210235]  ? rescuer_thread+0x310/0x310
> [2018-12-24 00:07:03 UTC] [ 3366.210284]  ? kthread_bind+0x10/0x10
> [2018-12-24 00:07:04 UTC] [ 3366.210301]  ret_from_fork+0x35/0x40


-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 12:05 ` Tim Smith
@ 2019-01-09 13:31   ` Andreas Gruenbacher
  2019-01-09 13:43     ` Mark Syms
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Gruenbacher @ 2019-01-09 13:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Mark and Tim,

On Wed, 9 Jan 2019 at 13:05, Tim Smith <tim.smith@citrix.com> wrote:
> On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> > Hi,
> >
> > We've seen this in testing with 4.19.
> >
> > Full trace is at the bottom.
> >
> > Looking at the code though it looks like it will assert if the value of
> > change is equal to the number of blocks currently allocated to the inode.
> > Is this expected or should the assert be using >= instead of > ?
>
> It's weirder. Looking at
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> {
>         gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change));
>         change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
>         inode->i_blocks += change;
> }
>
> where the BUG is happening, "inode->i_blocks" is counting of 512b blocks and
> "change" is in units of whatever-the-superblock-said which will probably be
> counting 4096b blocks, so the comparison makes no sense.

indeed. I wonder why this hasn't been discovered long ago.

> It should probably read
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> {
>         change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
>         gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= -change));
>         inode->i_blocks += change;
> }
>
> so I'll make a patch for that unless someone wants to correct me.

You can just shift by (inode->i_blkbits - 9).

Can you still trigger the assert with this fix?

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 13:31   ` Andreas Gruenbacher
@ 2019-01-09 13:43     ` Mark Syms
  2019-01-09 15:35       ` Andreas Gruenbacher
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Syms @ 2019-01-09 13:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

We don't yet know how the assert got triggered as we've only seen it once and in the original form it looks like it would be very hard to trigger in any normal case (given that in default usage i_blocks should be at least 8 times what any putative value for change could be). So, for the assert to have triggered we've been asked to remove at least 8 times the number of blocks currently allocated to the inode. Possible causes could be a double release or some other higher level bug that will require further investigation to uncover.

	Mark.

-----Original Message-----
From: Andreas Gruenbacher <agruenba@redhat.com> 
Sent: 09 January 2019 13:31
To: Tim Smith <tim.smith@citrix.com>
Cc: cluster-devel <cluster-devel@redhat.com>; Igor Druzhinin <igor.druzhinin@citrix.com>; Mark Syms <Mark.Syms@citrix.com>
Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

Mark and Tim,

On Wed, 9 Jan 2019 at 13:05, Tim Smith <tim.smith@citrix.com> wrote:
> On Tuesday, 8 January 2019 13:32:20 GMT Mark Syms wrote:
> > Hi,
> >
> > We've seen this in testing with 4.19.
> >
> > Full trace is at the bottom.
> >
> > Looking at the code though it looks like it will assert if the value 
> > of change is equal to the number of blocks currently allocated to the inode.
> > Is this expected or should the assert be using >= instead of > ?
>
> It's weirder. Looking at
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 
> change) {
>         gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks > -change));
>         change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
>         inode->i_blocks += change;
> }
>
> where the BUG is happening, "inode->i_blocks" is counting of 512b 
> blocks and "change" is in units of whatever-the-superblock-said which 
> will probably be counting 4096b blocks, so the comparison makes no sense.

indeed. I wonder why this hasn't been discovered long ago.

> It should probably read
>
> static inline void gfs2_add_inode_blocks(struct inode *inode, s64 
> change) {
>         change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
>         gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >= -change));
>         inode->i_blocks += change;
> }
>
> so I'll make a patch for that unless someone wants to correct me.

You can just shift by (inode->i_blkbits - 9).

Can you still trigger the assert with this fix?

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 13:43     ` Mark Syms
@ 2019-01-09 15:35       ` Andreas Gruenbacher
  2019-01-09 17:14         ` Tim Smith
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Gruenbacher @ 2019-01-09 15:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 9 Jan 2019 at 14:43, Mark Syms <Mark.Syms@citrix.com> wrote:

> We don't yet know how the assert got triggered as we've only seen it once
> and in the original form it looks like it would be very hard to trigger in
> any normal case (given that in default usage i_blocks should be at least 8
> times what any putative value for change could be). So, for the assert to
> have triggered we've been asked to remove at least 8 times the number of
> blocks currently allocated to the inode. Possible causes could be a double
> release or some other higher level bug that will require further
> investigation to uncover.
>

The following change has at least survived xfstests:

--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
inode *inode)

 static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
 {
-    gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
-change));
-    change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
+    change <<= inode->i_blkbits - 9;
+    gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
     inode->i_blocks += change;
 }

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20190109/3102115b/attachment.htm>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 15:35       ` Andreas Gruenbacher
@ 2019-01-09 17:14         ` Tim Smith
  2019-01-09 17:18           ` Steven Whitehouse
  2019-01-09 17:24           ` Andreas Gruenbacher
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Smith @ 2019-01-09 17:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> On Wed, 9 Jan 2019 at 14:43, Mark Syms <Mark.Syms@citrix.com> wrote:
> > We don't yet know how the assert got triggered as we've only seen it once
> > and in the original form it looks like it would be very hard to trigger in
> > any normal case (given that in default usage i_blocks should be at least 8
> > times what any putative value for change could be). So, for the assert to
> > have triggered we've been asked to remove at least 8 times the number of
> > blocks currently allocated to the inode. Possible causes could be a double
> > release or some other higher level bug that will require further
> > investigation to uncover.
> 
> The following change has at least survived xfstests:
> 
> --- a/fs/gfs2/inode.h
> +++ b/fs/gfs2/inode.h
> @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> inode *inode)
> 
>  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
>  {
> -    gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> -change));
> -    change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> +    change <<= inode->i_blkbits - 9;
> +    gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> inode->i_blocks += change;
>  }
> 
> Andreas

I'll use 

change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);

for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a 
bit.

Given what it was like before, either i_blocks was already 0 or -change 
somehow became stupidly huge. Anything else looks like it would be hard to 
reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize == 
GFS2_BASIC_BLOCK) which we don't do.

I'll try to work out what could have caused it and see if we can provoke it 
again.

Out of curiosity I did a few tests where I created a file on GFS2, copied /
dev/null on top of it, and then ran stat on the file. It seems like GFS2 never 
frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks 
in use for a zero-length file depending on the underlying filesystem block 
size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so 
maybe some corner case where it *is* trying to do that is the root of the 
problem.

-- 
Tim Smith <tim.smith@citrix.com>




^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 17:14         ` Tim Smith
@ 2019-01-09 17:18           ` Steven Whitehouse
  2019-01-09 17:24           ` Andreas Gruenbacher
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Whitehouse @ 2019-01-09 17:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On 09/01/2019 17:14, Tim Smith wrote:
> On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
>> On Wed, 9 Jan 2019 at 14:43, Mark Syms <Mark.Syms@citrix.com> wrote:
>>> We don't yet know how the assert got triggered as we've only seen it once
>>> and in the original form it looks like it would be very hard to trigger in
>>> any normal case (given that in default usage i_blocks should be at least 8
>>> times what any putative value for change could be). So, for the assert to
>>> have triggered we've been asked to remove at least 8 times the number of
>>> blocks currently allocated to the inode. Possible causes could be a double
>>> release or some other higher level bug that will require further
>>> investigation to uncover.
>> The following change has at least survived xfstests:
>>
>> --- a/fs/gfs2/inode.h
>> +++ b/fs/gfs2/inode.h
>> @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
>> inode *inode)
>>
>>   static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
>>   {
>> -    gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
>> -change));
>> -    change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
>> +    change <<= inode->i_blkbits - 9;
>> +    gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
>> inode->i_blocks += change;
>>   }
>>
>> Andreas
> I'll use
>
> change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);
>
> for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
> bit.
>
> Given what it was like before, either i_blocks was already 0 or -change
> somehow became stupidly huge. Anything else looks like it would be hard to
> reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
> GFS2_BASIC_BLOCK) which we don't do.
>
> I'll try to work out what could have caused it and see if we can provoke it
> again.
>
> Out of curiosity I did a few tests where I created a file on GFS2, copied /
> dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
> frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
> in use for a zero-length file depending on the underlying filesystem block
> size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
> maybe some corner case where it *is* trying to do that is the root of the
> problem.
>
That is because gfs2 uses a block for each inode, so it makes sense to 
account for it in that way. For ext* the inodes are in separate areas of 
the disk, and they only use up a partial block, and they are also 
counted separately too. So it is a historical design difference I think,

Steve.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 17:14         ` Tim Smith
  2019-01-09 17:18           ` Steven Whitehouse
@ 2019-01-09 17:24           ` Andreas Gruenbacher
  2019-01-09 18:43             ` Mark Syms
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas Gruenbacher @ 2019-01-09 17:24 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 9 Jan 2019 at 18:14, Tim Smith <tim.smith@citrix.com> wrote:
>
> On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> > On Wed, 9 Jan 2019 at 14:43, Mark Syms <Mark.Syms@citrix.com> wrote:
> > > We don't yet know how the assert got triggered as we've only seen it once
> > > and in the original form it looks like it would be very hard to trigger in
> > > any normal case (given that in default usage i_blocks should be at least 8
> > > times what any putative value for change could be). So, for the assert to
> > > have triggered we've been asked to remove at least 8 times the number of
> > > blocks currently allocated to the inode. Possible causes could be a double
> > > release or some other higher level bug that will require further
> > > investigation to uncover.
> >
> > The following change has at least survived xfstests:
> >
> > --- a/fs/gfs2/inode.h
> > +++ b/fs/gfs2/inode.h
> > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> > inode *inode)
> >
> >  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> >  {
> > -    gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> > -change));
> > -    change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> > +    change <<= inode->i_blkbits - 9;
> > +    gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> > inode->i_blocks += change;
> >  }
> >
> > Andreas
>
> I'll use
>
> change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);
>
> for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
> bit.
>
> Given what it was like before, either i_blocks was already 0 or -change
> somehow became stupidly huge. Anything else looks like it would be hard to
> reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
> GFS2_BASIC_BLOCK) which we don't do.
>
> I'll try to work out what could have caused it and see if we can provoke it
> again.
>
> Out of curiosity I did a few tests where I created a file on GFS2, copied /
> dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
> frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
> in use for a zero-length file depending on the underlying filesystem block
> size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
> maybe some corner case where it *is* trying to do that is the root of the
> problem.

Yes, that's intentional. An empty file without extended attributes
consumes one block. Extended attributes consume at least one
additional block.

Andreas



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 17:24           ` Andreas Gruenbacher
@ 2019-01-09 18:43             ` Mark Syms
  2019-01-09 20:00               ` Andreas Gruenbacher
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Syms @ 2019-01-09 18:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

So, actually the original comparison in the assert (dodgy scaling factor not withstanding) was probably correct in that we don't ever want to remove all blocks from the inode as one of them is used for the inode itself? Or do we still think it should allow for change to be the negative of current blocks?

Mark
________________________________
From: Andreas Gruenbacher <agruenba@redhat.com>
Sent: Wednesday, 9 January 2019 17:24
To: Tim Smith <tim.smith@citrix.com>
CC: Mark Syms <Mark.Syms@citrix.com>,cluster-devel <cluster-devel@redhat.com>,Igor Druzhinin <igor.druzhinin@citrix.com>
Subject: Re: [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64


On Wed, 9 Jan 2019 at 18:14, Tim Smith <tim.smith@citrix.com> wrote:
>
> On Wednesday, 9 January 2019 15:35:05 GMT Andreas Gruenbacher wrote:
> > On Wed, 9 Jan 2019 at 14:43, Mark Syms <Mark.Syms@citrix.com> wrote:
> > > We don't yet know how the assert got triggered as we've only seen it once
> > > and in the original form it looks like it would be very hard to trigger in
> > > any normal case (given that in default usage i_blocks should be at least 8
> > > times what any putative value for change could be). So, for the assert to
> > > have triggered we've been asked to remove at least 8 times the number of
> > > blocks currently allocated to the inode. Possible causes could be a double
> > > release or some other higher level bug that will require further
> > > investigation to uncover.
> >
> > The following change has at least survived xfstests:
> >
> > --- a/fs/gfs2/inode.h
> > +++ b/fs/gfs2/inode.h
> > @@ -61,8 +61,8 @@ static inline u64 gfs2_get_inode_blocks(const struct
> > inode *inode)
> >
> >  static inline void gfs2_add_inode_blocks(struct inode *inode, s64 change)
> >  {
> > -    gfs2_assert(GFS2_SB(inode), (change >= 0 || inode->i_blocks >
> > -change));
> > -    change *= (GFS2_SB(inode)->sd_sb.sb_bsize/GFS2_BASIC_BLOCK);
> > +    change <<= inode->i_blkbits - 9;
> > +    gfs2_assert(GFS2_SB(inode), change >= 0 || inode->i_blocks >= -change);
> > inode->i_blocks += change;
> >  }
> >
> > Andreas
>
> I'll use
>
> change <<= (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT);
>
> for consistency with the gfs2_get/set_inode_blocks(). I'll send the patch in a
> bit.
>
> Given what it was like before, either i_blocks was already 0 or -change
> somehow became stupidly huge. Anything else looks like it would be hard to
> reproduce without mkfs.gfs2 -b 512 (so that GFS2_SB(inode)->sd_sb.sb_bsize ==
> GFS2_BASIC_BLOCK) which we don't do.
>
> I'll try to work out what could have caused it and see if we can provoke it
> again.
>
> Out of curiosity I did a few tests where I created a file on GFS2, copied /
> dev/null on top of it, and then ran stat on the file. It seems like GFS2 never
> frees the last allocation on truncate; stat always reports 1, 2, 4 or 8 blocks
> in use for a zero-length file depending on the underlying filesystem block
> size, unlike (say) ext3/4 where it reports 0. I presume this is intentional so
> maybe some corner case where it *is* trying to do that is the root of the
> problem.

Yes, that's intentional. An empty file without extended attributes
consumes one block. Extended attributes consume at least one
additional block.

Andreas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20190109/8e7dc1ff/attachment.htm>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64
  2019-01-09 18:43             ` Mark Syms
@ 2019-01-09 20:00               ` Andreas Gruenbacher
  0 siblings, 0 replies; 10+ messages in thread
From: Andreas Gruenbacher @ 2019-01-09 20:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 9 Jan 2019 at 19:44, Mark Syms <Mark.Syms@citrix.com> wrote:
>
> So, actually the original comparison in the assert (dodgy scaling factor not withstanding) was probably correct in that we don't ever want to remove all blocks from the inode as one of them is used for the inode itself? Or do we still think it should allow for change to be the negative of current blocks?

It doesn't make a difference; all we care about is that we don't go
negative. I think the updated comparison better reflects that goal.

Andreas



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-01-09 20:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-08 13:32 [Cluster-devel] kernel BUG at fs/gfs2/inode.h:64 Mark Syms
2019-01-09 12:05 ` Tim Smith
2019-01-09 13:31   ` Andreas Gruenbacher
2019-01-09 13:43     ` Mark Syms
2019-01-09 15:35       ` Andreas Gruenbacher
2019-01-09 17:14         ` Tim Smith
2019-01-09 17:18           ` Steven Whitehouse
2019-01-09 17:24           ` Andreas Gruenbacher
2019-01-09 18:43             ` Mark Syms
2019-01-09 20:00               ` Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.