* Possible UDF locking error? @ 2019-03-23 20:14 Steve Magnani 2019-03-25 16:42 ` Jan Kara 0 siblings, 1 reply; 5+ messages in thread From: Steve Magnani @ 2019-03-23 20:14 UTC (permalink / raw) To: Jan Kara, linux-kernel, linux-fsdevel Hi, I have been hunting a UDF bug that occasionally results in generation of an Allocation Extent Descriptor with an incorrect tagLocation. So far I haven't been able to see a path through the code that could cause that. But, I noticed some inconsistency in locking during AED generation and wonder if it could result in random corruption. The function udf_update_inode() has this general pattern: bh = udf_tgetblk(...); // calls sb_getblk() lock_buffer(bh); memset(bh->b_data, 0, inode->i_sb->s_blocksize); // <snip>other code to populate FE/EFE data in the block</snip> set_buffer_uptodate(bh); unlock_buffer(bh); mark_buffer_dirty(bh); This I can understand - the lock is held for as long as the buffer contents are being assembled. In contrast, udf_setup_indirect_aext(), which constructs an AED, has this sequence: bh = udf_tgetblk(...); // calls sb_getblk() lock_buffer(bh); memset(bh->b_data, 0, inode->i_sb->s_blocksize); set_buffer_uptodate(bh); unlock_buffer(bh); mark_buffer_dirty_inode(bh); // <snip>other code to populate AED data in the block</snip> In this case the population of the block occurs without the protection of the lock. Because the block has been marked dirty, does this mean that writeback could occur at any point during population? There is one path through udf_setup_indirect_aext() where mark_buffer_dirty_inode() gets called again after population is complete, which I suppose could heal a partial writeout, but there is also another path in which the buffer does not get marked dirty again. Regards, ------------------------------------------------------------------------ Steven J. Magnani "I claim this network for MARS! www.digidescorp.com Earthling, return my space modulator!" #include <standard.disclaimer> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Possible UDF locking error? 2019-03-23 20:14 Possible UDF locking error? Steve Magnani @ 2019-03-25 16:42 ` Jan Kara 2019-03-25 18:23 ` Steve Magnani 2019-03-30 19:49 ` Steve Magnani 0 siblings, 2 replies; 5+ messages in thread From: Jan Kara @ 2019-03-25 16:42 UTC (permalink / raw) To: Steve Magnani; +Cc: Jan Kara, linux-kernel, linux-fsdevel Hi! On Sat 23-03-19 15:14:05, Steve Magnani wrote: > I have been hunting a UDF bug that occasionally results in generation > of an Allocation Extent Descriptor with an incorrect tagLocation. So > far I haven't been able to see a path through the code that could > cause that. But, I noticed some inconsistency in locking during > AED generation and wonder if it could result in random corruption. > > The function udf_update_inode() has this general pattern: > > bh = udf_tgetblk(...); // calls sb_getblk() > lock_buffer(bh); > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > // <snip>other code to populate FE/EFE data in the block</snip> > set_buffer_uptodate(bh); > unlock_buffer(bh); > mark_buffer_dirty(bh); > > This I can understand - the lock is held for as long as the buffer > contents are being assembled. > > In contrast, udf_setup_indirect_aext(), which constructs an AED, > has this sequence: > > bh = udf_tgetblk(...); // calls sb_getblk() > lock_buffer(bh); > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > > set_buffer_uptodate(bh); > unlock_buffer(bh); > mark_buffer_dirty_inode(bh); > > // <snip>other code to populate AED data in the block</snip> > > In this case the population of the block occurs without > the protection of the lock. > > Because the block has been marked dirty, does this mean that > writeback could occur at any point during population? Yes. Thanks for noticing this! > There is one path through udf_setup_indirect_aext() where > mark_buffer_dirty_inode() gets called again after population is > complete, which I suppose could heal a partial writeout, but there is > also another path in which the buffer does not get marked dirty again. Generally, we add new extents to the created indirect extent which dirties the buffer and that should fix the problem. But you are definitely right that the code is suspicious and should be fixed. Will you send a patch? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Possible UDF locking error? 2019-03-25 16:42 ` Jan Kara @ 2019-03-25 18:23 ` Steve Magnani 2019-03-30 19:49 ` Steve Magnani 1 sibling, 0 replies; 5+ messages in thread From: Steve Magnani @ 2019-03-25 18:23 UTC (permalink / raw) To: Jan Kara; +Cc: linux-kernel, linux-fsdevel On 3/25/19 11:42 AM, Jan Kara wrote: > Hi! > > On Sat 23-03-19 15:14:05, Steve Magnani wrote: >> ... >> >> In contrast, udf_setup_indirect_aext(), which constructs an AED, >> has this sequence: >> >> bh = udf_tgetblk(...); // calls sb_getblk() >> lock_buffer(bh); >> memset(bh->b_data, 0, inode->i_sb->s_blocksize); >> >> set_buffer_uptodate(bh); >> unlock_buffer(bh); >> mark_buffer_dirty_inode(bh); >> >> // <snip>other code to populate AED data in the block</snip> >> >> In this case the population of the block occurs without >> the protection of the lock. >> >> Because the block has been marked dirty, does this mean that >> writeback could occur at any point during population? > Yes. Thanks for noticing this! > >> There is one path through udf_setup_indirect_aext() where >> mark_buffer_dirty_inode() gets called again after population is >> complete, which I suppose could heal a partial writeout, but there is >> also another path in which the buffer does not get marked dirty again. > Generally, we add new extents to the created indirect extent which dirties > the buffer and that should fix the problem. But you are definitely right > that the code is suspicious and should be fixed. Will you send a patch? > > Honza Sure. There's at least one other place where it looked like there might be a similar issue. Steve ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Possible UDF locking error? 2019-03-25 16:42 ` Jan Kara 2019-03-25 18:23 ` Steve Magnani @ 2019-03-30 19:49 ` Steve Magnani 2019-04-03 8:07 ` Jan Kara 1 sibling, 1 reply; 5+ messages in thread From: Steve Magnani @ 2019-03-30 19:49 UTC (permalink / raw) To: Jan Kara; +Cc: linux-kernel, linux-fsdevel Jan - On 3/25/19 11:42 AM, Jan Kara wrote: > Hi! > > On Sat 23-03-19 15:14:05, Steve Magnani wrote: >> I have been hunting a UDF bug that occasionally results in generation >> of an Allocation Extent Descriptor with an incorrect tagLocation. So >> far I haven't been able to see a path through the code that could >> cause that. But, I noticed some inconsistency in locking during >> AED generation and wonder if it could result in random corruption. >> >> The function udf_update_inode() has this general pattern: >> >> bh = udf_tgetblk(...); // calls sb_getblk() >> lock_buffer(bh); >> memset(bh->b_data, 0, inode->i_sb->s_blocksize); >> // <snip>other code to populate FE/EFE data in the block</snip> >> set_buffer_uptodate(bh); >> unlock_buffer(bh); >> mark_buffer_dirty(bh); >> >> This I can understand - the lock is held for as long as the buffer >> contents are being assembled. >> >> In contrast, udf_setup_indirect_aext(), which constructs an AED, >> has this sequence: >> >> bh = udf_tgetblk(...); // calls sb_getblk() >> lock_buffer(bh); >> memset(bh->b_data, 0, inode->i_sb->s_blocksize); >> >> set_buffer_uptodate(bh); >> unlock_buffer(bh); >> mark_buffer_dirty_inode(bh); >> >> // <snip>other code to populate AED data in the block</snip> >> >> In this case the population of the block occurs without >> the protection of the lock. >> >> Because the block has been marked dirty, does this mean that >> writeback could occur at any point during population? > Yes. Thanks for noticing this! > >> There is one path through udf_setup_indirect_aext() where >> mark_buffer_dirty_inode() gets called again after population is >> complete, which I suppose could heal a partial writeout, but there is >> also another path in which the buffer does not get marked dirty again. > Generally, we add new extents to the created indirect extent which dirties > the buffer and that should fix the problem. But you are definitely right > that the code is suspicious and should be fixed. Will you send a patch? I did a little archaeology to see how the code evolved to this point. It's been like this a long time. I also did some research to understand why filesystems use lock_buffer() sometimes but not others. For example, the FAT driver never calls it. I ran across this thread from 2011: https://lkml.org/lkml/2011/5/16/402 ...from which I conclude that while it is correct in a strict sense to hold a lock on a buffer any time its contents are being modified, performance considerations make it preferable (or at least reasonable) to make some modifications without a lock provided it's known that a subsequent write-out will "fix" any potential partial write out before anyone else tries to read the block. I doubt that UDF sees common use with DIF/DIX block devices, which might make a decision in favor of performance a little easier. Since the FAT driver doesn't contain Darrick's proposed changes I assume a decision was made that performance was more important there. Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets those criteria. But udf_table_free_blocks() may not dirty the AED block. So if this looks reasonable I will resend as a formal patch: --- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500 +++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500 @@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode return -EIO; lock_buffer(bh); memset(bh->b_data, 0x00, sb->s_blocksize); - set_buffer_uptodate(bh); - unlock_buffer(bh); - mark_buffer_dirty_inode(bh, inode); aed = (struct allocExtDesc *)(bh->b_data); if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) { @@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block, sizeof(struct tag)); + set_buffer_uptodate(bh); + unlock_buffer(bh); + nepos.block = neloc; nepos.offset = sizeof(struct allocExtDesc); nepos.bh = bh; @@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode } else { __udf_add_aext(inode, epos, &nepos.block, sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0); + /* Make sure completed AED gets written out */ + mark_buffer_dirty_inode(nepos.bh, inode); } brelse(epos->bh); ------------------------------------------------------------------------ Steven J. Magnani "I claim this network for MARS! www.digidescorp.com Earthling, return my space modulator!" #include <standard.disclaimer> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Possible UDF locking error? 2019-03-30 19:49 ` Steve Magnani @ 2019-04-03 8:07 ` Jan Kara 0 siblings, 0 replies; 5+ messages in thread From: Jan Kara @ 2019-04-03 8:07 UTC (permalink / raw) To: Steve Magnani; +Cc: Jan Kara, linux-kernel, linux-fsdevel Hi, On Sat 30-03-19 14:49:46, Steve Magnani wrote: > On 3/25/19 11:42 AM, Jan Kara wrote: > > Hi! > > > > On Sat 23-03-19 15:14:05, Steve Magnani wrote: > > > I have been hunting a UDF bug that occasionally results in generation > > > of an Allocation Extent Descriptor with an incorrect tagLocation. So > > > far I haven't been able to see a path through the code that could > > > cause that. But, I noticed some inconsistency in locking during > > > AED generation and wonder if it could result in random corruption. > > > > > > The function udf_update_inode() has this general pattern: > > > > > > bh = udf_tgetblk(...); // calls sb_getblk() > > > lock_buffer(bh); > > > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > > > // <snip>other code to populate FE/EFE data in the block</snip> > > > set_buffer_uptodate(bh); > > > unlock_buffer(bh); > > > mark_buffer_dirty(bh); > > > > > > This I can understand - the lock is held for as long as the buffer > > > contents are being assembled. > > > > > > In contrast, udf_setup_indirect_aext(), which constructs an AED, > > > has this sequence: > > > > > > bh = udf_tgetblk(...); // calls sb_getblk() > > > lock_buffer(bh); > > > memset(bh->b_data, 0, inode->i_sb->s_blocksize); > > > > > > set_buffer_uptodate(bh); > > > unlock_buffer(bh); > > > mark_buffer_dirty_inode(bh); > > > > > > // <snip>other code to populate AED data in the block</snip> > > > > > > In this case the population of the block occurs without > > > the protection of the lock. > > > > > > Because the block has been marked dirty, does this mean that > > > writeback could occur at any point during population? > > Yes. Thanks for noticing this! > > > > > There is one path through udf_setup_indirect_aext() where > > > mark_buffer_dirty_inode() gets called again after population is > > > complete, which I suppose could heal a partial writeout, but there is > > > also another path in which the buffer does not get marked dirty again. > > Generally, we add new extents to the created indirect extent which dirties > > the buffer and that should fix the problem. But you are definitely right > > that the code is suspicious and should be fixed. Will you send a patch? > > I did a little archaeology to see how the code evolved to this point. It's > been like this a long time. > > I also did some research to understand why filesystems use lock_buffer() > sometimes but not others. For example, the FAT driver never calls it. I ran > across this thread from 2011: > > https://lkml.org/lkml/2011/5/16/402 > > ...from which I conclude that while it is correct in a strict sense to hold > a lock on a buffer any time its contents are being modified, performance > considerations make it preferable (or at least reasonable) to make some > modifications without a lock provided it's known that a subsequent write-out > will "fix" any potential partial write out before anyone else tries to read > the block. Understood but UDF (and neither FAT) are really that performance critical. If you look for performance, you'd certainly pick a different filesystem. UDF is mainly for data interchange so it should work reasonably for copy-in copy-out style of workloads, the rest isn't that important. So there correctness and simplicity is preferred over performance. > I doubt that UDF sees common use with DIF/DIX block devices, > which might make a decision in favor of performance a little easier. Since > the FAT driver doesn't contain Darrick's proposed changes I assume a > decision was made that performance was more important there. > > Certainly the call to udf_setup_indirect_aext() from udf_add_aext() meets > those criteria. But udf_table_free_blocks() may not dirty the AED block. > > So if this looks reasonable I will resend as a formal patch: > > --- a/fs/udf/inode.c 2019-03-30 11:28:38.637759458 -0500 > +++ b/fs/udf/inode.c 2019-03-30 11:33:00.357761250 -0500 > @@ -1873,9 +1873,6 @@ int udf_setup_indirect_aext(struct inode > return -EIO; > lock_buffer(bh); > memset(bh->b_data, 0x00, sb->s_blocksize); > - set_buffer_uptodate(bh); > - unlock_buffer(bh); > - mark_buffer_dirty_inode(bh, inode); > aed = (struct allocExtDesc *)(bh->b_data); > if (!UDF_QUERY_FLAG(sb, UDF_FLAG_STRICT)) { > @@ -1890,6 +1887,9 @@ int udf_setup_indirect_aext(struct inode > udf_new_tag(bh->b_data, TAG_IDENT_AED, ver, 1, block, > sizeof(struct tag)); > + set_buffer_uptodate(bh); > + unlock_buffer(bh); > + > nepos.block = neloc; > nepos.offset = sizeof(struct allocExtDesc); > nepos.bh = bh; > @@ -1913,6 +1913,8 @@ int udf_setup_indirect_aext(struct inode > } else { > __udf_add_aext(inode, epos, &nepos.block, > sb->s_blocksize | EXT_NEXT_EXTENT_ALLOCDECS, 0); > + /* Make sure completed AED gets written out */ > + mark_buffer_dirty_inode(nepos.bh, inode); Why do you mark the buffer as dirty only here? I'd just mark it dirty after unlocking. If __udf_add_aext() or udf_write_aext() modify the buffer, they will mark it as dirty as well... Thanks! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-04-03 8:07 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-23 20:14 Possible UDF locking error? Steve Magnani 2019-03-25 16:42 ` Jan Kara 2019-03-25 18:23 ` Steve Magnani 2019-03-30 19:49 ` Steve Magnani 2019-04-03 8:07 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).