Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* potential data race on `delayed_rsv->full`
@ 2019-10-15 18:33 Meng Xu
  2019-11-01 15:45 ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Meng Xu @ 2019-10-15 18:33 UTC (permalink / raw)
  To: linux-btrfs

Hi Btrfs maintainers,

I am reporting a potential data race around the `delayed_rsv->full` field.

[thread 1] mount a btrfs image, a kernel thread of uuid_rescan will be created

btrfs_uuid_rescan_kthread
  btrfs_end_transaction
    __btrfs_end_transaction
      btrfs_trans_release_metadata
        btrfs_block_rsv_release
          __btrfs_block_rsv_release
            --> [READ] else if (block_rsv != global_rsv && !delayed_rsv->full)
                                                            ^^^^^^^^^^^^^^^^^


[thread 2] do a mkdir syscall on the mounted image

__do_sys_mkdir
  do_mkdirat
    vfs_mkdir
      btrfs_mkdir
        btrfs_new_inode
          btrfs_insert_empty_items
            btrfs_cow_block
              __btrfs_cow_block
                alloc_tree_block_no_bg_flush
                  btrfs_alloc_tree_block
                    btrfs_add_delayed_tree_ref
                      btrfs_update_delayed_refs_rsv
                        --> [WRITE] delayed_rsv->full = 0;
                                    ^^^^^^^^^^^^^^^^^^^^^


I could confirm that this is a data race by manually adding and adjusting
delays before the read and write statements although I am not very sure
about the implication of such a data race (e.g., crashing btrfs or causing
violations of assumptions). I would appreciate if you could help check on
this potential bug and advise whether this is a harmful data race or it
is intended.

Best Regards,
Meng

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: potential data race on `delayed_rsv->full`
  2019-10-15 18:33 potential data race on `delayed_rsv->full` Meng Xu
@ 2019-11-01 15:45 ` David Sterba
  2019-11-01 17:09   ` Meng Xu
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2019-11-01 15:45 UTC (permalink / raw)
  To: Meng Xu; +Cc: linux-btrfs, josef

On Tue, Oct 15, 2019 at 02:33:11PM -0400, Meng Xu wrote:
> I am reporting a potential data race around the `delayed_rsv->full` field.

Thanks for the report.

> [thread 1] mount a btrfs image, a kernel thread of uuid_rescan will be created
> 
> btrfs_uuid_rescan_kthread
>   btrfs_end_transaction
>     __btrfs_end_transaction
>       btrfs_trans_release_metadata
>         btrfs_block_rsv_release
>           __btrfs_block_rsv_release
>             --> [READ] else if (block_rsv != global_rsv && !delayed_rsv->full)
>                                                             ^^^^^^^^^^^^^^^^^
> 
> 
> [thread 2] do a mkdir syscall on the mounted image
> 
> __do_sys_mkdir
>   do_mkdirat
>     vfs_mkdir
>       btrfs_mkdir
>         btrfs_new_inode
>           btrfs_insert_empty_items
>             btrfs_cow_block
>               __btrfs_cow_block
>                 alloc_tree_block_no_bg_flush
>                   btrfs_alloc_tree_block
>                     btrfs_add_delayed_tree_ref
>                       btrfs_update_delayed_refs_rsv
>                         --> [WRITE] delayed_rsv->full = 0;
>                                     ^^^^^^^^^^^^^^^^^^^^^
> 
> 
> I could confirm that this is a data race by manually adding and adjusting
> delays before the read and write statements although I am not very sure
> about the implication of such a data race (e.g., crashing btrfs or causing
> violations of assumptions). I would appreciate if you could help check on
> this potential bug and advise whether this is a harmful data race or it
> is intended.

The race is there, as the access is unprotected, but it does not seem to
have serious implications. The race is for space, which happens all the
time, and if the reservations cannot be satisfied there goes ENOSPC.

In this particular case I wonder if the uuid thread is important or if
this would happen with anything that calls btrfs_end_transaction.

Depending on the value of ->full, __btrfs_block_rsv_release decides
where to return the reservation, and block_rsv_release_bytes handles a
NULL pointer for block_rsv and if it's not NULL then it double checks
the full status under a lock.

So the unlocked and racy access is only advisory and in the worst case
the block reserve is found !full and becomes full in the meantime,
but properly handled.

I've CCed Josef to review the analysis.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: potential data race on `delayed_rsv->full`
  2019-11-01 15:45 ` David Sterba
@ 2019-11-01 17:09   ` Meng Xu
  2019-11-01 18:16     ` Josef Bacik
  0 siblings, 1 reply; 5+ messages in thread
From: Meng Xu @ 2019-11-01 17:09 UTC (permalink / raw)
  To: dsterba, linux-btrfs, josef

Hi David,

Thank you for the confirmation and the additional information.

I feel the same that this race may not lead to serious issues, but would 
rather prefer a confirmation from the developers. Thank you again for 
your time!

Best Regards,
Meng


On 11/1/19 11:45 AM, David Sterba wrote:
> On Tue, Oct 15, 2019 at 02:33:11PM -0400, Meng Xu wrote:
>> I am reporting a potential data race around the `delayed_rsv->full` field.
> Thanks for the report.
>
>> [thread 1] mount a btrfs image, a kernel thread of uuid_rescan will be created
>>
>> btrfs_uuid_rescan_kthread
>>    btrfs_end_transaction
>>      __btrfs_end_transaction
>>        btrfs_trans_release_metadata
>>          btrfs_block_rsv_release
>>            __btrfs_block_rsv_release
>>              --> [READ] else if (block_rsv != global_rsv && !delayed_rsv->full)
>>                                                              ^^^^^^^^^^^^^^^^^
>>
>>
>> [thread 2] do a mkdir syscall on the mounted image
>>
>> __do_sys_mkdir
>>    do_mkdirat
>>      vfs_mkdir
>>        btrfs_mkdir
>>          btrfs_new_inode
>>            btrfs_insert_empty_items
>>              btrfs_cow_block
>>                __btrfs_cow_block
>>                  alloc_tree_block_no_bg_flush
>>                    btrfs_alloc_tree_block
>>                      btrfs_add_delayed_tree_ref
>>                        btrfs_update_delayed_refs_rsv
>>                          --> [WRITE] delayed_rsv->full = 0;
>>                                      ^^^^^^^^^^^^^^^^^^^^^
>>
>>
>> I could confirm that this is a data race by manually adding and adjusting
>> delays before the read and write statements although I am not very sure
>> about the implication of such a data race (e.g., crashing btrfs or causing
>> violations of assumptions). I would appreciate if you could help check on
>> this potential bug and advise whether this is a harmful data race or it
>> is intended.
> The race is there, as the access is unprotected, but it does not seem to
> have serious implications. The race is for space, which happens all the
> time, and if the reservations cannot be satisfied there goes ENOSPC.
>
> In this particular case I wonder if the uuid thread is important or if
> this would happen with anything that calls btrfs_end_transaction.
>
> Depending on the value of ->full, __btrfs_block_rsv_release decides
> where to return the reservation, and block_rsv_release_bytes handles a
> NULL pointer for block_rsv and if it's not NULL then it double checks
> the full status under a lock.
>
> So the unlocked and racy access is only advisory and in the worst case
> the block reserve is found !full and becomes full in the meantime,
> but properly handled.
>
> I've CCed Josef to review the analysis.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: potential data race on `delayed_rsv->full`
  2019-11-01 17:09   ` Meng Xu
@ 2019-11-01 18:16     ` Josef Bacik
  2019-11-05 10:01       ` David Sterba
  0 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2019-11-01 18:16 UTC (permalink / raw)
  To: Meng Xu; +Cc: dsterba, linux-btrfs, josef

On Fri, Nov 01, 2019 at 01:09:30PM -0400, Meng Xu wrote:
> Hi David,
> 
> Thank you for the confirmation and the additional information.
> 
> I feel the same that this race may not lead to serious issues, but would
> rather prefer a confirmation from the developers. Thank you again for your
> time!
> 

Sorry I saw this while I was on vacation, I read through and determined that
there were no cases where this would bite us.  This is just used as a lock free
way to see if we should refill the delayed refs rsv.  Worst case we don't and
the next guy does it, it doesn't affect us in an practical way.

However given our recent fun with inode->i_size it may be worth it to wrap
access to ->full with WRITE_ONCE/READ_ONCE to make sure nothing squirrely
happens in the future.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: potential data race on `delayed_rsv->full`
  2019-11-01 18:16     ` Josef Bacik
@ 2019-11-05 10:01       ` David Sterba
  0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2019-11-05 10:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Meng Xu, linux-btrfs

On Fri, Nov 01, 2019 at 02:16:07PM -0400, Josef Bacik wrote:
> On Fri, Nov 01, 2019 at 01:09:30PM -0400, Meng Xu wrote:
> > Hi David,
> > 
> > Thank you for the confirmation and the additional information.
> > 
> > I feel the same that this race may not lead to serious issues, but would
> > rather prefer a confirmation from the developers. Thank you again for your
> > time!
> > 
> 
> Sorry I saw this while I was on vacation, I read through and determined that
> there were no cases where this would bite us.  This is just used as a lock free
> way to see if we should refill the delayed refs rsv.  Worst case we don't and
> the next guy does it, it doesn't affect us in an practical way.
> 
> However given our recent fun with inode->i_size it may be worth it to wrap
> access to ->full with WRITE_ONCE/READ_ONCE to make sure nothing squirrely
> happens in the future.  Thanks,

->full is used unlocked only in this one place, everywhere else it's
insde spinlock. With exception of
btrfs_clear_space_info_full/__find_space_info where it's only RCU
protection. Add in ONCE everywhere would be pointless, the one call can
be commented but adding READ_ONCE would not change anything as it's the
only access that must be loaded anyway.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15 18:33 potential data race on `delayed_rsv->full` Meng Xu
2019-11-01 15:45 ` David Sterba
2019-11-01 17:09   ` Meng Xu
2019-11-01 18:16     ` Josef Bacik
2019-11-05 10:01       ` David Sterba

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git