All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: fdmanana@gmail.com, Qu Wenruo <wqu@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH RFC] btrfs: reflink: Flush before reflink any extent to prevent NOCOW write falling back to CoW without data reservation
Date: Sat, 4 May 2019 11:29:33 +0300	[thread overview]
Message-ID: <5e194c87-736a-ae0d-c7d5-d31420415a32@suse.com> (raw)
In-Reply-To: <CAL3q7H4xp9=Kw3Q1hoDz_2Tbek4NdaULhJX4s7wmUqmku=ex0A@mail.gmail.com>



On 3.05.19 г. 13:45 ч., Filipe Manana wrote:
> On Fri, May 3, 2019 at 11:18 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2019/5/3 下午5:21, Filipe Manana wrote:
>>> On Fri, May 3, 2019 at 2:46 AM Qu Wenruo <wqu@suse.com> wrote:
>>>
>>> What a great subject. The "reflink:" part is unnecessary, since the
>>> rest of the subject already mentions it, that makes it a bit shorter.
>>>
>>>>
>>>> [BUG]
>>>> The following command can lead to unexpected data COW:
>>>>
>>>>   #!/bin/bash
>>>>
>>>>   dev=/dev/test/test
>>>>   mnt=/mnt/btrfs
>>>>
>>>>   mkfs.btrfs -f $dev -b 1G > /dev/null
>>>>   mount $dev $mnt -o nospace_cache
>>>>
>>>>   xfs_io -f -c "falloc 8k 24k" -c "pwrite 12k 8k" $mnt/file1
>>>>   xfs_io -c "reflink $mnt/file1 8k 0 4k" $mnt/file1
>>>>   umount $dev
>>>>
>>>> The result extent will be
>>>>
>>>>         item 7 key (257 EXTENT_DATA 4096) itemoff 15760 itemsize 53
>>>>                 generation 6 type 2 (prealloc)
>>>>                 prealloc data disk byte 13631488 nr 28672
>>>>         item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53
>>>>                 generation 6 type 1 (regular)
>>>>                 extent data disk byte 13660160 nr 12288 <<< COW
>>>>         item 9 key (257 EXTENT_DATA 24576) itemoff 15654 itemsize 53
>>>>                 generation 6 type 2 (prealloc)
>>>>                 prealloc data disk byte 13631488 nr 28672
>>>>
>>>> Currently we always reserve space even for NOCOW buffered write, thus
>>>
>>> I would add 'data' between 'reserve' and 'space', to be clear.
>>>
>>>> under most case it shouldn't cause anything wrong even we fall back to
>>>> COW.
>>>
>>> even we ... -> even if we fallback to COW when running delalloc /
>>> starting writeback.
>>>
>>>>
>>>> However when we're out of data space, we fall back to skip data space if
>>>> we can do NOCOW write.
>>>
>>> we fall back to skip data space ... -> we fallback to NOCOW write
>>> without reserving data space.
>>>
>>>>
>>>> If such behavior happens under that case, we could hit the following
>>>> problems:
>>>
>>>> - data space bytes_may_use underflow
>>>>   This will cause kernel warning.
>>>
>>> Ok.
>>>
>>>>
>>>> - ENOSPC at delalloc time
>>>
>>> at delalloc time - that is an ambiguous term you use through the change log.
>>
>> In fact, I have a lot of uncertain terminology through kernel.
>>
>> Things like flush get referred multiple times in different context (e.g.
>> filemap flush, flushoncommit, super block flush).
>>
>> If we have a terminology list, we can't be more happy to follow.
> 
> So, some is kernel wide while others are btrfs specific.
> 
> A buffered write creates dealloc - copies data to pages, marks the
> pages as dirty and tags the range in the extent io tree as dellaloc.
> Running delalloc, flushes writeback (starts IO for the dirty pages and
> tags them as under writeback) and does other necessary things (like
> reserving extents).
> When writeback finishes, we add a task to a work queue to run
> btrfs_finish_ordered_io - after that happens we say that the ordered
> extent completed.
> 
> It can get ambiguous very often.


That's why I have created the following document which (tries) to
explain this:

https://github.com/btrfs/btrfs-dev-docs/blob/master/delalloc.txt

It's not perfect but it's better than nothing, feel free to contribute
improvements.

< snip>

  reply	other threads:[~2019-05-04  8:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-03  1:08 [PATCH RFC] btrfs: reflink: Flush before reflink any extent to prevent NOCOW write falling back to CoW without data reservation Qu Wenruo
2019-05-03  9:21 ` Filipe Manana
2019-05-03 10:18   ` Qu Wenruo
2019-05-03 10:45     ` Filipe Manana
2019-05-04  8:29       ` Nikolay Borisov [this message]
2019-05-06  2:04       ` Qu Wenruo
2019-05-07  7:49         ` Nikolay Borisov
2019-05-07  8:56         ` Filipe Manana
2019-05-07 11:13           ` Qu Wenruo
2019-05-07 11:36             ` Filipe Manana
2019-05-03 21:56 ` Zygo Blaxell
2019-05-04  0:32   ` Qu Wenruo
2019-05-05 15:07     ` Zygo Blaxell
2019-05-05 16:24       ` Filipe Manana
2019-05-06  0:06         ` Qu Wenruo
2019-05-07 17:36 ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e194c87-736a-ae0d-c7d5-d31420415a32@suse.com \
    --to=nborisov@suse.com \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.