All of lore.kernel.org
 help / color / mirror / Atom feed
From: robbieko <robbieko@synology.com>
To: fdmanana@gmail.com
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs : send, truncate first to enhance many small files
Date: Fri, 08 Dec 2017 14:29:18 +0800	[thread overview]
Message-ID: <b900ebec9dbb4adb487f47ac9bd32ac7@synology.com> (raw)
In-Reply-To: <CAL3q7H7nzjNpd8WWhemWzeRT3A3cVAnoKOBJxE8fHou1WgVQWw@mail.gmail.com>

Hi Filipe David Manana,

I'm sorry to reply so late, your patch is good for me.
Will you submit this patch to the upstream?
In addition, you mentioned other optimization, Can you share it?

I have another case, find_extent_clone will be too slow when there is 
too much backref in the big file.
1G file with fallocate, have 4 extent_item, then do some 4k random write 
IO, finally total 76767 EXTENT_DATA.
My test ignored find_extent_clone, much faster.
original: 16m8.387s
ignore find_extent_clone: 12s
Do you have a better solution?

Thanks.
Robbie Ko

Filipe Manana 於 2017-12-04 20:08 寫到:
> On Mon, Dec 4, 2017 at 11:28 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> 
> wrote:
>> 
>> 
>> On 2017年12月04日 19:13, Filipe Manana wrote:
>>> On Mon, Dec 4, 2017 at 7:19 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> 
>>> wrote:
>>>> 
>>>> 
>>>> On 2017年12月04日 15:02, robbieko wrote:
>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>> 
>>>>> The commands generated by send contain the following step:
>>>>> 1. mkfile o1851-19-0
>>>>> 2. rename o1851-19-0 -> alsa-driver/alsa-kernel/isa/es1688/es1688.c
>>>>> 3. set_xattr alsa-driver/alsa-kernel/isa/es1688/es1688.c - 
>>>>> name=user.xattr data_len=4 data=test
>>>>> 4. write alsa-driver/alsa-kernel/isa/es1688/es1688.c - offset=0, 
>>>>> len=10458
>>>>> 5. truncate alsa-driver/alsa-kernel/isa/es1688/es1688.c size=10458
>>>>> 6. chown alsa-driver/alsa-kernel/isa/es1688/es1688.c - uid=1024, 
>>>>> gid=100
>>>>> 7. chmod alsa-driver/alsa-kernel/isa/es1688/es1688.c - mode=0644
>>>>> 8. utimes alsa-driver/alsa-kernel/isa/es1688/es1688.c
>>>>> 
>>>>> After writing file content, it will truncate file to the correct 
>>>>> size.
>>>>> Btrfs truncate will flush last page if size does not align to 
>>>>> sectorsize,
>>>>> and this will cause receive process to wait until flush finishes.
>>>>> In order to avoid waiting flushing data.This patch changes the 
>>>>> order so
>>>>> that truncate command is sent before write command.
>>>> 
>>>> Personally speaking, it's better to optimize the receive side.
>>>> 
>>>> For example, at receive side, if we already know that the file size 
>>>> is
>>>> not changed at all, then just skip the truncate command.
>>>> 
>>>> In your send dump, step 5 is not needed at all, and can be skipped 
>>>> to
>>>> speed up the receive procedure.
>>> 
>>> Better yet, is to not send the truncate commands at all, reduces the
>>> stream size and saves all receiver implementations
>>> from having such logic (which further requires stat calls to figure
>>> out the current file size).
>> 
>> Another one is to optimize btrfs file truncate to avoid the 
>> unnecessary
>> page writeback.
>> 
>> But at least, doing optimization in receive side is easier and less
>> bug-prone.
> 
> Not really, for this case at least.
>> 
>> If we could do the same in both send and receive side, then I prefer
>> optimization in receive side.
> 
> Can't agree much with that, for the reasons mentioned before.
> 
>> 
>> Thanks,
>> Qu
>> 
>>> 
>>> I still have a local branch around with several optimizations (from
>>> the time I was a volunteer), with one of them being precisely to 
>>> avoid
>>> unnecessary truncate commands, however I never ended up getting to 
>>> the
>>> point of doing benchmarks.
>>> Pasted below and at 
>>> https://www.friendpaste.com/1J1TuXf2wlU1125mX57k4r
>>> 
>>> 
>>> commit 362290ea03b52e4ce4edb43453a51a3b1f1cd33f
>>> Author: Filipe David Borba Manana <fdmanana@gmail.com>
>>> Date:   Sat Apr 5 22:34:45 2014 +0100
>>> 
>>>     Btrfs: send, don't issue unnecessary file truncate commands
>>> 
>>>     Every time a file is created or updated, send (either incremental
>>> or not) always
>>>     issues a file truncate command. In several commons scenarios this
>>> truncate is not
>>>     needed and yet it was still issued all the time. The unnecessary
>>> cases where the
>>>     truncate command is not needed are:
>>> 
>>>     * We have a new file, and its last extent isn't a hole, so that
>>> its last write
>>>       command ends at the file's size;
>>> 
>>>     * For an incremental send, we updated in place an existing file,
>>> without changing
>>>       its size;
>>> 
>>>     * For an incremental send, we updated the file, increased its
>>> size, and the last
>>>       file extent item doesn't correspond to a hole. In this case the 
>>> last write
>>>       command against the file ends at the file's new size;
>>> 
>>>     * For an incremental send, we didn't update the file's data at
>>> all, we only changed
>>>       its mode, group or access/update timestamps.
>>> 
>>>     Therefore don't send truncate commands for these cases, reducing
>>> the stream's size
>>>     and saving time.
>>> 
>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>> index fa378c7ebffd..46f52b4bbc21 100644
>>> --- a/fs/btrfs/send.c
>>> +++ b/fs/btrfs/send.c
>>> @@ -120,6 +120,7 @@ struct send_ctx {
>>>   u64 cur_inode_mode;
>>>   u64 cur_inode_rdev;
>>>   u64 cur_inode_last_extent;
>>> + u64 cur_inode_max_write_end;
>>> 
>>>   u64 send_progress;
>>>   u64 total_data_size;
>>> @@ -4494,6 +4495,8 @@ static int send_hole(struct send_ctx *sctx, u64 
>>> end)
>>>   break;
>>>   offset += len;
>>>   }
>>> + sctx->cur_inode_max_write_end = max(offset,
>>> +    sctx->cur_inode_max_write_end);
>>>  tlv_put_failure:
>>>   fs_path_free(p);
>>>   return ret;
>>> @@ -4529,6 +4532,10 @@ static int send_write_or_clone(struct send_ctx 
>>> *sctx,
>>>   len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
>>>   }
>>> 
>>> + if (offset >= sctx->cur_inode_size) {
>>> + ret = 0;
>>> + goto out;
>>> + }
>>>   if (offset + len > sctx->cur_inode_size)
>>>   len = sctx->cur_inode_size - offset;
>>>   if (len == 0) {
>>> @@ -4560,6 +4567,8 @@ static int send_write_or_clone(struct send_ctx 
>>> *sctx,
>>>   }
>>>   ret = 0;
>>>   }
>>> + sctx->cur_inode_max_write_end = max(offset + len,
>>> +    sctx->cur_inode_max_write_end);
>>>  out:
>>>   return ret;
>>>  }
>>> @@ -4982,6 +4991,7 @@ static int finish_inode_if_needed(struct
>>> send_ctx *sctx, int at_end)
>>>   u64 right_gid;
>>>   int need_chmod = 0;
>>>   int need_chown = 0;
>>> + int need_truncate = 1;
>>>   int pending_move = 0;
>>>   int refs_processed = 0;
>>> 
>>> @@ -5022,9 +5032,13 @@ static int finish_inode_if_needed(struct
>>> send_ctx *sctx, int at_end)
>>>   need_chown = 1;
>>>   if (!S_ISLNK(sctx->cur_inode_mode))
>>>   need_chmod = 1;
>>> + if (sctx->cur_inode_max_write_end == sctx->cur_inode_size)
>>> + need_truncate = 0;
>>>   } else {
>>> + u64 old_size;
>>> +
>>>   ret = get_inode_info(sctx->parent_root, sctx->cur_ino,
>>> - NULL, NULL, &right_mode, &right_uid,
>>> + &old_size, NULL, &right_mode, &right_uid,
>>>   &right_gid, NULL);
>>>   if (ret < 0)
>>>   goto out;
>>> @@ -5033,6 +5047,13 @@ static int finish_inode_if_needed(struct
>>> send_ctx *sctx, int at_end)
>>>   need_chown = 1;
>>>   if (!S_ISLNK(sctx->cur_inode_mode) && left_mode != right_mode)
>>>   need_chmod = 1;
>>> +
>>> + if (old_size == sctx->cur_inode_size &&
>>> +    sctx->cur_inode_max_write_end <= sctx->cur_inode_size)
>>> + need_truncate = 0;
>>> + else if (sctx->cur_inode_size > old_size &&
>>> + sctx->cur_inode_max_write_end == sctx->cur_inode_size)
>>> + need_truncate = 0;
>>>   }
>>> 
>>>  truncate_inode:
>>> @@ -5052,10 +5073,13 @@ static int finish_inode_if_needed(struct
>>> send_ctx *sctx, int at_end)
>>>   goto out;
>>>   }
>>>   }
>>> - ret = send_truncate(sctx, sctx->cur_ino, sctx->cur_inode_gen,
>>> - sctx->cur_inode_size);
>>> - if (ret < 0)
>>> - goto out;
>>> + if (need_truncate) {
>>> + ret = send_truncate(sctx, sctx->cur_ino,
>>> +    sctx->cur_inode_gen,
>>> +    sctx->cur_inode_size);
>>> + if (ret < 0)
>>> + goto out;
>>> + }
>>>   }
>>> 
>>>   if (need_chown) {
>>> @@ -5110,6 +5134,7 @@ static int changed_inode(struct send_ctx *sctx,
>>>   sctx->cur_ino = key->objectid;
>>>   sctx->cur_inode_new_gen = 0;
>>>   sctx->cur_inode_last_extent = (u64)-1;
>>> + sctx->cur_inode_max_write_end = 0;
>>> 
>>>   /*
>>>   * Set send_progress to current inode. This will tell all 
>>> get_cur_xxx
>>> 
>>>> 
>>>> Thanks,
>>>> Qu
>>>> 
>>>>> 
>>>>> Overall performance improves by 102 percent when sending 790000 
>>>>> small files.
>>>>> original: 32m45.311s
>>>>> patch: 16m8.387s
>>>>> 
>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>>> ---
>>>>>  fs/btrfs/send.c | 13 +++++++++----
>>>>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>>>> 
>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>> index 20d3300..7ae2347 100644
>>>>> --- a/fs/btrfs/send.c
>>>>> +++ b/fs/btrfs/send.c
>>>>> @@ -5857,10 +5857,6 @@ static int finish_inode_if_needed(struct 
>>>>> send_ctx *sctx, int at_end)
>>>>>                                       goto out;
>>>>>                       }
>>>>>               }
>>>>> -             ret = send_truncate(sctx, sctx->cur_ino, 
>>>>> sctx->cur_inode_gen,
>>>>> -                             sctx->cur_inode_size);
>>>>> -             if (ret < 0)
>>>>> -                     goto out;
>>>>>       }
>>>>> 
>>>>>       if (need_chown) {
>>>>> @@ -6044,6 +6040,15 @@ static int changed_inode(struct send_ctx 
>>>>> *sctx,
>>>>>                                       sctx->left_path->nodes[0], 
>>>>> left_ii);
>>>>>               }
>>>>>       }
>>>>> +     if (result == BTRFS_COMPARE_TREE_NEW ||
>>>>> +         result == BTRFS_COMPARE_TREE_CHANGED) {
>>>>> +             if (S_ISREG(sctx->cur_inode_mode)) {
>>>>> +                     ret = send_truncate(sctx, sctx->cur_ino, 
>>>>> sctx->cur_inode_gen,
>>>>> +                                             
>>>>> sctx->cur_inode_size);
>>>>> +                     if (ret < 0)
>>>>> +                             goto out;
>>>>> +             }
>>>>> +     }
>>>>> 
>>>>>  out:
>>>>>       return ret;
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 

  reply	other threads:[~2017-12-08  6:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-04  7:02 [PATCH] Btrfs : send, truncate first to enhance many small files robbieko
2017-12-04  7:19 ` Qu Wenruo
2017-12-04 11:13   ` Filipe Manana
2017-12-04 11:28     ` Qu Wenruo
2017-12-04 12:08       ` Filipe Manana
2017-12-08  6:29         ` robbieko [this message]
2017-12-08  6:37           ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b900ebec9dbb4adb487f47ac9bd32ac7@synology.com \
    --to=robbieko@synology.com \
    --cc=fdmanana@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.