From: Li GuiFu via Linux-erofs <linux-erofs@lists.ozlabs.org>
To: Gao Xiang <hsiangkao@aol.com>, linux-erofs@lists.ozlabs.org
Subject: Re: [PATCH v7 3/3] erofs-utils: optimize buffer allocation logic
Date: Sat, 6 Feb 2021 23:29:40 +0800
Message-ID: <fc43d559-4067-8897-6c39-7c69b9748066@aliyun.com>
In-Reply-To: <20210122171153.27404-4-hsiangkao@aol.com>



On 2021/1/23 1:11, Gao Xiang via Linux-erofs wrote:
> From: Hu Weiwen <sehuww@mail.scut.edu.cn>
> 
> When using EROFS to pack our dataset which consists of millions of
> files, mkfs.erofs is very slow compared with mksquashfs.
> 
> The bottleneck is the `erofs_balloc' and `erofs_mapbh' functions, which
> iterate over all previously allocated buffer blocks, making the
> complexity of the algorithm O(N^2) where N is the number of files.
> 
> With this patch:
> 
> * a global `last_mapped_block' is maintained to avoid a full scan in
> the `erofs_mapbh' function.
> 
> * a global `mapped_buckets' maintains a list of already mapped buffer
> blocks for each type and for each possible number of used bytes in the
> last EROFS_BLKSIZ. It is then used to identify the most suitable blocks
> in future `erofs_balloc' calls, avoiding a full scan. Note that
> not-mapped (and the last mapped) blocks can be expanded, so we deal with
> them separately.
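
The bucket idea described above can be illustrated with a minimal,
standalone sketch. This is not the code from the patch: the names
`struct blk', `buckets', `bucket_add' and `bucket_take_best', and the
constants BLKSIZ and NTYPES, are hypothetical stand-ins chosen for
illustration. It only shows how indexing blocks by (type, used bytes in
the last block) turns the search for a best-fitting block into a scan
bounded by the block size instead of by the number of blocks.

```c
#include <assert.h>
#include <stddef.h>

#define BLKSIZ 4096	/* assumption: stands in for EROFS_BLKSIZ */
#define NTYPES 3	/* assumption: number of buffer block types */

/* hypothetical, simplified buffer block */
struct blk {
	struct blk *next;	/* next block in the same bucket */
	unsigned int used;	/* bytes used in the last block, 1..BLKSIZ-1 */
	int type;
};

/* one singly linked list per (type, used bytes) pair */
static struct blk *buckets[NTYPES][BLKSIZ];

/* file a block under its current fill level */
static void bucket_add(struct blk *b)
{
	assert(b->type >= 0 && b->type < NTYPES);
	assert(b->used > 0 && b->used < BLKSIZ);
	b->next = buckets[b->type][b->used];
	buckets[b->type][b->used] = b;
}

/*
 * Remove and return the fullest block of `type' that still has room
 * for `size' more bytes, or NULL if there is none. The loop runs at
 * most BLKSIZ - 1 times no matter how many blocks exist.
 */
static struct blk *bucket_take_best(int type, unsigned int size)
{
	unsigned int used;

	if (size == 0 || size >= BLKSIZ)
		return NULL;

	for (used = BLKSIZ - size; used > 0; --used) {
		struct blk *b = buckets[type][used];

		if (!b)
			continue;
		buckets[type][used] = b->next;
		b->next = NULL;
		return b;
	}
	return NULL;
}
```

In this sketch a caller would take a block, append its data, update
`used', and file the block again with bucket_add() unless it became
exactly full. Blocks that can still grow (not yet mapped, or the last
mapped one) are left out here, since the commit message says they are
handled separately.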
> 
> When I tested it with the ImageNet dataset (1.33M files, 147GiB), it
> took about 4 hours. Most of the time was spent on I/O.
> 
> Cc: Huang Jianan <jnhuang95@gmail.com>
> Signed-off-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
> Signed-off-by: Gao Xiang <hsiangkao@aol.com>
> ---
>  include/erofs/cache.h |   1 +
>  lib/cache.c           | 105 ++++++++++++++++++++++++++++++++++++------
>  2 files changed, 93 insertions(+), 13 deletions(-)
> 

It looks good
Reviewed-by: Li Guifu <bluce.lee@aliyun.com>

Thanks,
