From: Omar Sandoval <osandov@osandov.com>
To: dsterba@suse.cz, Nick Terrell <terrelln@fb.com>,
kernel-team@fb.com, Omar Sandoval <osandov@fb.com>,
linux-btrfs@vger.kernel.org, Jennifer Liu <jenniferliu620@fb.com>
Subject: Re: [PATCH v2] btrfs: add zstd compression level support
Date: Mon, 19 Nov 2018 12:05:54 -0800 [thread overview]
Message-ID: <20181119200554.GB25682@vader> (raw)
In-Reply-To: <20181113003332.GV24115@twin.jikos.cz>
On Tue, Nov 13, 2018 at 01:33:32AM +0100, David Sterba wrote:
> On Wed, Oct 31, 2018 at 11:11:08AM -0700, Nick Terrell wrote:
> > From: Jennifer Liu <jenniferliu620@fb.com>
> >
> > Adds zstd compression level support to btrfs. Zstd requires
> > different amounts of memory for each level, so the design had
> > to be modified to allow set_level() to allocate memory. We
> > preallocate one workspace of the maximum size to guarantee
> > forward progress. This feature is expected to be useful for
> > read-mostly filesystems, or when creating images.
> >
> > Benchmarks run in qemu on Intel x86 with a single core.
> > The benchmark measures the time to copy the Silesia corpus [0] to
> > a btrfs filesystem 10 times, then read it back.
> >
> > The two important things to note are:
> > - The decompression speed and memory remains constant.
> > The memory required to decompress is the same as level 1.
> > - The compression speed and ratio will vary based on the source.
> >
> > Level Ratio Compression Decompression Compression Memory
> > 1 2.59 153 MB/s 112 MB/s 0.8 MB
> > 2 2.67 136 MB/s 113 MB/s 1.0 MB
> > 3 2.72 106 MB/s 115 MB/s 1.3 MB
> > 4 2.78 86 MB/s 109 MB/s 0.9 MB
> > 5 2.83 69 MB/s 109 MB/s 1.4 MB
> > 6 2.89 53 MB/s 110 MB/s 1.5 MB
> > 7 2.91 40 MB/s 112 MB/s 1.4 MB
> > 8 2.92 34 MB/s 110 MB/s 1.8 MB
> > 9 2.93 27 MB/s 109 MB/s 1.8 MB
> > 10 2.94 22 MB/s 109 MB/s 1.8 MB
> > 11 2.95 17 MB/s 114 MB/s 1.8 MB
> > 12 2.95 13 MB/s 113 MB/s 1.8 MB
> > 13 2.95 10 MB/s 111 MB/s 2.3 MB
> > 14 2.99 7 MB/s 110 MB/s 2.6 MB
> > 15 3.03 6 MB/s 110 MB/s 2.6 MB
> >
> > [0] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
> >
> > Signed-off-by: Jennifer Liu <jenniferliu620@fb.com>
> > Signed-off-by: Nick Terrell <terrelln@fb.com>
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> > ---
> > v1 -> v2:
> > - Don't reflow the unchanged line.
> >
[snip]
> > -static struct list_head *zstd_alloc_workspace(void)
> > +static bool zstd_set_level(struct list_head *ws, unsigned int level)
> > +{
> > + struct workspace *workspace = list_entry(ws, struct workspace, list);
> > + ZSTD_parameters params;
> > + int size;
> > +
> > + if (level > BTRFS_ZSTD_MAX_LEVEL)
> > + level = BTRFS_ZSTD_MAX_LEVEL;
> > +
> > + if (level == 0)
> > + level = BTRFS_ZSTD_DEFAULT_LEVEL;
> > +
> > + params = ZSTD_getParams(level, ZSTD_BTRFS_MAX_INPUT, 0);
> > + size = max_t(size_t,
> > + ZSTD_CStreamWorkspaceBound(params.cParams),
> > + ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
> > + if (size > workspace->size) {
> > + if (!zstd_reallocate_mem(workspace, size))
>
> This can allocate memory and this can appen on the writeout path, ie.
> one of the reasons for that might be that system needs more memory.
>
> By the table above, the size can be up to 2.6MiB, which is a lot in
> terms of kernel memory as there must be either contiguous unmapped
> memory, the virtual mappings must be created. Both are scarce resource
> or should be treated as such.
>
> Given that there's no logic that would try to minimize the usage for
> workspaces, this can allocate many workspaces of that size.
>
> Currently the workspace allocations have been moved to the early module
> loading phase so that they don't happen later and we don't have to
> allocate memory nor handle the failures. Your patch brings that back.
Even before this patch, we may try to allocate a workspace. See
__find_workspace():
https://github.com/kdave/btrfs-devel/blob/fd0f5617a8a2ee92dd461d01cf9c5c37363ccc8d/fs/btrfs/compression.c#L897
We already limit it to one per CPU, and only allocate when needed.
Anything greater than that has to wait. Maybe we should improve that to
also include a limit on the total amount of memory allocated? That would
be more flexible than your approach below of making the > default case
special, and I like it more than Timofey's idea of falling back to a
lower level.
> The solution I'm currently thinking about can make the levels work but
> would be limited in throughput as a trade-off for the memory
> consumption.
>
> - preallocate one workspace for level 15 per mounted filesystem, using
> get_free_pages_exact or kvmalloc
>
> - preallocate workspaces for the default level, the number same as for
> lzo/zlib
>
> - add logic to select the zstd:15 workspace last for other compressors,
> ie. make it almost exclusive for zstd levels > default
>
> Further refinement could allocate the workspaces on-demand and free if
> not used. Allocation failures would still be handled gracefully at the
> cost of waiting for the remainig worspaces, at least one would be always
> available.
prev parent reply other threads:[~2018-11-19 20:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-31 18:11 [PATCH v2] btrfs: add zstd compression level support Nick Terrell
2018-11-01 9:57 ` Timofey Titovets
2018-11-01 15:40 ` Nick Terrell
2018-11-13 0:01 ` David Sterba
2018-11-13 0:33 ` David Sterba
2018-11-13 1:49 ` Nick Terrell
2018-11-13 13:29 ` Timofey Titovets
2018-11-16 1:42 ` Nick Terrell
2018-11-19 19:57 ` Omar Sandoval
2018-11-19 20:05 ` Omar Sandoval [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181119200554.GB25682@vader \
--to=osandov@osandov.com \
--cc=dsterba@suse.cz \
--cc=jenniferliu620@fb.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=osandov@fb.com \
--cc=terrelln@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).