From: Omar Sandoval <osandov@osandov.com>
To: dsterba@suse.cz, Nick Terrell <terrelln@fb.com>,
	kernel-team@fb.com, Omar Sandoval <osandov@fb.com>,
	linux-btrfs@vger.kernel.org, Jennifer Liu <jenniferliu620@fb.com>
Subject: Re: [PATCH v2] btrfs: add zstd compression level support
Date: Mon, 19 Nov 2018 12:05:54 -0800	[thread overview]
Message-ID: <20181119200554.GB25682@vader> (raw)
In-Reply-To: <20181113003332.GV24115@twin.jikos.cz>

On Tue, Nov 13, 2018 at 01:33:32AM +0100, David Sterba wrote:
> On Wed, Oct 31, 2018 at 11:11:08AM -0700, Nick Terrell wrote:
> > From: Jennifer Liu <jenniferliu620@fb.com>
> > 
> > Adds zstd compression level support to btrfs. Zstd requires
> > different amounts of memory for each level, so the design had
> > to be modified to allow set_level() to allocate memory. We
> > preallocate one workspace of the maximum size to guarantee
> > forward progress. This feature is expected to be useful for
> > read-mostly filesystems, or when creating images.
> > 
> > Benchmarks run in qemu on Intel x86 with a single core.
> > The benchmark measures the time to copy the Silesia corpus [0] to
> > a btrfs filesystem 10 times, then read it back.
> > 
> > The two important things to note are:
> > - The decompression speed and memory remain constant.
> >   The memory required to decompress is the same as for level 1.
> > - The compression speed and ratio will vary based on the source.
> > 
> > Level	Ratio	Compression	Decompression	Compression Memory
> > 1    	2.59 	153 MB/s   	112 MB/s     	0.8 MB
> > 2    	2.67 	136 MB/s   	113 MB/s     	1.0 MB
> > 3    	2.72 	106 MB/s   	115 MB/s     	1.3 MB
> > 4    	2.78 	86  MB/s   	109 MB/s     	0.9 MB
> > 5    	2.83 	69  MB/s   	109 MB/s     	1.4 MB
> > 6    	2.89 	53  MB/s   	110 MB/s     	1.5 MB
> > 7    	2.91 	40  MB/s   	112 MB/s     	1.4 MB
> > 8    	2.92 	34  MB/s   	110 MB/s     	1.8 MB
> > 9    	2.93 	27  MB/s   	109 MB/s     	1.8 MB
> > 10   	2.94 	22  MB/s   	109 MB/s     	1.8 MB
> > 11   	2.95 	17  MB/s   	114 MB/s     	1.8 MB
> > 12   	2.95 	13  MB/s   	113 MB/s     	1.8 MB
> > 13   	2.95 	10  MB/s   	111 MB/s     	2.3 MB
> > 14   	2.99 	7   MB/s   	110 MB/s     	2.6 MB
> > 15   	3.03 	6   MB/s   	110 MB/s     	2.6 MB
> > 
> > [0] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
> > 
> > Signed-off-by: Jennifer Liu <jenniferliu620@fb.com>
> > Signed-off-by: Nick Terrell <terrelln@fb.com>
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> > ---
> > v1 -> v2:
> > - Don't reflow the unchanged line.
> > 

[snip]

> > -static struct list_head *zstd_alloc_workspace(void)
> > +static bool zstd_set_level(struct list_head *ws, unsigned int level)
> > +{
> > +	struct workspace *workspace = list_entry(ws, struct workspace, list);
> > +	ZSTD_parameters params;
> > +	int size;
> > +
> > +	if (level > BTRFS_ZSTD_MAX_LEVEL)
> > +		level = BTRFS_ZSTD_MAX_LEVEL;
> > +
> > +	if (level == 0)
> > +		level = BTRFS_ZSTD_DEFAULT_LEVEL;
> > +
> > +	params = ZSTD_getParams(level, ZSTD_BTRFS_MAX_INPUT, 0);
> > +	size = max_t(size_t,
> > +			ZSTD_CStreamWorkspaceBound(params.cParams),
> > +			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
> > +	if (size > workspace->size) {
> > +		if (!zstd_reallocate_mem(workspace, size))
> 
> This can allocate memory, and it can happen on the writeout path, i.e.
> one of the reasons for that might be that the system needs more memory.
> 
> By the table above, the size can be up to 2.6 MiB, which is a lot in
> terms of kernel memory, as there must either be contiguous unmapped
> memory, or the virtual mappings must be created. Both are scarce
> resources, or should be treated as such.
> 
> Given that there's no logic that tries to minimize workspace usage,
> this can allocate many workspaces of that size.
> 
> Currently the workspace allocations have been moved to the early module
> loading phase so that they don't happen later, and we don't have to
> allocate memory or handle allocation failures at runtime. Your patch
> brings that back.

Even before this patch, we may try to allocate a workspace. See
__find_workspace():

https://github.com/kdave/btrfs-devel/blob/fd0f5617a8a2ee92dd461d01cf9c5c37363ccc8d/fs/btrfs/compression.c#L897

We already limit it to one per CPU, and only allocate when needed.
Anything greater than that has to wait. Maybe we should improve that to
also include a limit on the total amount of memory allocated? That would
be more flexible than your approach below of making the > default case
special, and I like it more than Timofey's idea of falling back to a
lower level.
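
For illustration only, here is an untested sketch of what such a
total-memory cap might look like. All names below (ws_total_bytes,
BTRFS_MAX_WS_BYTES, alloc_ws_capped, and the 16 MiB budget) are made up
for this example and are not in fs/btrfs/compression.c:

/*
 * Illustrative sketch only: cap the total memory handed out to
 * on-demand compression workspaces, on top of the existing per-CPU
 * count in __find_workspace().  All names here are invented for the
 * example and do not exist in fs/btrfs/compression.c.
 */
#include <linux/mm.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

#define BTRFS_MAX_WS_BYTES	(16UL << 20)	/* arbitrary 16 MiB budget */

static long ws_total_bytes;
static DEFINE_SPINLOCK(ws_bytes_lock);
static DECLARE_WAIT_QUEUE_HEAD(ws_bytes_wait);

/* Try to reserve @size bytes from the global workspace budget. */
static bool try_reserve_ws_bytes(size_t size)
{
	bool ok = false;

	spin_lock(&ws_bytes_lock);
	if (ws_total_bytes + size <= BTRFS_MAX_WS_BYTES) {
		ws_total_bytes += size;
		ok = true;
	}
	spin_unlock(&ws_bytes_lock);
	return ok;
}

static void release_ws_bytes(size_t size)
{
	spin_lock(&ws_bytes_lock);
	ws_total_bytes -= size;
	spin_unlock(&ws_bytes_lock);
	wake_up(&ws_bytes_wait);
}

static void *alloc_ws_capped(size_t size)
{
	void *mem;

	/*
	 * Sleep until the budget has room, the same way callers already
	 * sleep when all per-CPU workspace slots are in use.
	 */
	wait_event(ws_bytes_wait, try_reserve_ws_bytes(size));

	mem = kvmalloc(size, GFP_KERNEL);
	if (!mem)
		release_ws_bytes(size);
	return mem;
}

The point is only that the existing "sleep until a slot frees up"
behaviour extends naturally to a byte budget; the actual limit and where
it lives would need real discussion.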

> The solution I'm currently thinking about can make the levels work but
> would be limited in throughput as a trade-off for the memory
> consumption.
> 
> - preallocate one workspace for level 15 per mounted filesystem, using
>   get_free_pages_exact or kvmalloc
> 
> - preallocate workspaces for the default level, the same number as for
>   lzo/zlib
> 
> - add logic to select the zstd:15 workspace last for other compressors,
>   i.e. make it almost exclusive for zstd levels > default
> 
> Further refinement could allocate the workspaces on demand and free them
> if not used. Allocation failures would still be handled gracefully at the
> cost of waiting for the remaining workspaces; at least one would always
> be available.
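
For what it's worth, a rough, untested sketch of that selection policy
is below. struct zstd_ws_pool and its fields are invented for the
example; only struct workspace and BTRFS_ZSTD_DEFAULT_LEVEL come from
the patch quoted above:

/*
 * Illustrative, untested sketch of the workspace selection proposed
 * above: a pool of default-level workspaces plus one preallocated
 * maximum-level workspace per mounted filesystem, handed out last.
 */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct zstd_ws_pool {
	struct list_head	idle;		/* default-level workspaces */
	spinlock_t		lock;
	wait_queue_head_t	wait;
	struct workspace	*max_ws;	/* the one level-15 workspace */
	bool			max_ws_busy;
};

static struct workspace *zstd_pick_workspace(struct zstd_ws_pool *pool,
					     unsigned int level)
{
	struct workspace *ws = NULL;

	spin_lock(&pool->lock);
	if (level <= BTRFS_ZSTD_DEFAULT_LEVEL && !list_empty(&pool->idle)) {
		/* Ordinary levels are served from the shared pool first. */
		ws = list_first_entry(&pool->idle, struct workspace, list);
		list_del(&ws->list);
	} else if (!pool->max_ws_busy) {
		/*
		 * Levels above the default, or an exhausted pool, fall
		 * back to the single preallocated maximum-size workspace.
		 */
		pool->max_ws_busy = true;
		ws = pool->max_ws;
	}
	spin_unlock(&pool->lock);

	/* NULL means the caller has to sleep on pool->wait and retry. */
	return ws;
}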

Thread overview: 10+ messages
2018-10-31 18:11 [PATCH v2] btrfs: add zstd compression level support Nick Terrell
2018-11-01  9:57 ` Timofey Titovets
2018-11-01 15:40   ` Nick Terrell
2018-11-13  0:01 ` David Sterba
2018-11-13  0:33 ` David Sterba
2018-11-13  1:49   ` Nick Terrell
2018-11-13 13:29     ` Timofey Titovets
2018-11-16  1:42       ` Nick Terrell
2018-11-19 19:57       ` Omar Sandoval
2018-11-19 20:05   ` Omar Sandoval [this message]
