Linux-BTRFS Archive on lore.kernel.org
From: Nikolay Borisov <nborisov@suse.com>
To: Dennis Zhou <dennis@kernel.org>, David Sterba <dsterba@suse.com>,
	Josef Bacik <josef@toxicpanda.com>, Chris Mason <clm@fb.com>,
	Omar Sandoval <osandov@osandov.com>,
	Nick Terrell <terrelln@fb.com>
Cc: kernel-team@fb.com, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, Omar Sandoval <osandov@fb.com>
Subject: Re: [PATCH 11/11] btrfs: add zstd compression level support
Date: Tue, 29 Jan 2019 09:25:54 +0200
Message-ID: <c92b0233-5406-42a8-4444-a5ab503f726f@suse.com>
In-Reply-To: <20190128212437.11597-12-dennis@kernel.org>



On 28.01.19 at 23:24, Dennis Zhou wrote:
> Zstd compression requires different amounts of memory for each level of
> compression. The prior patches implemented indirection to allow for each
> compression type to manage their workspaces independently. This patch
> uses this indirection to implement compression level support for zstd.
> 
> As mentioned above, zstd differs from zlib in that higher levels of
> compression require more memory. To manage this, each
> compression level has its own queue of workspaces. A global LRU is used
> to help with reclaim. To guarantee forward progress, a max level
> workspace is preallocated and hidden from the LRU.
> 
> When getting a workspace, it uses a bitmap to identify the levels that
> are populated and scans up. If it finds a workspace for a higher level
> than the one requested, it uses it, but does not update the last_used
> time or the corresponding place in the LRU. This provides a mechanism to
> decrease memory utilization as we only keep around workspaces that are
> sized appropriately for the compression levels in use.
> 
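
To illustrate the lookup described above, here is a minimal user-space
sketch. It is not the patch's code: find_workspace_level(), MAX_LEVEL and
the active_map layout (bit level - 1 set when that level has an idle
workspace) are assumptions made for the example. In the kernel, helpers
such as find_next_bit() would do the scan.

```c
#include <stdint.h>

#define MAX_LEVEL 15	/* mirrors ZSTD_BTRFS_MAX_LEVEL from the patch */

/*
 * Hypothetical helper: bit (level - 1) of active_map is set when the
 * free list for that level holds at least one idle workspace.
 * Scans upwards from the wanted level and returns the smallest
 * populated level >= wanted, or 0 if none is available.
 */
static int find_workspace_level(uint32_t active_map, int wanted)
{
	int level;

	if (wanted < 1 || wanted > MAX_LEVEL)
		return 0;

	for (level = wanted; level <= MAX_LEVEL; level++) {
		if (active_map & (UINT32_C(1) << (level - 1)))
			return level;
	}
	return 0;
}
```

With idle workspaces at levels 1 and 5 only (active_map == 0x11), a
request for level 3 is served by the level 5 workspace, while a request
for level 6 finds nothing and would have to allocate or wait.
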
> By knowing which compression levels have available workspaces, we can
> recycle rather than always create new workspaces as well as take
> advantage of the preallocated max level for forward progress. If we hit
> memory pressure, we sleep on the max level workspace. We continue to
> rescan in case we can use a smaller workspace, but eventually should be
> able to obtain the max level workspace or allocate one again should
> memory pressure subside. The memory requirement for decompression is the
> same as level 1, so decompression can use any available workspace.
> 
> The number of workspaces is bounded above by the workqueue's limit,
> which is currently 2 (percpu limit). In addition, a reclaim timer is
> used to free inactive/improperly sized workspaces. The reclaim timer is
> set to 67s to avoid colliding with transaction commit (every 30s) and
> attempts to reclaim any unused workspace older than 45s.
> 
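
The reclaim rule described above (free anything unused for more than
45s) can be sketched in user space as follows. This is an illustration
under assumptions, not the patch's implementation: struct ws_entry,
ws_is_reclaimable() and count_reclaimable() are hypothetical names, and
the LRU is modelled as an array ordered oldest first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define RECLAIM_NS (45 * NSEC_PER_SEC)	/* mirrors ZSTD_BTRFS_RECLAIM_NS */

/* Hypothetical stand-in for one workspace on the global LRU. */
struct ws_entry {
	uint64_t last_used_ns;
};

/* True if the workspace was last used more than RECLAIM_NS ago. */
static bool ws_is_reclaimable(const struct ws_entry *ws, uint64_t now_ns)
{
	/* Avoid unsigned underflow when the clock is still below 45s. */
	if (now_ns < RECLAIM_NS)
		return false;
	return ws->last_used_ns < now_ns - RECLAIM_NS;
}

/*
 * The array is ordered oldest first, so the walk can stop at the first
 * entry that is still recent. Returns how many entries would be freed.
 */
static size_t count_reclaimable(const struct ws_entry *lru, size_t n,
				uint64_t now_ns)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ws_is_reclaimable(&lru[i], now_ns))
			break;
	}
	return i;
}
```

Because a borrowed larger workspace does not get its last_used time
refreshed, it ages out under this rule even while smaller requests keep
using it.
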
> Repeating the experiment from v2 [1], the Silesia corpus was copied to a
> btrfs filesystem 10 times and then read back after dropping the caches.
> The btrfs filesystem was on an SSD.
> 
> Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> 1       2.658        438.47                910.51
> 2       2.744        364.86                886.55
> 3       2.801        336.33                828.41
> 4       2.858        286.71                886.55
> 5       2.916        212.77                556.84
> 6       2.363        119.82                990.85
> 7       3.000        154.06                849.30
> 8       3.011        159.54                875.03
> 9       3.025        100.51                940.15
> 10      3.033        118.97                616.26
> 11      3.036         94.19                802.11
> 12      3.037         73.45                931.49
> 13      3.041         55.17                835.26
> 14      3.087         44.70                716.78
> 15      3.126         37.30                878.84
> 
> [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Cc: Nick Terrell <terrelln@fb.com>
> Cc: Omar Sandoval <osandov@fb.com>
> ---
>  fs/btrfs/super.c |   6 +-
>  fs/btrfs/zstd.c  | 229 +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 226 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b28dff207383..0ecc513cb56c 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -544,9 +544,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>  				btrfs_clear_opt(info->mount_opt, NODATASUM);
>  				btrfs_set_fs_incompat(info, COMPRESS_LZO);
>  				no_compress = 0;
> -			} else if (strcmp(args[0].from, "zstd") == 0) {
> +			} else if (strncmp(args[0].from, "zstd", 4) == 0) {
>  				compress_type = "zstd";
>  				info->compress_type = BTRFS_COMPRESS_ZSTD;
> +				info->compress_level =
> +					btrfs_compress_str2level(
> +							 BTRFS_COMPRESS_ZSTD,
> +							 args[0].from + 4);
>  				btrfs_set_opt(info->mount_opt, COMPRESS);
>  				btrfs_clear_opt(info->mount_opt, NODATACOW);
>  				btrfs_clear_opt(info->mount_opt, NODATASUM);
> diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
> index a951d4fe77f7..ce9b466c197f 100644
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -6,20 +6,27 @@
>   */
>  
>  #include <linux/bio.h>
> +#include <linux/bitmap.h>
>  #include <linux/err.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
> +#include <linux/sched/mm.h>
>  #include <linux/pagemap.h>
>  #include <linux/refcount.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/zstd.h>
>  #include "compression.h"
> +#include "ctree.h"
>  
>  #define ZSTD_BTRFS_MAX_WINDOWLOG 17
>  #define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
>  #define ZSTD_BTRFS_DEFAULT_LEVEL 3
> +#define ZSTD_BTRFS_MAX_LEVEL 15
> +#define ZSTD_BTRFS_RECLAIM_NS (45 * NSEC_PER_SEC)
> +/* 67s to avoid clashing with transaction commit (every 30s) */
> +#define ZSTD_BTRFS_RECLAIM_JIFFIES (67 * HZ)

This is valid only if the transaction commit interval is not overridden
via Opt_commit_interval. If avoiding a clash with transaction commit
really matters, maybe this should be calculated at mount time?
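
One possible way to derive the value at mount time, purely as an
illustration: start from the 67s base and bump it until it shares no
common factor with the configured commit interval, so the two timers
line up only once every (interval * commit) seconds. The helper names
reclaim_interval_secs() and gcd_u() are made up for this sketch, and the
coprime rule is an assumption of the example, not something the patch
does.

```c
#define RECLAIM_BASE_SECS 67	/* base value taken from the patch */

/* Greatest common divisor, Euclid's algorithm. */
static unsigned int gcd_u(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical mount-time helper: return the smallest interval in
 * seconds that is >= RECLAIM_BASE_SECS and coprime with the commit
 * interval. A commit interval of 0 is treated as "unknown" and the
 * base value is returned unchanged.
 */
static unsigned int reclaim_interval_secs(unsigned int commit_interval)
{
	unsigned int secs = RECLAIM_BASE_SECS;

	if (commit_interval == 0)
		return secs;

	while (gcd_u(secs, commit_interval) != 1)
		secs++;
	return secs;
}
```

For the default commit interval of 30s this keeps 67s; with commit=67
it would move to 68s. The result would still have to be converted to
jiffies (multiplied by HZ) before arming the timer.
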

<snip>



