From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B282C4CECE for ; Mon, 14 Oct 2019 20:04:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EDE52217D9 for ; Mon, 14 Oct 2019 20:04:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1571083450; bh=EbEKnGjRSCGocbPg2RWRJM/n5io73dC7iuz4HxGdlCE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=2L4kGBxUT9awNHF1+yfqsRWfoWbOxSWos11taKYQYNfD+1TEm1lqcgvPb1XSIrHGi rgbyGKJ9kULeOnyzljVn/R+5pJsjOC7jod1fOcx5iGrtJ5zl5BMRw0aGQ0kiVbYAXn e2lXGCH8M+5nNosKEUkw2KQRzuccKRHGcEbZhDOY= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731502AbfJNUEJ (ORCPT ); Mon, 14 Oct 2019 16:04:09 -0400 Received: from mail-qt1-f195.google.com ([209.85.160.195]:32935 "EHLO mail-qt1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730668AbfJNUEJ (ORCPT ); Mon, 14 Oct 2019 16:04:09 -0400 Received: by mail-qt1-f195.google.com with SMTP id r5so27225521qtd.0 for ; Mon, 14 Oct 2019 13:04:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=iHRud3qmPNr62Bt9OGfA1BZj9uWE7FZ1csdx+VeZoc0=; b=omINAt8J3UMe5iVssT2Azfcz9TR3YCPBzX2Jq6MbN0TZ0bxBOxMef4CJ6gVqaxufZ/ t1sHhVgzlNNhVH2fokAZH2FkOaYZOJxFMgVBEU/lNF6iG/llw0KnRwGPWzL7sYa3MbFs 7vKGBS9OcNVT/aj9fFt5DcaB5efTq8YDwIES7Xi+p2c5J1K3Dr5Nc+mbu5Pd25jzJpUT HN1mgBefE2tFnGqg2ROEGuXg25yG5y9KgyO2ooQXyM6Wasirp9jT8KX6xTrXKrkRIdIa TUeaCaGIeVL3km4Lr+ItPV4h7nXQzhk8dU8AkpOU9pRZy/og9OlskfVa9pJGHYIDpjr3 137A== X-Gm-Message-State: APjAAAX0plojFj4e45U7sL2v1HKy+6S3IGz/Im4B/9vzMkAh6aVlmYMD 0hcqBDHGZh2y2EQ+sflvvyU= X-Google-Smtp-Source: APXvYqwxTZzhx2fUZECCVE1oKl6R/Yralv9iCTjCaG88kXmngeu26PMPEVPbGgfzlt42WUSyPJLizg== X-Received: by 2002:ad4:538b:: with SMTP id i11mr14102613qvv.211.1571083447600; Mon, 14 Oct 2019 13:04:07 -0700 (PDT) Received: from dennisz-mbp.dhcp.thefacebook.com ([2620:10d:c091:500::2:b1f8]) by smtp.gmail.com with ESMTPSA id a19sm12118048qtc.58.2019.10.14.13.04.06 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 14 Oct 2019 13:04:06 -0700 (PDT) Date: Mon, 14 Oct 2019 16:04:05 -0400 From: Dennis Zhou To: Josef Bacik Cc: David Sterba , Chris Mason , Omar Sandoval , kernel-team@fb.com, linux-btrfs@vger.kernel.org Subject: Re: [PATCH 13/19] btrfs: have multiple discard lists Message-ID: <20191014200405.GE40077@dennisz-mbp.dhcp.thefacebook.com> References: <87b5ef751e8febd481065e475bd53b276e670ff6.1570479299.git.dennis@kernel.org> <20191010165100.nz7tfn7kpclp6dkl@macbook-pro-91.dhcp.thefacebook.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191010165100.nz7tfn7kpclp6dkl@macbook-pro-91.dhcp.thefacebook.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Thu, Oct 10, 2019 at 12:51:01PM -0400, Josef Bacik wrote: > On Mon, Oct 07, 2019 at 04:17:44PM -0400, Dennis Zhou wrote: > > Non-block group destruction discarding currently only had a single list > > with no minimum discard length. This can lead to caravaning more > > meaningful discards behind a heavily fragmented block group. > > > > This adds support for multiple lists with minimum discard lengths to > > prevent the caravan effect. We promote block groups back up when we > > exceed the BTRFS_DISCARD_MAX_FILTER size, currently we support only 2 > > lists with filters of 1MB and 32KB respectively. > > > > Signed-off-by: Dennis Zhou > > --- > > fs/btrfs/ctree.h | 2 +- > > fs/btrfs/discard.c | 60 +++++++++++++++++++++++++++++++++---- > > fs/btrfs/discard.h | 4 +++ > > fs/btrfs/free-space-cache.c | 37 +++++++++++++++-------- > > fs/btrfs/free-space-cache.h | 2 +- > > 5 files changed, 85 insertions(+), 20 deletions(-) > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > > index e81f699347e0..b5608f8dc41a 100644 > > --- a/fs/btrfs/ctree.h > > +++ b/fs/btrfs/ctree.h > > @@ -439,7 +439,7 @@ struct btrfs_full_stripe_locks_tree { > > }; > > > > /* discard control */ > > -#define BTRFS_NR_DISCARD_LISTS 2 > > +#define BTRFS_NR_DISCARD_LISTS 3 > > > > struct btrfs_discard_ctl { > > struct workqueue_struct *discard_workers; > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > index 072c73f48297..296cbffc5957 100644 > > --- a/fs/btrfs/discard.c > > +++ b/fs/btrfs/discard.c > > @@ -20,6 +20,10 @@ > > #define BTRFS_DISCARD_MAX_DELAY (10000UL) > > #define BTRFS_DISCARD_MAX_IOPS (10UL) > > > > +/* montonically decreasing filters after 0 */ > > +static int discard_minlen[BTRFS_NR_DISCARD_LISTS] = {0, > > + BTRFS_DISCARD_MAX_FILTER, BTRFS_DISCARD_MIN_FILTER}; > > + > > static struct list_head * > > btrfs_get_discard_list(struct btrfs_discard_ctl *discard_ctl, > > struct btrfs_block_group_cache *cache) > > @@ -120,7 +124,7 @@ find_next_cache(struct btrfs_discard_ctl *discard_ctl, u64 now) > > } > > > > static struct btrfs_block_group_cache * > > -peek_discard_list(struct btrfs_discard_ctl *discard_ctl) > > +peek_discard_list(struct btrfs_discard_ctl *discard_ctl, int *discard_index) > > { > > struct btrfs_block_group_cache *cache; > > u64 now = ktime_get_ns(); > > @@ -132,6 +136,7 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl) > > > > if (cache && now > cache->discard_delay) { > > discard_ctl->cache = cache; > > + *discard_index = cache->discard_index; > > if (cache->discard_index == 0 && > > cache->free_space_ctl->free_space != cache->key.offset) { > > __btrfs_add_to_discard_list(discard_ctl, cache); > > @@ -150,6 +155,36 @@ peek_discard_list(struct btrfs_discard_ctl *discard_ctl) > > return cache; > > } > > > > +void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache, > > + u64 bytes) > > +{ > > + struct btrfs_discard_ctl *discard_ctl; > > + > > + if (!cache || !btrfs_test_opt(cache->fs_info, DISCARD_ASYNC)) > > + return; > > + > > + discard_ctl = &cache->fs_info->discard_ctl; > > + > > + if (cache && cache->discard_index > 1 && > > + bytes >= BTRFS_DISCARD_MAX_FILTER) { > > + remove_from_discard_list(discard_ctl, cache); > > + cache->discard_index = 1; > > Really need names here, I have no idea what 1 is. > Yep, done. > > + btrfs_add_to_discard_list(discard_ctl, cache); > > + } > > +} > > + > > +static void btrfs_update_discard_index(struct btrfs_discard_ctl *discard_ctl, > > + struct btrfs_block_group_cache *cache) > > +{ > > + cache->discard_index++; > > + if (cache->discard_index == BTRFS_NR_DISCARD_LISTS) { > > + cache->discard_index = 1; > > + return; > > + } > > + > > + btrfs_add_to_discard_list(discard_ctl, cache); > > +} > > + > > void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, > > struct btrfs_block_group_cache *cache) > > { > > @@ -202,23 +237,34 @@ static void btrfs_discard_workfn(struct work_struct *work) > > { > > struct btrfs_discard_ctl *discard_ctl; > > struct btrfs_block_group_cache *cache; > > + int discard_index = 0; > > u64 trimmed = 0; > > + u64 minlen = 0; > > > > discard_ctl = container_of(work, struct btrfs_discard_ctl, work.work); > > > > again: > > - cache = peek_discard_list(discard_ctl); > > + cache = peek_discard_list(discard_ctl, &discard_index); > > if (!cache || !btrfs_run_discard_work(discard_ctl)) > > return; > > > > - if (btrfs_discard_bitmaps(cache)) > > + minlen = discard_minlen[discard_index]; > > + > > + if (btrfs_discard_bitmaps(cache)) { > > + u64 maxlen = 0; > > + > > + if (discard_index) > > + maxlen = discard_minlen[discard_index - 1]; > > + > > btrfs_trim_block_group_bitmaps(cache, &trimmed, > > cache->discard_cursor, > > btrfs_block_group_end(cache), > > - 0, true); > > - else > > + minlen, maxlen, true); > > + } else { > > btrfs_trim_block_group(cache, &trimmed, cache->discard_cursor, > > - btrfs_block_group_end(cache), 0, true); > > + btrfs_block_group_end(cache), > > + minlen, true); > > + } > > > > discard_ctl->prev_discard = trimmed; > > > > @@ -231,6 +277,8 @@ static void btrfs_discard_workfn(struct work_struct *work) > > cache->key.offset) > > btrfs_add_to_discard_free_list(discard_ctl, > > cache); > > + else > > + btrfs_update_discard_index(discard_ctl, cache); > > } else { > > cache->discard_cursor = cache->key.objectid; > > cache->discard_flags |= BTRFS_DISCARD_BITMAPS; > > diff --git a/fs/btrfs/discard.h b/fs/btrfs/discard.h > > index 898dd92dbf8f..1daa8da4a1b5 100644 > > --- a/fs/btrfs/discard.h > > +++ b/fs/btrfs/discard.h > > @@ -18,6 +18,8 @@ > > > > /* discard size limits */ > > #define BTRFS_DISCARD_MAX_SIZE (SZ_64M) > > +#define BTRFS_DISCARD_MAX_FILTER (SZ_1M) > > +#define BTRFS_DISCARD_MIN_FILTER (SZ_32K) > > > > /* discard flags */ > > #define BTRFS_DISCARD_RESET_CURSOR (1UL << 0) > > @@ -39,6 +41,8 @@ void btrfs_add_to_discard_list(struct btrfs_discard_ctl *discard_ctl, > > struct btrfs_block_group_cache *cache); > > void btrfs_add_to_discard_free_list(struct btrfs_discard_ctl *discard_ctl, > > struct btrfs_block_group_cache *cache); > > +void btrfs_discard_check_filter(struct btrfs_block_group_cache *cache, > > + u64 bytes); > > void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info); > > > > void btrfs_discard_cancel_work(struct btrfs_discard_ctl *discard_ctl, > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > > index ce33803a45b2..ed35dc090df6 100644 > > --- a/fs/btrfs/free-space-cache.c > > +++ b/fs/btrfs/free-space-cache.c > > @@ -2471,6 +2471,7 @@ int __btrfs_add_free_space(struct btrfs_fs_info *fs_info, > > if (ret) > > kmem_cache_free(btrfs_free_space_cachep, info); > > out: > > + btrfs_discard_check_filter(cache, bytes); > > So we're only accounting the new space? What if we merge with a larger area > here? We should probably make our decision based on the actual trimable area. > Thanks that's a good catch. I've updated it to be info->bytes if possible. > > btrfs_discard_update_discardable(cache, ctl); > > spin_unlock(&ctl->tree_lock); > > > > @@ -3409,7 +3410,13 @@ static int trim_no_bitmap(struct btrfs_block_group_cache *block_group, > > goto next; > > } > > unlink_free_space(ctl, entry); > > - if (bytes > BTRFS_DISCARD_MAX_SIZE) { > > + /* > > + * Let bytes = BTRFS_MAX_DISCARD_SIZE + X. > > + * If X < BTRFS_DISCARD_MIN_FILTER, we won't trim X when > > + * we come back around. So trim it now. > > + */ > > + if (bytes > (BTRFS_DISCARD_MAX_SIZE + > > + BTRFS_DISCARD_MIN_FILTER)) { > > bytes = extent_bytes = BTRFS_DISCARD_MAX_SIZE; > > entry->offset += BTRFS_DISCARD_MAX_SIZE; > > entry->bytes -= BTRFS_DISCARD_MAX_SIZE; > > @@ -3510,7 +3517,7 @@ static void end_trimming_bitmap(struct btrfs_free_space_ctl *ctl, > > > > static int trim_bitmaps(struct btrfs_block_group_cache *block_group, > > u64 *total_trimmed, u64 start, u64 end, u64 minlen, > > - bool async) > > + u64 maxlen, bool async) > > { > > struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl; > > struct btrfs_free_space *entry; > > @@ -3535,7 +3542,7 @@ static int trim_bitmaps(struct btrfs_block_group_cache *block_group, > > } > > > > entry = tree_search_offset(ctl, offset, 1, 0); > > - if (!entry || (async && start == offset && > > + if (!entry || (async && minlen && start == offset && > > btrfs_free_space_trimmed(entry))) { > > Huh? Why do we care if minlen is set if our entry is already trimmed? If we're > already trimmed we should just skip it even with minlen set, right? Thanks, > Yeah this definitely needs a comment. The reason for this is minlen is used to check if we're in discard_index 0 which is the free path discarding. In this case, because bitmaps are lossily marked trimmed, (we skip lone pages less than BTRFS_ASYNC_DISCARD_MIN_FILTER), we need to just go back and double check. The goal is to maintain the invariant everything is discarded when we forget about a block group. I think in practice, with the FORCE_EXTENT_THRESHOLD patch later on, this shouldn't really be common case as it should have been picked out by some coalescing. Thanks, Dennis