All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Subject: Re: [PATCH v14.4 01/15] btrfs: improve inode's outstanding_extents computation
Date: Mon, 24 Jul 2017 16:00:38 -0400	[thread overview]
Message-ID: <20170724200037.GA15181@destiny> (raw)
In-Reply-To: <20170712085002.23241-2-lufq.fnst@cn.fujitsu.com>

On Wed, Jul 12, 2017 at 04:49:48PM +0800, Lu Fengqi wrote:
> From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
> 
> This issue was revealed by modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB,
> When modifying BTRFS_MAX_EXTENT_SIZE(128MB) to 64KB, fsstress test often
> gets these warnings from btrfs_destroy_inode():
> 	WARN_ON(BTRFS_I(inode)->outstanding_extents);
> 	WARN_ON(BTRFS_I(inode)->reserved_extents);
> 
> Simple test program below can reproduce this issue steadily.
> Note: you need to modify BTRFS_MAX_EXTENT_SIZE to 64KB to have test,
> otherwise there won't be such WARNING.
> 	#include <string.h>
> 	#include <unistd.h>
> 	#include <sys/types.h>
> 	#include <sys/stat.h>
> 	#include <fcntl.h>
> 
> 	int main(void)
> 	{
> 		int fd;
> 		char buf[68 *1024];
> 
> 		memset(buf, 0, 68 * 1024);
> 		fd = open("testfile", O_CREAT | O_EXCL | O_RDWR);
> 		pwrite(fd, buf, 68 * 1024, 64 * 1024);
> 		return;
> 	}
> 
> When BTRFS_MAX_EXTENT_SIZE is 64KB, and buffered data range is:
> 64KB						128K		132KB
> |-----------------------------------------------|---------------|
>                          64 + 4KB
> 
> 1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
> metadata and set BTRFS_I(inode)->outstanding_extents to 2.
> (68KB + 64KB - 1) / 64KB == 2
> 
> Outstanding_extents: 2
> 
> 2) then btrfs_dirty_page() will be called to dirty pages and set
> EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() will be called
> twice.
> The 1st set_bit_hook() call will set DEALLOC flag for the first 64K.
> 64KB						128KB
> |-----------------------------------------------|
> 	64KB DELALLOC
> Outstanding_extents: 2
> 
> Set_bit_hooks() uses FIRST_DELALLOC flag to avoid re-increase
> outstanding_extents counter.
> So for 1st set_bit_hooks() call, it won't modify outstanding_extents,
> it's still 2.
> 
> Then FIRST_DELALLOC flag is *CLEARED*.
> 
> 3) 2nd btrfs_set_bit_hook() call.
> Because FIRST_DELALLOC have been cleared by previous set_bit_hook(),
> btrfs_set_bit_hook() will increase BTRFS_I(inode)->outstanding_extents by
> one, so now BTRFS_I(inode)->outstanding_extents is 3.
> 64KB                                            128KB            132KB
> |-----------------------------------------------|----------------|
> 	64K DELALLOC				   4K DELALLOC
> Outstanding_extents: 3
> 
> But the correct outstanding_extents number should be 2, not 3.
> The 2nd btrfs_set_bit_hook() call just screwed up this, and leads to the
> WARN_ON().
> 
> Normally, we can solve it by only increasing outstanding_extents in
> set_bit_hook().
> But the problem is for delalloc_reserve/release_metadata(), we only have
> a 'length' parameter, and calculate in-accurate outstanding_extents.
> If we only rely on set_bit_hook() release_metadata() will crew things up
> as it will decrease inaccurate number.
> 
> So the fix we use is:
> 1) Increase *INACCURATE* outstanding_extents at delalloc_reserve_meta
>    Just as a place holder.
> 2) Increase *accurate* outstanding_extents at set_bit_hooks()
>    This is the real increaser.
> 3) Decrease *INACCURATE* outstanding_extents before returning
>    This makes outstanding_extents to correct value.
> 
> For 128M BTRFS_MAX_EXTENT_SIZE, due to limitation of
> __btrfs_buffered_write(), each iteration will only handle about 2MB
> data.
> So btrfs_dirty_pages() won't need to handle cases cross 2 extents.
> 

NACK on this, it just makes a crappy problem harder to understand.  We need to
stop using outstanding_extents as both a reservation thing _and_ an accurate
count.  I'll rework this and fix the other issues related to over-reservation
with compression.  Thanks,

Josef

  reply	other threads:[~2017-07-24 20:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-12  8:49 [PATCH v14.4 00/15] Btrfs In-band De-duplication Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 01/15] btrfs: improve inode's outstanding_extents computation Lu Fengqi
2017-07-24 20:00   ` Josef Bacik [this message]
2017-07-25  1:04     ` Qu Wenruo
2017-07-12  8:49 ` [PATCH v14.4 02/15] btrfs: introduce type based delalloc metadata reserve Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 03/15] btrfs: Introduce COMPRESS reserve type to fix false enospc for compression Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 04/15] btrfs: dedupe: Introduce dedupe framework and its header Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 05/15] btrfs: dedupe: Introduce function to initialize dedupe info Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 06/15] btrfs: dedupe: Introduce function to add hash into in-memory tree Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 07/15] btrfs: dedupe: Introduce function to remove hash from " Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 08/15] btrfs: delayed-ref: Add support for increasing data ref under spinlock Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 09/15] btrfs: dedupe: Introduce function to search for an existing hash Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 10/15] btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 11/15] btrfs: ordered-extent: Add support for dedupe Lu Fengqi
2017-07-12  8:49 ` [PATCH v14.4 12/15] btrfs: dedupe: Inband in-memory only de-duplication implement Lu Fengqi
2017-07-12  8:50 ` [PATCH v14.4 13/15] btrfs: dedupe: Add ioctl for inband dedupelication Lu Fengqi
2017-07-12  8:50 ` [PATCH v14.4 14/15] btrfs: relocation: Enhance error handling to avoid BUG_ON Lu Fengqi
2017-07-12  8:50 ` [PATCH v14.4 15/15] btrfs: dedupe: Introduce new reconfigure ioctl Lu Fengqi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724200037.GA15181@destiny \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=lufq.fnst@cn.fujitsu.com \
    --cc=wangxg.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.