All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	rookxu <brookxu.cn@gmail.com>,
	Ritesh Harjani <ritesh.list@gmail.com>
Subject: Re: [PATCH v3 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list
Date: Wed, 8 Feb 2023 16:55:05 +0530	[thread overview]
Message-ID: <Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com> (raw)
In-Reply-To: <Y9zHkMx7w4Io0TTv@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>

On Fri, Feb 03, 2023 at 02:06:56PM +0530, Ojaswin Mujoo wrote:
> On Fri, Jan 27, 2023 at 03:43:12PM +0100, Jan Kara wrote:
> > 
> > Well, I think cond_resched() + goto retry would be OK here. We could also
> > cycle the corresponding group lock which would wait for
> > ext4_mb_discard_group_preallocations() to finish but that is going to burn
> > the CPU even more than the cond_resched() + retry as we'll be just spinning
> > on the spinlock. Sleeping is IMHO not warranted as the whole
> > ext4_mb_discard_group_preallocations() is running under a spinlock anyway
> > so it should better be a very short sleep.
> > 
> > Or actually I have one more possible solution: What the adjusting function
> > is doing that it looks up PA before and after ac->ac_o_ex.fe_logical and
> > trims start & end to not overlap these PAs. So we could just lookup these
> > two PAs (ignoring the deleted state) and then just iterate from these with
> > rb_prev() & rb_next() until we find not-deleted ones. What do you think? 
> 
> Hey Jan, 
> 
> Just thought I'd update you, I'm trying this solution out, and it looks
> good but I'm hitting a few bugs in the implementation. Will update here
> once I have it working correctly.

Alright, so after spending some time on these bugs I'm hitting I'm
seeing some strange behavior. Basically, it seems like in scenarios
where we are not able to allocate as many block as the normalized goal
request, we can sometimes end up adding a PA that overlaps with existing
PAs in the inode PA list/tree. This behavior exists even before this
particular patchset. Due to presence of such overlapping PAs, the above
logic was failing in some cases.

From my understanding of the code, this seems to be a BUG. We should not
be adding overlapping PA ranges as that causes us to preallocate
multiple blocks for the same logical offset in a file, however I would
also like to know if my understanding is incorrect and if this is an
intended behavior.

----- Analysis of the issue ------

Here's my analysis of the behavior, which I did by adding some BUG_ONs
and running generic/269 (4k bs). It happens pretty often, like once
every 5-10 runs. Testing was done without applying this patch series on
the Ted's dev branch.

1. So taking an example of a real scenario I hit. After we find the best
len possible, we enter the ext4_mb_new_inode_pa() function with the
following values for start and end of the extents:

## format: <start>/<end>(<len>)
orig_ex:503/510(7) goal_ex:0/512(512) best_ex:0/394(394)

2. Since (best_ex len < goal_ex len) we enter the PA window adjustment
if condition here:

	if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len)
		...
	}

3. Here, we calc wins, winl and off and adjust logical start and end of
the best found extent. The idea is to make sure that the best extent
atleast covers the original request. In this example, the values are:

winl:503 wins:387 off:109

and win = min(winl, wins, off) = 109

4. We then adjust the logical start of the best ex as:

		ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical - EXT4_NUM_B2C(sbi, win);

which makes the new best extent as:

best_ex: 394/788(394)

As we can see, the best extent overflows outside the goal range, and
hence we don't have any guarentee anymore that it will not overlap with
another PA since we only check overlaps with the goal start and end.
We then initialze the new PA with the logical start and end of the best
extent and finaly add it to the inode PA list.

In my testing I was able to actually see overlapping PAs being added to
the inode list.

----------- END ---------------

Again, I would like to know if this is a BUG or intended. If its a BUG,
is it okay for us to make sure the adjusted best extent length doesn't 
extend the goal length? 

Also, another thing I noticed is that after ext4_mb_normalize_request(),
sometimes the original range can also exceed the normalized goal range,
which is again was a bit surprising to me since my understanding was
that normalized range would always encompass the orignal range.

Hoping to get some insights into the above points.

Regards,
ojaswin

  reply	other threads:[~2023-02-08 11:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16  8:02 [PATCH v3 0/8] ext4: Convert inode preallocation list to an rbtree Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 1/8] ext4: Stop searching if PA doesn't satisfy non-extent file Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 2/8] ext4: Refactor code related to freeing PAs Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 3/8] ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated() Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 4/8] ext4: Move overlap assert logic into a separate function Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 5/8] ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request() Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 6/8] ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list Ojaswin Mujoo
2023-01-16 12:23   ` Jan Kara
2023-01-17 10:30     ` Ojaswin Mujoo
2023-01-17 11:03       ` Jan Kara
2023-01-19  6:27         ` Ojaswin Mujoo
2023-01-27 14:43           ` Jan Kara
2023-02-03  8:36             ` Ojaswin Mujoo
2023-02-08 11:25               ` Ojaswin Mujoo [this message]
2023-02-09 10:54                 ` Jan Kara
2023-02-09 17:54                   ` Ojaswin Mujoo
2023-02-10 14:37                     ` Jan Kara
2023-02-13 17:58                       ` Ojaswin Mujoo
2023-02-14  8:50                         ` Jan Kara
2023-02-16 17:07                       ` Andreas Dilger
2023-03-17 12:40                         ` Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 8/8] ext4: Remove the logic to trim inode PAs Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com \
    --to=ojaswin@linux.ibm.com \
    --cc=brookxu.cn@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ritesh.list@gmail.com \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.