All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	rookxu <brookxu.cn@gmail.com>,
	Ritesh Harjani <ritesh.list@gmail.com>
Subject: Re: [PATCH v3 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list
Date: Tue, 14 Feb 2023 09:50:08 +0100	[thread overview]
Message-ID: <20230214085008.gix3dvvyg7wcf3bz@quack3> (raw)
In-Reply-To: <Y+p6URWLBPd6VUHD@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>


Hi Ojaswin!

On Mon 13-02-23 23:28:41, Ojaswin Mujoo wrote:
> On Fri, Feb 10, 2023 at 03:37:53PM +0100, Jan Kara wrote:
> > > See my comments at the end for more info.
> > > 
> > > > 
> > > > > Also, another thing I noticed is that after ext4_mb_normalize_request(),
> > > > > sometimes the original range can also exceed the normalized goal range,
> > > > > which is again was a bit surprising to me since my understanding was
> > > > > that normalized range would always encompass the orignal range.
> > > > 
> > > > Well, isn't that because (part of) the requested original range is already
> > > > preallocated? Or what causes the goal range to be shortened?
> > > > 
> > > Yes I think that pre existing PAs could be one of the cases.
> > > 
> > > Other than that, I'm also seeing some cases of sparse writes which can cause
> > > ext4_mb_normalize_request() to result in having an original range that
> > > overflows out of the goal range. For example, I observed these values just
> > > after the if else if else conditions in the function, before we check if range
> > > overlaps pre existing PAs:
> > > 
> > > orig_ex:2045/2055(len:10) normalized_range:0/2048, orig_isize:8417280
> > > 
> > > Basically, since isize is large and we are doing a sparse write, we end
> > > up in the following if condition:
> > > 
> > > 	} else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
> > > 								(8<<20)>>bsbits, max, 8 * 1024)) {
> > > 		start_off = ((loff_t)ac->ac_o_ex.fe_logical >> (23 - bsbits)) << 23;
> > > 		size = 8 * 1024 * 1024;
> > >  }
> > > 
> > > Resulting in normalized range less than original range.
> > 
> > I see.
> > 
> > > Now, in any case, once we get such an overflow, if we try to enter the PA
> > > adjustment window in ext4_mb_new_inode_pa() function, we will again end
> > > up with a best extent overflowing out of goal extent since we would try
> > > to cover the original extent. 
> > > 
> > > So yeah, seems like there are 2 cases where we could result in
> > > overlapping PAs:
> > > 
> > > 1. Due to off calculation in PA adjustment window, as we discussed.  2.
> > > Due to original extent overflowing out of goal extent.
> > > 
> > > I think the 3 step solution you proposed works well to counter 1 but not
> > > 2, so we probably need some more logic on top of your solution to take
> > > care of that.  I'll think some more on how to fix this but I think this
> > > will be as a separate patch.
> > 
> > Well, my algorithm will still result in preallocation being within the goal
> > range AFAICS. In the example above we had:
> > 
> > Orig 2045/2055 Goal 0/2048
> > 
> > Suppose we found 200 blocks. So we try placing preallocation like:
> > 
> > 1848/2048, it covers the original starting block 2045 so we are fine and
> > create preallocation 1848/2048. Nothing has overflown the goal window...
> 
> Ahh okay, I though you meant checking if we covered the complete
> original range instead of just the original start. In that case we
> should be good.
> 
> However, I still feel that the example where we unnecessarily end up with 
> a lesser goal len than original len should not happen. Maybe in
> ext4_mb_normalize_request(), instead of hardcording the goal lengths for
> different i_sizes, we can select the next power of 2 greater than our
> original length or some similar heuristic. What do you think?

I agree that seems suboptimal but I'd leave tweaking this heuristic for a
separate patchset. In this one I'd just fix the minimum to make ranges not
overlap. The current heuristic makes sense as an anti-fragmentation measure
when there's enough free space. We can tweak the goal window heuristic of
mballoc but it would require some deeper thought and measurements, how it
affects fragmentation for various workloads so it is not just about
changing those several lines of code...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2023-02-14  8:50 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-16  8:02 [PATCH v3 0/8] ext4: Convert inode preallocation list to an rbtree Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 1/8] ext4: Stop searching if PA doesn't satisfy non-extent file Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 2/8] ext4: Refactor code related to freeing PAs Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 3/8] ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated() Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 4/8] ext4: Move overlap assert logic into a separate function Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 5/8] ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request() Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 6/8] ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 7/8] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list Ojaswin Mujoo
2023-01-16 12:23   ` Jan Kara
2023-01-17 10:30     ` Ojaswin Mujoo
2023-01-17 11:03       ` Jan Kara
2023-01-19  6:27         ` Ojaswin Mujoo
2023-01-27 14:43           ` Jan Kara
2023-02-03  8:36             ` Ojaswin Mujoo
2023-02-08 11:25               ` Ojaswin Mujoo
2023-02-09 10:54                 ` Jan Kara
2023-02-09 17:54                   ` Ojaswin Mujoo
2023-02-10 14:37                     ` Jan Kara
2023-02-13 17:58                       ` Ojaswin Mujoo
2023-02-14  8:50                         ` Jan Kara [this message]
2023-02-16 17:07                       ` Andreas Dilger
2023-03-17 12:40                         ` Ojaswin Mujoo
2023-01-16  8:02 ` [PATCH v3 8/8] ext4: Remove the logic to trim inode PAs Ojaswin Mujoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230214085008.gix3dvvyg7wcf3bz@quack3 \
    --to=jack@suse.cz \
    --cc=brookxu.cn@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.