Re: [PATCH] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim

From: Lukas Czerner <lczerner@redhat.com>
To: Reindl Harald <h.reindl@thelounge.net>
Cc: Wang Shilong <wangshilong1991@gmail.com>,
	linux-ext4@vger.kernel.org, Wang Shilong <wshilong@ddn.com>,
	Shuichi Ihara <sihara@ddn.com>,
	Andreas Dilger <adilger@dilger.ca>
Subject: Re: [PATCH] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim
Date: Wed, 27 May 2020 12:32:14 +0200
Message-ID: <20200527103214.knm2vmnwjt64j55l@work>
In-Reply-To: <59df4f2f-f168-99a1-e929-82742693f8ee@thelounge.net>

On Wed, May 27, 2020 at 12:11:52PM +0200, Reindl Harald wrote:
> 
> On 27.05.20 at 11:57, Lukas Czerner wrote:
> > On Wed, May 27, 2020 at 11:32:02AM +0200, Reindl Harald wrote:
> >>
> >>
> >> On 27.05.20 at 11:19, Lukas Czerner wrote:
> >>> On Wed, May 27, 2020 at 04:38:50PM +0900, Wang Shilong wrote:
> >>>> From: Wang Shilong <wshilong@ddn.com>
> >>>>
> >>>> Currently the WAS_TRIMMED flag is not persistent: whenever the
> >>>> filesystem is remounted, fstrim needs to walk all block groups
> >>>> again. This can make FSTRIM slow on filesystems backed by very
> >>>> large SSD LUNs.
> >>>>
> >>>> To avoid this, we introduce a block group flag, EXT4_BG_WAS_TRIMMED.
> >>>> The side effect is one extra dirty block group write after
> >>>> trimming a block group.
> >>
> >> would that also fix the issue that *way too much* is trimmed all the
> >> time, no matter if it's a thin provisioned vmware disk or a physical
> >> RAID10 with SSD
> > 
> > no, the mechanism remains the same, but the proposal is to make it
> > persistent across re-mounts.
> > 
> >>
> >> no way of 315 MB deletes within 2 hours or so on a system with just 485M
> >> used
> > 
> > The reason is that we're working on block group granularity. So if you
> > have an almost free block group, and you free some blocks from it, the
> > flag gets cleared and next time you run fstrim it'll trim all the free
> > space in the group. Then again if you free some blocks from the group,
> > the flag gets cleared again ...
> > 
> > But I don't think this is a problem at all. Certainly not worth tracking
> > free/trimmed extents to solve it.
> 
> it is a problem
> 
> on a daily "fstrim -av" you trim gigabytes of already trimmed blocks
> which for example on a vmware thin provisioned vdisk makes it down to
> CBT (changed-block-tracking)
> 
> so instead of completely ignoring that untouched space thanks to CBT it's
> considered as changed and verified in the follow-up backup run, which
> takes magnitudes longer than needed

Looks like you identified the problem then ;)

But seriously, trim/discard was always considered advisory and the
storage is completely free to do whatever it wants to do with the
information. It might even be the case that the discard requests are
ignored and we might not even need optimization like this. But
regardless it does take time to go through the block groups and as a
result this optimization is useful in the fs itself.
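
To make the block group granularity discussed above concrete, here is
a toy model in Python. It is only a sketch and not the kernel code: the
names BlockGroup, fstrim() and remount() are made up for illustration,
and the model assumes that freeing any block clears the group's
"was trimmed" flag and that fstrim skips groups with the flag set.

```python
from dataclasses import dataclass, field


@dataclass
class BlockGroup:
    """Toy stand-in for one block group (hypothetical, not ext4's struct)."""
    blocks: int
    used: set = field(default_factory=set)
    was_trimmed: bool = False

    def free_count(self) -> int:
        return self.blocks - len(self.used)

    def allocate(self, block: int) -> None:
        self.used.add(block)

    def free(self, block: int) -> None:
        self.used.discard(block)
        # Assumption: freeing even one block clears the flag for the
        # whole group, so all of its free space is trimmed again.
        self.was_trimmed = False


def fstrim(groups) -> int:
    """Return how many free blocks would be sent down as discards."""
    discarded = 0
    for group in groups:
        if group.was_trimmed:
            continue  # nothing was freed since the last trim
        discarded += group.free_count()
        group.was_trimmed = True
    return discarded


def remount(groups, persistent_flag: bool) -> None:
    """Without a persistent flag, a remount forgets what was trimmed."""
    if not persistent_flag:
        for group in groups:
            group.was_trimmed = False


groups = [BlockGroup(blocks=32768) for _ in range(4)]
groups[0].allocate(0)
groups[0].allocate(1)
first = fstrim(groups)    # every group is trimmed once
second = fstrim(groups)   # all flags set, nothing to do
groups[0].free(1)
third = fstrim(groups)    # one freed block re-trims the whole group
remount(groups, persistent_flag=False)
fourth = fstrim(groups)   # in-memory flag lost, everything trimmed again
```

In this model a single freed block causes the whole group's free space
to be discarded again, and a remount without a persistent flag makes
fstrim walk and discard every group once more.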

However it seems to me that the situation you're describing calls for
optimization on a storage side (TP vdisk in your case), not file system
side.

And again, for fine grained discard you can use -o discard.

-Lukas

> 
> without that behavior our daily backups would take 3 minutes instead of 1
> hour but without fstrim the backup grows with useless temp data over time
>