All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ext4: Avoid trim error on fs with small groups
@ 2021-11-12 15:22 Jan Kara
  2021-11-12 15:26 ` Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jan Kara @ 2021-11-12 15:22 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara, Lukas Czerner

A user reported FITRIM ioctl failing for him on ext4 on some devices
without apparent reason.  After some debugging we've found out that
these devices (being LVM volumes) report rather large discard
granularity of 42MB and the filesystem had 1k blocksize and thus group
size of 8MB. Because ext4 FITRIM implementation puts discard
granularity into minlen, ext4_trim_fs() declared the trim request as
invalid. However just silently doing nothing seems to be a more
appropriate reaction to such combination of parameters since user did
not specify anything wrong.

CC: Lukas Czerner <lczerner@redhat.com>
Fixes: 5c2ed62fd447 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ioctl.c   | 2 --
 fs/ext4/mballoc.c | 8 ++++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 606dee9e08a3..220a4c8178b5 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1117,8 +1117,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		    sizeof(range)))
 			return -EFAULT;
 
-		range.minlen = max((unsigned int)range.minlen,
-				   q->limits.discard_granularity);
 		ret = ext4_trim_fs(sb, &range);
 		if (ret < 0)
 			return ret;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 72bfac2d6dce..7174add7b153 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6405,6 +6405,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
  */
 int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 {
+	struct request_queue *q = bdev_get_queue(sb->s_bdev);
 	struct ext4_group_info *grp;
 	ext4_group_t group, first_group, last_group;
 	ext4_grpblk_t cnt = 0, first_cluster, last_cluster;
@@ -6423,6 +6424,13 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	    start >= max_blks ||
 	    range->len < sb->s_blocksize)
 		return -EINVAL;
+	/* No point to try to trim less than discard granularity */
+	if (range->minlen < q->limits.discard_granularity) {
+		minlen = EXT4_NUM_B2C(EXT4_SB(sb),
+			q->limits.discard_granularity >> sb->s_blocksize_bits);
+		if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
+			goto out;
+	}
 	if (end >= max_blks)
 		end = max_blks - 1;
 	if (end <= first_data_blk)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-12 15:22 [PATCH] ext4: Avoid trim error on fs with small groups Jan Kara
@ 2021-11-12 15:26 ` Jan Kara
  2021-11-15 11:48 ` Lukas Czerner
  2022-01-05  3:14 ` Theodore Ts'o
  2 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2021-11-12 15:26 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara, Lukas Czerner

On Fri 12-11-21 16:22:02, Jan Kara wrote:
> A user reported FITRIM ioctl failing for him on ext4 on some devices
> without apparent reason.  After some debugging we've found out that
> these devices (being LVM volumes) report rather large discard
> granularity of 42MB and the filesystem had 1k blocksize and thus group
> size of 8MB. Because ext4 FITRIM implementation puts discard
> granularity into minlen, ext4_trim_fs() declared the trim request as
> invalid. However just silently doing nothing seems to be a more
> appropriate reaction to such combination of parameters since user did
> not specify anything wrong.
> 
> CC: Lukas Czerner <lczerner@redhat.com>
> Fixes: 5c2ed62fd447 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ioctl.c   | 2 --
>  fs/ext4/mballoc.c | 8 ++++++++
>  2 files changed, 8 insertions(+), 2 deletions(-)

I wanted to add one more comment: Alternatively we could return EOPNOTSUPP
in this case to indicate trim is never going to work on this fs. But just
doing nothing since we cannot submit useful discard request seems
appropriate to me.

								Honza


> 
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 606dee9e08a3..220a4c8178b5 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -1117,8 +1117,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		    sizeof(range)))
>  			return -EFAULT;
>  
> -		range.minlen = max((unsigned int)range.minlen,
> -				   q->limits.discard_granularity);
>  		ret = ext4_trim_fs(sb, &range);
>  		if (ret < 0)
>  			return ret;
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 72bfac2d6dce..7174add7b153 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6405,6 +6405,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>   */
>  int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  {
> +	struct request_queue *q = bdev_get_queue(sb->s_bdev);
>  	struct ext4_group_info *grp;
>  	ext4_group_t group, first_group, last_group;
>  	ext4_grpblk_t cnt = 0, first_cluster, last_cluster;
> @@ -6423,6 +6424,13 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	    start >= max_blks ||
>  	    range->len < sb->s_blocksize)
>  		return -EINVAL;
> +	/* No point to try to trim less than discard granularity */
> +	if (range->minlen < q->limits.discard_granularity) {
> +		minlen = EXT4_NUM_B2C(EXT4_SB(sb),
> +			q->limits.discard_granularity >> sb->s_blocksize_bits);
> +		if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
> +			goto out;
> +	}
>  	if (end >= max_blks)
>  		end = max_blks - 1;
>  	if (end <= first_data_blk)
> -- 
> 2.26.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-12 15:22 [PATCH] ext4: Avoid trim error on fs with small groups Jan Kara
  2021-11-12 15:26 ` Jan Kara
@ 2021-11-15 11:48 ` Lukas Czerner
       [not found]   ` <20211115125141.GD23412@quack2.suse.cz>
  2022-01-05  3:14 ` Theodore Ts'o
  2 siblings, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2021-11-15 11:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On Fri, Nov 12, 2021 at 04:22:02PM +0100, Jan Kara wrote:
> A user reported FITRIM ioctl failing for him on ext4 on some devices
> without apparent reason.  After some debugging we've found out that
> these devices (being LVM volumes) report rather large discard
> granularity of 42MB and the filesystem had 1k blocksize and thus group
> size of 8MB. Because ext4 FITRIM implementation puts discard
> granularity into minlen, ext4_trim_fs() declared the trim request as
> invalid. However just silently doing nothing seems to be a more
> appropriate reaction to such combination of parameters since user did
> not specify anything wrong.

Hi Jan,

I agree that it's better to silently do nothing rather than returning
-ENOTSUPP in this case and the patch looks mostly fine.

However currently we return the adjusted minlen back to the user and it
is also stated in the fstrim man page. I think it's worth keeping that
behavior.

When I think about it, it would probably be worth updating fstrim to
notify the user that the minlen changed, I can send a patch for that.

Thanks!
-Lukas

> 
> CC: Lukas Czerner <lczerner@redhat.com>
> Fixes: 5c2ed62fd447 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl")
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ioctl.c   | 2 --
>  fs/ext4/mballoc.c | 8 ++++++++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 606dee9e08a3..220a4c8178b5 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -1117,8 +1117,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		    sizeof(range)))
>  			return -EFAULT;
>  
> -		range.minlen = max((unsigned int)range.minlen,
> -				   q->limits.discard_granularity);
>  		ret = ext4_trim_fs(sb, &range);
>  		if (ret < 0)
>  			return ret;
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 72bfac2d6dce..7174add7b153 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6405,6 +6405,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
>   */
>  int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  {
> +	struct request_queue *q = bdev_get_queue(sb->s_bdev);
>  	struct ext4_group_info *grp;
>  	ext4_group_t group, first_group, last_group;
>  	ext4_grpblk_t cnt = 0, first_cluster, last_cluster;
> @@ -6423,6 +6424,13 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	    start >= max_blks ||
>  	    range->len < sb->s_blocksize)
>  		return -EINVAL;
> +	/* No point to try to trim less than discard granularity */
> +	if (range->minlen < q->limits.discard_granularity) {
> +		minlen = EXT4_NUM_B2C(EXT4_SB(sb),
> +			q->limits.discard_granularity >> sb->s_blocksize_bits);
> +		if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
> +			goto out;
> +	}
>  	if (end >= max_blks)
>  		end = max_blks - 1;
>  	if (end <= first_data_blk)
> -- 
> 2.26.2
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
       [not found]         ` <4e4d1ac7735c38f1395db19b97025bf411982b60.camel@suse.com>
@ 2021-11-16 11:56           ` Lukas Czerner
  2021-11-17  4:35             ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2021-11-16 11:56 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Jan Kara, Ted Tso, mwilck, martin.petersen, linux-fsdevel

On Mon, Nov 15, 2021 at 04:10:12PM +0100, Martin Wilck wrote:
> On Mon, 2021-11-15 at 15:53 +0100, Lukas Czerner wrote:
> > On Mon, Nov 15, 2021 at 03:22:02PM +0100, Martin Wilck wrote:
> > > On Mon, 2021-11-15 at 13:51 +0100, Jan Kara wrote:
> > > > [Added Martin to CC who originally found the problem]
> > > > 
> > > > On Mon 15-11-21 12:48:21, Lukas Czerner wrote:
> > > > > On Fri, Nov 12, 2021 at 04:22:02PM +0100, Jan Kara wrote:
> > > > > > A user reported FITRIM ioctl failing for him on ext4 on some
> > > > > > devices
> > > > > > without apparent reason.  After some debugging we've found
> > > > > > out
> > > > > > that
> > > > > > these devices (being LVM volumes) report rather large discard
> > > > > > granularity of 42MB and the filesystem had 1k blocksize and
> > > > > > thus
> > > > > > group
> > > > > > size of 8MB. Because ext4 FITRIM implementation puts discard
> > > > > > granularity into minlen, ext4_trim_fs() declared the trim
> > > > > > request
> > > > > > as
> > > > > > invalid. However just silently doing nothing seems to be a
> > > > > > more
> > > > > > appropriate reaction to such combination of parameters since
> > > > > > user
> > > > > > did
> > > > > > not specify anything wrong.
> > > > > 
> > > > > Hi Jan,
> > > > > 
> > > > > I agree that it's better to silently do nothing rather than
> > > > > returning
> > > > > -ENOTSUPP in this case and the patch looks mostly fine.
> > > > > 
> > > > > However currently we return the adjusted minlen back to the
> > > > > user
> > > > > and it
> > > > > is also stated in the fstrim man page. I think it's worth
> > > > > keeping
> > > > > that
> > > > > behavior.
> > > > 
> > > > OK.
> > > > 
> > > > > When I think about it, it would probably be worth updating
> > > > > fstrim
> > > > > to
> > > > > notify the user that the minlen changed, I can send a patch for
> > > > > that.
> > > > 
> > > > I've added Martin to this conversation because he is of the
> > > > opinion
> > > > that
> > > > the filesystem actually should not increase the minlen based on
> > > > discard
> > > > granularity at all. It should leave this to the lower layers or
> > > > userspace.
> > > > Honestly I don't have strong opinion either way so I wanted to
> > > > throw
> > > > Martin's opinion into the mix as a possibility. Also maybe you
> > > > remember
> > > > whether the motivation of the original commit 5c2ed62fd447 was
> > > > motivated by
> > > > some real world experience or just theoretical concerns?
> > > > 
> > > >                                                                 H
> > > > onza
> > > 
> > > Thanks for notifying me, Jan. FTR, below's my argument, quoted from
> > > bugzilla:
> > > 
> > > Whether or not the discard *can* be carried out, and whether it
> > > makes
> > > sense to do so on any given device, is not the file system's
> > > business.
> > > It's up to the block layer and the device itself. And whether or
> > > not
> > > blocks *should* be discarded isn't the filesystem's business,
> > > either;
> > > it's the decision of user space (*).
> > > 
> > > I find it strange that file system code starts reasoning whether
> > > block
> > > IO commands are "likely" to succeed or fail. IMO the only valid
> > > reason
> > > for the filesystem to intervene would be if the result of the
> > > FITRIM
> > > would be a suboptimal block allocation, causing performance
> > > deterioration for future filesystem I/O. I don't see that that's
> > > the
> > > case here. 
> > 
> > It is not a matter of probability, or chance as you seem to imply.
> > The
> > discard granularity indicates the internal allocation unit and if the
> > discard request is smaller than that it will do nothing. So from my
> > POV
> > it's just an optimization to avoid sending such pointless requests.
> > 
> > Am I wrong in thinking that such discard requests are useless ?
> 
> Please have a look at __blkdev_issue_discard():
> https://elixir.bootlin.com/linux/latest/source/block/blk-lib.c#L26
> 
> Unless I'm misreading the code, it does very well pass REQ_OP_DISCARD
> bios to the device which don't meet the granularity or alignment
> requirements. It tries to align as well as possible, but the first and
> last bio submitted don't necessarily match the granularity.
> 
> For SCSI at least, unmap granularity is mainly a means for
> optimization: "An unmap request with a number of logical blocks that is
> not a multiple of this value may result in unmap operations on fewer
> LBAs than requested" (SBC5, §6.6.4). So, devices _may_ ignore such
> requests, but that's not necessarily the case.

My understanding always was that the request needs to be both properly
aligned and of a certain minimum size (discard_granularity) to take
effect.

If what you're saying is true then I feel like the documentation of
discard_granularity

Documentation/ABI/testing/sysfs-block
Documentation/block/queue-sysfs.rst

should change because that's not how I understand the notion of internal
allocation unit. However the sentence you quoted is not entirely clear
to me either so I'll leave that decision to someone who understands it
better than me.

Martin could you please clarify it for us ?

Adding Martin Petersen to cc and somehow we're off the list so add
fsdevel as well.

Thanks!
-Lukas

> 
> Another point is that only the block layer has full information about
> the alignment, which depends on partition start sectors. I believe the
> block layer should be the authority to decide whether or not the
> request is valid for the device.
> 
> Regards,
> Martin
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-16 11:56           ` Lukas Czerner
@ 2021-11-17  4:35             ` Martin K. Petersen
  2021-11-19 15:43               ` Martin Wilck
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2021-11-17  4:35 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Martin Wilck, Jan Kara, Ted Tso, mwilck, martin.petersen, linux-fsdevel


Lukas,

> My understanding always was that the request needs to be both properly
> aligned and of a certain minimum size (discard_granularity) to take
> effect.
>
> If what you're saying is true then I feel like the documentation of
> discard_granularity
>
> Documentation/ABI/testing/sysfs-block
> Documentation/block/queue-sysfs.rst
>
> should change because that's not how I understand the notion of
> internal allocation unit. However the sentence you quoted is not
> entirely clear to me either so I'll leave that decision to someone who
> understands it better than me.
>
> Martin could you please clarify it for us ?

The rationale behind exposing the discard granularity to userland was to
facilitate mkfs.* picking allocation units that were friendly wrt. the
device's internal granularity. The nature of that granularity depends on
the type of device (thin provisioned, resource provisioned, SSD, etc.).

Just like the other queue limits the discard granularity was meant as a
hint (some of that hintiness got lost as a result of conflating zeroing
and deallocating but that has since been resolved).

It is true that some devices get confused if you submit discards that
are smaller than their internal granularity. However, those devices are
typically the ones that don't actually support reporting that
granularity in the first place! Whereas SCSI devices generally don't
care. They'll happily ignore any parts of the request that are smaller
than whatever size they use internally.

One of the problems with "optimizing" away pieces that are smaller than
the reported granularity is when you combine devices. Say you have a
RAID1 of an SSD that reports 4096 bytes and one that reports 256MB. The
stacked device will then have a discard granularity of 256MB and thus
the SSD with the smaller granularity won't receive any of the discards
that might otherwise help it manage its media.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-17  4:35             ` Martin K. Petersen
@ 2021-11-19 15:43               ` Martin Wilck
  2021-11-22 13:53                 ` Lukas Czerner
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Wilck @ 2021-11-19 15:43 UTC (permalink / raw)
  To: Martin K. Petersen, Lukas Czerner
  Cc: Jan Kara, Ted Tso, mwilck, linux-fsdevel

Hi Martin, Lukas,

On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> 
> Lukas,
> 
> > My understanding always was that the request needs to be both
> > properly
> > aligned and of a certain minimum size (discard_granularity) to take
> > effect.
> > 
> > If what you're saying is true then I feel like the documentation of
> > discard_granularity
> > 
> > Documentation/ABI/testing/sysfs-block
> > Documentation/block/queue-sysfs.rst
> > 
> > should change because that's not how I understand the notion of
> > internal allocation unit. However the sentence you quoted is not
> > entirely clear to me either so I'll leave that decision to someone
> > who
> > understands it better than me.
> > 
> > Martin could you please clarify it for us ?
> 
> The rationale behind exposing the discard granularity to userland was
> to
> facilitate mkfs.* picking allocation units that were friendly wrt. the
> device's internal granularity. The nature of that granularity depends
> on
> the type of device (thin provisioned, resource provisioned, SSD, etc.).
> 
> Just like the other queue limits the discard granularity was meant as a
> hint (some of that hintiness got lost as a result of conflating zeroing
> and deallocating but that has since been resolved).
> 
> It is true that some devices get confused if you submit discards that
> are smaller than their internal granularity. However, those devices are
> typically the ones that don't actually support reporting that
> granularity in the first place! Whereas SCSI devices generally don't
> care. They'll happily ignore any parts of the request that are smaller
> than whatever size they use internally.
> 
> One of the problems with "optimizing" away pieces that are smaller than
> the reported granularity is when you combine devices. Say you have a
> RAID1 of an SSD that reports 4096 bytes and one that reports 256MB. The
> stacked device will then have a discard granularity of 256MB and thus
> the SSD with the smaller granularity won't receive any of the discards
> that might otherwise help it manage its media.

I've checked btrfs and xfs, and they treat the minlen like Jan's patch
would - the minlen is set to the device's granularity, and discarding
smaller ranges is silently not even attempted.

So this seems to be common practice among file system implementations,
and thus Jan's patch would make ext4 behave like the others. But I'm
still uncertain if this behavior is ideal, and your remarks seem to
confirm that.

The fstrim man page text is highly misleading. "The default value is
zero, discarding every free block" is obviously wrong, given the
current actual behavior of the major filesystems.

The way this is currently implemented, the user has no way to actually
"discard every free block" to the extent supported by the device. I
think that being able to do this would be desirable, at least in
certain situations.

If we changed this, with the default value of 0 used by fstrim, file
systems would attempt to free every block, which is slow and would
likely either fail or have no effect on may devices. Maybe we should
treat the value "0" like "automatic", i.e. adjust it the minlen to the
device's granularity like we do now, but leave the value unchanged if
the user explicitly sets a small non-zero value? That way "fstrim -m
512" could be used to force discarding every block that can be
discarded.

Regards
Martin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-19 15:43               ` Martin Wilck
@ 2021-11-22 13:53                 ` Lukas Czerner
  2022-01-03 15:34                   ` Martin Wilck
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2021-11-22 13:53 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Martin K. Petersen, Jan Kara, Ted Tso, mwilck, linux-fsdevel

On Fri, Nov 19, 2021 at 04:43:53PM +0100, Martin Wilck wrote:
> Hi Martin, Lukas,
> 
> On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> > 
> > Lukas,
> > 
> > > My understanding always was that the request needs to be both
> > > properly
> > > aligned and of a certain minimum size (discard_granularity) to take
> > > effect.
> > > 
> > > If what you're saying is true then I feel like the documentation of
> > > discard_granularity
> > > 
> > > Documentation/ABI/testing/sysfs-block
> > > Documentation/block/queue-sysfs.rst
> > > 
> > > should change because that's not how I understand the notion of
> > > internal allocation unit. However the sentence you quoted is not
> > > entirely clear to me either so I'll leave that decision to someone
> > > who
> > > understands it better than me.
> > > 
> > > Martin could you please clarify it for us ?
> > 
> > The rationale behind exposing the discard granularity to userland was
> > to
> > facilitate mkfs.* picking allocation units that were friendly wrt. the
> > device's internal granularity. The nature of that granularity depends
> > on
> > the type of device (thin provisioned, resource provisioned, SSD, etc.).
> > 
> > Just like the other queue limits the discard granularity was meant as a
> > hint (some of that hintiness got lost as a result of conflating zeroing
> > and deallocating but that has since been resolved).
> > 
> > It is true that some devices get confused if you submit discards that
> > are smaller than their internal granularity. However, those devices are
> > typically the ones that don't actually support reporting that
> > granularity in the first place! Whereas SCSI devices generally don't
> > care. They'll happily ignore any parts of the request that are smaller
> > than whatever size they use internally.
> > 
> > One of the problems with "optimizing" away pieces that are smaller than
> > the reported granularity is when you combine devices. Say you have a
> > RAID1 of an SSD that reports 4096 bytes and one that reports 256MB. The
> > stacked device will then have a discard granularity of 256MB and thus
> > the SSD with the smaller granularity won't receive any of the discards
> > that might otherwise help it manage its media.

Thanks you Martin P. for clarifying.

> 
> I've checked btrfs and xfs, and they treat the minlen like Jan's patch
> would - the minlen is set to the device's granularity, and discarding
> smaller ranges is silently not even attempted.
> 
> So this seems to be common practice among file system implementations,
> and thus Jan's patch would make ext4 behave like the others. But I'm
> still uncertain if this behavior is ideal, and your remarks seem to
> confirm that.

Yeah I modeled my change for ext4 on the work in xfs done by Christoph
and so it kind of spread organically to other file systems.

> 
> The fstrim man page text is highly misleading. "The default value is
> zero, discarding every free block" is obviously wrong, given the
> current actual behavior of the major filesystems.

Originally there was no discard granularity optimization so it is what
it meant. And it also says "fstrim will adjust the minimum if it's
smaller than the device's minimum". So I am not sure if it's necessarily
misleading.

Still I think it should be best effort from the fs, not guarantee.

> 
> The way this is currently implemented, the user has no way to actually
> "discard every free block" to the extent supported by the device. I
> think that being able to do this would be desirable, at least in
> certain situations.
> 
> If we changed this, with the default value of 0 used by fstrim, file
> systems would attempt to free every block, which is slow and would
> likely either fail or have no effect on may devices. Maybe we should
> treat the value "0" like "automatic", i.e. adjust it the minlen to the
> device's granularity like we do now, but leave the value unchanged if
> the user explicitly sets a small non-zero value? That way "fstrim -m
> 512" could be used to force discarding every block that can be
> discarded.

Perhaps, this sounds like a reasonable solution to me.

> 
> Regards
> Martin
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-22 13:53                 ` Lukas Czerner
@ 2022-01-03 15:34                   ` Martin Wilck
  2022-01-03 18:59                     ` Lukas Czerner
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Wilck @ 2022-01-03 15:34 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Martin K. Petersen, Jan Kara, Ted Tso, mwilck, linux-fsdevel

On Mon, 2021-11-22 at 14:53 +0100, Lukas Czerner wrote:
> On Fri, Nov 19, 2021 at 04:43:53PM +0100, Martin Wilck wrote:
> > Hi Martin, Lukas,
> > 
> > On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> > 
> 
> Thanks you Martin P. for clarifying.
> 
> > 
> > I've checked btrfs and xfs, and they treat the minlen like Jan's
> > patch
> > would - the minlen is set to the device's granularity, and
> > discarding
> > smaller ranges is silently not even attempted.
> > 
> > So this seems to be common practice among file system
> > implementations,
> > and thus Jan's patch would make ext4 behave like the others. But
> > I'm
> > still uncertain if this behavior is ideal, and your remarks seem to
> > confirm that.
> 
> Yeah I modeled my change for ext4 on the work in xfs done by
> Christoph
> and so it kind of spread organically to other file systems.

Ok. I still believe that it's not ideal this way, but this logic is
only applied in corner cases AFAICS, so it doesn't really hurt.

I'm fine with Jan's patch for the time being.

> 
> > 
> > The fstrim man page text is highly misleading. "The default value
> > is
> > zero, discarding every free block" is obviously wrong, given the
> > current actual behavior of the major filesystems.
> 
> Originally there was no discard granularity optimization so it is
> what
> it meant. 

Not quite. That man page text is from 2019, commit ce3d198d7 ("fstrim:
document kernel return minlen explicitly"). At that time,
discard_granularity had existed for 10 years already.


> And it also says "fstrim will adjust the minimum if it's
> smaller than the device's minimum". So I am not sure if it's
> necessarily
> misleading.

It is misleading, because it's not fstrim that adjusts anything, but
the file system code in the kernel.

> 
> Still I think it should be best effort from the fs, not guarantee.
> 
> > 
> > The way this is currently implemented, the user has no way to
> > actually
> > "discard every free block" to the extent supported by the device. I
> > think that being able to do this would be desirable, at least in
> > certain situations.
> > 
> > If we changed this, with the default value of 0 used by fstrim,
> > file
> > systems would attempt to free every block, which is slow and would
> > likely either fail or have no effect on may devices. Maybe we
> > should
> > treat the value "0" like "automatic", i.e. adjust it the minlen to
> > the
> > device's granularity like we do now, but leave the value unchanged
> > if
> > the user explicitly sets a small non-zero value? That way "fstrim -
> > m
> > 512" could be used to force discarding every block that can be
> > discarded.
> 
> Perhaps, this sounds like a reasonable solution to me.

This could be implemented in a second step, then.

Thanks,
Martin




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2022-01-03 15:34                   ` Martin Wilck
@ 2022-01-03 18:59                     ` Lukas Czerner
  2022-01-04 14:55                       ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2022-01-03 18:59 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Jan Kara, Ted Tso, mwilck, linux-fsdevel

On Mon, Jan 03, 2022 at 04:34:40PM +0100, Martin Wilck wrote:
> On Mon, 2021-11-22 at 14:53 +0100, Lukas Czerner wrote:
> > On Fri, Nov 19, 2021 at 04:43:53PM +0100, Martin Wilck wrote:
> > > Hi Martin, Lukas,
> > > 
> > > On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> > > 
> > 
> > Thanks you Martin P. for clarifying.
> > 
> > > 
> > > I've checked btrfs and xfs, and they treat the minlen like Jan's
> > > patch
> > > would - the minlen is set to the device's granularity, and
> > > discarding
> > > smaller ranges is silently not even attempted.
> > > 
> > > So this seems to be common practice among file system
> > > implementations,
> > > and thus Jan's patch would make ext4 behave like the others. But
> > > I'm
> > > still uncertain if this behavior is ideal, and your remarks seem to
> > > confirm that.
> > 
> > Yeah I modeled my change for ext4 on the work in xfs done by
> > Christoph
> > and so it kind of spread organically to other file systems.
> 
> Ok. I still believe that it's not ideal this way, but this logic is
> only applied in corner cases AFAICS, so it doesn't really hurt.
> 
> I'm fine with Jan's patch for the time being.
> 
> > 
> > > 
> > > The fstrim man page text is highly misleading. "The default value
> > > is
> > > zero, discarding every free block" is obviously wrong, given the
> > > current actual behavior of the major filesystems.
> > 
> > Originally there was no discard granularity optimization so it is
> > what
> > it meant. 
> 
> Not quite. That man page text is from 2019, commit ce3d198d7 ("fstrim:
> document kernel return minlen explicitly"). At that time,
> discard_granularity had existed for 10 years already.

Not really, the sentence you're pointing out above was there from the
beginning. Commit ce3d198d7 didn't change that.

> 
> 
> > And it also says "fstrim will adjust the minimum if it's
> > smaller than the device's minimum". So I am not sure if it's
> > necessarily
> > misleading.
> 
> It is misleading, because it's not fstrim that adjusts anything, but
> the file system code in the kernel.

This makes absolutely no difference from the user POV. Enough nitpicking,
feel free to cc me if you decide to send a patch.

Thanks!
-Lukas


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2022-01-03 18:59                     ` Lukas Czerner
@ 2022-01-04 14:55                       ` Jan Kara
  2022-01-04 15:05                         ` Martin Wilck
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-01-04 14:55 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Martin Wilck, Jan Kara, Ted Tso, mwilck, linux-fsdevel

On Mon 03-01-22 19:59:40, Lukas Czerner wrote:
> On Mon, Jan 03, 2022 at 04:34:40PM +0100, Martin Wilck wrote:
> > On Mon, 2021-11-22 at 14:53 +0100, Lukas Czerner wrote:
> > > On Fri, Nov 19, 2021 at 04:43:53PM +0100, Martin Wilck wrote:
> > > > Hi Martin, Lukas,
> > > > 
> > > > On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> > > > 
> > > 
> > > Thanks you Martin P. for clarifying.
> > > 
> > > > 
> > > > I've checked btrfs and xfs, and they treat the minlen like Jan's
> > > > patch
> > > > would - the minlen is set to the device's granularity, and
> > > > discarding
> > > > smaller ranges is silently not even attempted.
> > > > 
> > > > So this seems to be common practice among file system
> > > > implementations,
> > > > and thus Jan's patch would make ext4 behave like the others. But
> > > > I'm
> > > > still uncertain if this behavior is ideal, and your remarks seem to
> > > > confirm that.
> > > 
> > > Yeah I modeled my change for ext4 on the work in xfs done by
> > > Christoph
> > > and so it kind of spread organically to other file systems.
> > 
> > Ok. I still believe that it's not ideal this way, but this logic is
> > only applied in corner cases AFAICS, so it doesn't really hurt.
> > 
> > I'm fine with Jan's patch for the time being.
> > 
> > > 
> > > > 
> > > > The fstrim man page text is highly misleading. "The default value
> > > > is
> > > > zero, discarding every free block" is obviously wrong, given the
> > > > current actual behavior of the major filesystems.
> > > 
> > > Originally there was no discard granularity optimization so it is
> > > what
> > > it meant. 
> > 
> > Not quite. That man page text is from 2019, commit ce3d198d7 ("fstrim:
> > document kernel return minlen explicitly"). At that time,
> > discard_granularity had existed for 10 years already.
> 
> Not really, the sentence you're pointing out above was there from the
> beginning. Commit ce3d198d7 didn't change that.
> 
> > 
> > 
> > > And it also says "fstrim will adjust the minimum if it's
> > > smaller than the device's minimum". So I am not sure if it's
> > > necessarily
> > > misleading.
> > 
> > It is misleading, because it's not fstrim that adjusts anything, but
> > the file system code in the kernel.
> 
> This makes absolutely no difference from the user POV. Enough nitpicking,
> feel free to cc me if you decide to send a patch.

So I think the conclusion is that we go with my original patch? Just I
should update it to return computed minlen back to the user, correct?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2022-01-04 14:55                       ` Jan Kara
@ 2022-01-04 15:05                         ` Martin Wilck
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Wilck @ 2022-01-04 15:05 UTC (permalink / raw)
  To: Jan Kara, Lukas Czerner; +Cc: Ted Tso, mwilck, linux-fsdevel

On Tue, 2022-01-04 at 15:55 +0100, Jan Kara wrote:
> 
> So I think the conclusion is that we go with my original patch? Just
> I
> should update it to return computed minlen back to the user, correct?
> 
>                                                                 Honza

Yes, that's my understanding.

Martin




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ext4: Avoid trim error on fs with small groups
  2021-11-12 15:22 [PATCH] ext4: Avoid trim error on fs with small groups Jan Kara
  2021-11-12 15:26 ` Jan Kara
  2021-11-15 11:48 ` Lukas Czerner
@ 2022-01-05  3:14 ` Theodore Ts'o
  2 siblings, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2022-01-05  3:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Theodore Ts'o, Lukas Czerner, linux-ext4

On Fri, 12 Nov 2021 16:22:02 +0100, Jan Kara wrote:
> A user reported FITRIM ioctl failing for him on ext4 on some devices
> without apparent reason.  After some debugging we've found out that
> these devices (being LVM volumes) report rather large discard
> granularity of 42MB and the filesystem had 1k blocksize and thus group
> size of 8MB. Because ext4 FITRIM implementation puts discard
> granularity into minlen, ext4_trim_fs() declared the trim request as
> invalid. However just silently doing nothing seems to be a more
> appropriate reaction to such combination of parameters since user did
> not specify anything wrong.
> 
> [...]

Applied, thanks!

[1/1] ext4: Avoid trim error on fs with small groups
      commit: a4934e25c01ed056dc4af8bef086616e3b083a14

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-01-06 15:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-12 15:22 [PATCH] ext4: Avoid trim error on fs with small groups Jan Kara
2021-11-12 15:26 ` Jan Kara
2021-11-15 11:48 ` Lukas Czerner
     [not found]   ` <20211115125141.GD23412@quack2.suse.cz>
     [not found]     ` <59b60aae9401a043f7d7cec0f8004f2ca7d4f4db.camel@suse.com>
     [not found]       ` <20211115145312.g4ptf22rl55jf37l@work>
     [not found]         ` <4e4d1ac7735c38f1395db19b97025bf411982b60.camel@suse.com>
2021-11-16 11:56           ` Lukas Czerner
2021-11-17  4:35             ` Martin K. Petersen
2021-11-19 15:43               ` Martin Wilck
2021-11-22 13:53                 ` Lukas Czerner
2022-01-03 15:34                   ` Martin Wilck
2022-01-03 18:59                     ` Lukas Czerner
2022-01-04 14:55                       ` Jan Kara
2022-01-04 15:05                         ` Martin Wilck
2022-01-05  3:14 ` Theodore Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.