linux-xfs.vger.kernel.org archive mirror
* [PATCH 0/2] mkfs: inform during block discarding
@ 2019-11-21 21:44 Pavel Reichl
  2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
  2019-11-21 21:44 ` [PATCH 2/2] mkfs: Show progress during block discard Pavel Reichl
  0 siblings, 2 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-21 21:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Pavel Reichl

The 1st patch breaks discarding into smaller chunks and thus makes interruption and
logging possible.

The 2nd patch writes messages about the discarding process to stderr. Sample output
looks like this:
Discarding:  0% done
Discarding: 20% done
Discarding: 40% done
Discarding: 60% done
Discarding: 80% done
Discarding is done.

Pavel Reichl (2):
  mkfs: Break block discard into chunks of 2 GB
  mkfs: Show progress during block discard

 mkfs/xfs_mkfs.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

-- 
2.23.0



* [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:44 [PATCH 0/2] mkfs: inform during block discarding Pavel Reichl
@ 2019-11-21 21:44 ` Pavel Reichl
  2019-11-21 21:55   ` Darrick J. Wong
                     ` (2 more replies)
  2019-11-21 21:44 ` [PATCH 2/2] mkfs: Show progress during block discard Pavel Reichl
  1 sibling, 3 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-21 21:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Pavel Reichl

Signed-off-by: Pavel Reichl <preichl@redhat.com>
---
 mkfs/xfs_mkfs.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 18338a61..a02d6f66 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -1242,15 +1242,33 @@ done:
 static void
 discard_blocks(dev_t dev, uint64_t nsectors)
 {
-	int fd;
+	int		fd;
+	uint64_t	offset		= 0;
+	/* Maximal chunk of bytes to discard is 2GB */
+	const uint64_t	step		= (uint64_t)2<<30;
+	/* Sector size is 512 bytes */
+	const uint64_t	count		= nsectors << 9;
 
-	/*
-	 * We intentionally ignore errors from the discard ioctl.  It is
-	 * not necessary for the mkfs functionality but just an optimization.
-	 */
 	fd = libxfs_device_to_fd(dev);
-	if (fd > 0)
-		platform_discard_blocks(fd, 0, nsectors << 9);
+	if (fd <= 0)
+		return;
+
+	while (offset < count) {
+		uint64_t	tmp_step = step;
+
+		if ((offset + step) > count)
+			tmp_step = count - offset;
+
+		/*
+		 * We intentionally ignore errors from the discard ioctl. It is
+		 * not necessary for the mkfs functionality but just an
+		 * optimization. However we should stop on error.
+		 */
+		if (platform_discard_blocks(fd, offset, tmp_step))
+			return;
+
+		offset += tmp_step;
+	}
 }
 
 static __attribute__((noreturn)) void
-- 
2.23.0



* [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-21 21:44 [PATCH 0/2] mkfs: inform during block discarding Pavel Reichl
  2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
@ 2019-11-21 21:44 ` Pavel Reichl
  2019-11-21 21:59   ` Darrick J. Wong
  2019-11-21 23:41   ` Dave Chinner
  1 sibling, 2 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-21 21:44 UTC (permalink / raw)
  To: linux-xfs; +Cc: Pavel Reichl

Signed-off-by: Pavel Reichl <preichl@redhat.com>
---
 mkfs/xfs_mkfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index a02d6f66..07b8bd78 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
 	const uint64_t	step		= (uint64_t)2<<30;
 	/* Sector size is 512 bytes */
 	const uint64_t	count		= nsectors << 9;
+	uint64_t	prev_done	= (uint64_t) ~0;
 
 	fd = libxfs_device_to_fd(dev);
 	if (fd <= 0)
@@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
 
 	while (offset < count) {
 		uint64_t	tmp_step = step;
+		uint64_t	done = offset * 100 / count;
 
 		if ((offset + step) > count)
 			tmp_step = count - offset;
@@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
 			return;
 
 		offset += tmp_step;
+
+		if (prev_done != done) {
+			prev_done = done;
+			fprintf(stderr, _("Discarding: %2lu%% done\n"), done);
+		}
 	}
+	fprintf(stderr, _("Discarding is done.\n"));
 }
 
 static __attribute__((noreturn)) void
-- 
2.23.0



* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
@ 2019-11-21 21:55   ` Darrick J. Wong
  2019-11-22 14:46     ` Pavel Reichl
  2019-11-22 21:07     ` Eric Sandeen
  2019-11-21 23:18   ` Dave Chinner
  2019-11-26 20:53   ` Eric Sandeen
  2 siblings, 2 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-11-21 21:55 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> ---
>  mkfs/xfs_mkfs.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 18338a61..a02d6f66 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -1242,15 +1242,33 @@ done:
>  static void
>  discard_blocks(dev_t dev, uint64_t nsectors)
>  {
> -	int fd;
> +	int		fd;
> +	uint64_t	offset		= 0;
> +	/* Maximal chunk of bytes to discard is 2GB */
> +	const uint64_t	step		= (uint64_t)2<<30;

You don't need the tabs after the variable name, e.g.

	/* Maximal chunk of bytes to discard is 2GB */
	const uint64_t	step = 2ULL << 30;

> +	/* Sector size is 512 bytes */
> +	const uint64_t	count		= nsectors << 9;

count = BBTOB(nsectors)?

>  
> -	/*
> -	 * We intentionally ignore errors from the discard ioctl.  It is
> -	 * not necessary for the mkfs functionality but just an optimization.
> -	 */
>  	fd = libxfs_device_to_fd(dev);
> -	if (fd > 0)
> -		platform_discard_blocks(fd, 0, nsectors << 9);
> +	if (fd <= 0)
> +		return;
> +
> +	while (offset < count) {
> +		uint64_t	tmp_step = step;

tmp_step = min(step, count - offset); ?

Otherwise seems reasonable to me, if nothing else to avoid the problem
where you ask mkfs to discard and can't cancel it....

--D

> +
> +		if ((offset + step) > count)
> +			tmp_step = count - offset;
> +
> +		/*
> +		 * We intentionally ignore errors from the discard ioctl. It is
> +		 * not necessary for the mkfs functionality but just an
> +		 * optimization. However we should stop on error.
> +		 */
> +		if (platform_discard_blocks(fd, offset, tmp_step))
> +			return;
> +
> +		offset += tmp_step;
> +	}
>  }
>  
>  static __attribute__((noreturn)) void
> -- 
> 2.23.0
> 

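For reference, a minimal sketch of the chunked loop with the min() suggestion from this review applied. It is not the patch itself: discard_fn, min_u64(), discard_in_chunks() and the fake_discard() test double are hypothetical names standing in for platform_discard_blocks() and the code in discard_blocks(). Clamping with min(step, count - offset) also avoids computing offset + step, which could wrap for byte counts close to 2^64.

```c
#include <stdint.h>

/* Hypothetical callback standing in for platform_discard_blocks(). */
typedef int (*discard_fn)(int fd, uint64_t offset, uint64_t len);

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Discard the byte range [0, count) in chunks of at most step bytes.
 * Returns the number of bytes discarded before completion or the
 * first error. A step of zero would never make progress, so it is
 * rejected up front.
 */
uint64_t
discard_in_chunks(int fd, uint64_t count, uint64_t step, discard_fn discard)
{
	uint64_t	offset = 0;

	if (step == 0)
		return 0;

	while (offset < count) {
		uint64_t	len = min_u64(step, count - offset);

		/* Discard is only an optimization, so stop quietly on error. */
		if (discard(fd, offset, len))
			break;
		offset += len;
	}
	return offset;
}

/* Test double (assumption): every discard succeeds and is counted. */
unsigned int	fake_calls;
uint64_t	fake_last_len;

int
fake_discard(int fd, uint64_t offset, uint64_t len)
{
	(void)fd;
	(void)offset;
	fake_calls++;
	fake_last_len = len;
	return 0;
}
```

Because each chunk is a separate call, a signal such as ^C can take effect between two chunks instead of waiting for one device-wide discard to finish.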

* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-21 21:44 ` [PATCH 2/2] mkfs: Show progress during block discard Pavel Reichl
@ 2019-11-21 21:59   ` Darrick J. Wong
  2019-11-22 16:27     ` Pavel Reichl
  2019-11-21 23:41   ` Dave Chinner
  1 sibling, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2019-11-21 21:59 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> ---
>  mkfs/xfs_mkfs.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index a02d6f66..07b8bd78 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  	const uint64_t	step		= (uint64_t)2<<30;
>  	/* Sector size is 512 bytes */
>  	const uint64_t	count		= nsectors << 9;
> +	uint64_t	prev_done	= (uint64_t) ~0;
>  
>  	fd = libxfs_device_to_fd(dev);
>  	if (fd <= 0)
> @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  
>  	while (offset < count) {
>  		uint64_t	tmp_step = step;
> +		uint64_t	done = offset * 100 / count;
>  
>  		if ((offset + step) > count)
>  			tmp_step = count - offset;
> @@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  			return;
>  
>  		offset += tmp_step;
> +
> +		if (prev_done != done) {

Hmm... so this prints the status message every time the percentage
increases by a point, right?

> +			prev_done = done;
> +			fprintf(stderr, _("Discarding: %2lu%% done\n"), done);

This isn't an error, so why output to stderr?

FWIW if it's a tty you might consider ending that string with \r so the
status messages don't scroll off the screen.  Or possibly only reporting
status if stdout is a tty?

--D

> +		}
>  	}
> +	fprintf(stderr, _("Discarding is done.\n"));
>  }
>  
>  static __attribute__((noreturn)) void
> -- 
> 2.23.0
> 


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
  2019-11-21 21:55   ` Darrick J. Wong
@ 2019-11-21 23:18   ` Dave Chinner
  2019-11-22 15:38     ` Darrick J. Wong
                       ` (2 more replies)
  2019-11-26 20:53   ` Eric Sandeen
  2 siblings, 3 replies; 22+ messages in thread
From: Dave Chinner @ 2019-11-21 23:18 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> ---

This is missing an explanation about why the change is being made
and what was considered when making decisions about the change.

e.g. my first questions on looking at the patch were:

	- why do we need to break up the discards into 2GB chunks?
	- why 2GB?
	- why not use libblkid to query the maximum discard size
	  and use that as the step size instead?
	- is there any performance impact from breaking up large
	  discards that might be optimised by the kernel into many
	  overlapping async operations into small, synchronous
	  discards?

i.e. the reviewer can read what the patch does, but that doesn't
explain why the patch does this. Hence it's a good idea to explain
the problem being solved or the feature requirements that have led
to the changes in the patch....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

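On the question of querying the device's maximum discard size: the sketch below reads a single numeric attribute file, such as the block queue's discard_max_bytes attribute in sysfs. This is a swapped-in technique, not the libblkid approach asked about above, and read_u64_attr() and the example path are assumptions for illustration only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style attribute file,
 * for example /sys/block/sda/queue/discard_max_bytes (the device name
 * is only an example). Returns 0 on success, -1 if the file cannot be
 * opened or does not start with a number.
 */
int
read_u64_attr(const char *path, uint64_t *value)
{
	FILE	*f = fopen(path, "r");
	int	ret = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%" SCNu64, value) == 1)
		ret = 0;
	fclose(f);
	return ret;
}
```

A caller would still have to decide what to do with the result: a value of 0 generally means the device does not advertise discard support, and a very large limit would give few opportunities to interrupt or report progress if used directly as the step size.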

* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-21 21:44 ` [PATCH 2/2] mkfs: Show progress during block discard Pavel Reichl
  2019-11-21 21:59   ` Darrick J. Wong
@ 2019-11-21 23:41   ` Dave Chinner
  2019-11-22 16:43     ` Pavel Reichl
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2019-11-21 23:41 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> ---
>  mkfs/xfs_mkfs.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index a02d6f66..07b8bd78 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  	const uint64_t	step		= (uint64_t)2<<30;
>  	/* Sector size is 512 bytes */
>  	const uint64_t	count		= nsectors << 9;
> +	uint64_t	prev_done	= (uint64_t) ~0;
>  
>  	fd = libxfs_device_to_fd(dev);
>  	if (fd <= 0)
> @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  
>  	while (offset < count) {
>  		uint64_t	tmp_step = step;
> +		uint64_t	done = offset * 100 / count;

That will overflow on an EB-scale (2^60 bytes) filesystem, won't it?

>  
>  		if ((offset + step) > count)
>  			tmp_step = count - offset;
> @@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
>  			return;
>  
>  		offset += tmp_step;
> +
> +		if (prev_done != done) {
> +			prev_done = done;
> +			fprintf(stderr, _("Discarding: %2lu%% done\n"), done);
> +		}
>  	}
> +	fprintf(stderr, _("Discarding is done.\n"));

Hmmm - this output doesn't get suppressed when the "quiet" (-q)
option is used. mkfs is supposed to be silent when this option is
specified.

I also suspect that it breaks a few fstests, too, as some of them
capture and filter mkfs output. They'll need filters to drop these
new messages.

FWIW, 100 lines of extra mkfs output is going to cause workflow
issues. I know it will cause me problems, because I often mkfs 500TB
filesystems tens of times a day on a discard enabled device. This
extra output will scroll all the context of the previous test run
I'm about to compare against off my terminal screen and so now I
will have to scroll the terminal to look at the results of
back-to-back runs. IOWs, I'm going to immediately want to turn this
output off and have it stay off permanently.

Hence I think that, by default, just outputting a single "Discard in
progress" line before starting the discard would be sufficient
indication of what mkfs is currently doing. If someone wants more
verbose progress output, then we should probably introduce a
"verbose" CLI option to go along with the "quiet" option that
suppresses all output.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

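To illustrate the overflow concern: offset * 100 wraps in 64-bit arithmetic once offset exceeds UINT64_MAX / 100, which is well below 2^60 bytes. The sketch below is one possible integer-only way around that, assuming a hypothetical helper named percent_done(); it is not what the patch does.

```c
#include <stdint.h>

/*
 * Return how much of count has been processed, as a whole percentage
 * in the range 0..100, without overflowing 64-bit arithmetic.
 */
unsigned int
percent_done(uint64_t offset, uint64_t count)
{
	uint64_t	pct;

	if (count == 0 || offset >= count)
		return 100;

	if (offset <= UINT64_MAX / 100) {
		/* Small enough that offset * 100 cannot wrap. */
		pct = offset * 100 / count;
	} else {
		/*
		 * offset is huge, so count / 100 is far from zero and
		 * dividing first costs less than one percentage point.
		 */
		pct = offset / (count / 100);
	}

	return pct > 100 ? 100 : (unsigned int)pct;
}
```

With offset = 2^59 and count = 2^60 this takes the second branch and returns 50, whereas the multiply-first form wraps.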

* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:55   ` Darrick J. Wong
@ 2019-11-22 14:46     ` Pavel Reichl
  2019-11-22 21:07     ` Eric Sandeen
  1 sibling, 0 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-22 14:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Thanks, Darrick, for the comments. They make sense to me; the next
iteration of the patchset will address them.

On Thu, Nov 21, 2019 at 10:57 PM Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > ---
> >  mkfs/xfs_mkfs.c | 32 +++++++++++++++++++++++++-------
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> >
> > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > index 18338a61..a02d6f66 100644
> > --- a/mkfs/xfs_mkfs.c
> > +++ b/mkfs/xfs_mkfs.c
> > @@ -1242,15 +1242,33 @@ done:
> >  static void
> >  discard_blocks(dev_t dev, uint64_t nsectors)
> >  {
> > -     int fd;
> > +     int             fd;
> > +     uint64_t        offset          = 0;
> > +     /* Maximal chunk of bytes to discard is 2GB */
> > +     const uint64_t  step            = (uint64_t)2<<30;
>
> You don't need the tabs after the variable name, e.g.
>
>         /* Maximal chunk of bytes to discard is 2GB */
>         const uint64_t  step = 2ULL << 30;
>
> > +     /* Sector size is 512 bytes */
> > +     const uint64_t  count           = nsectors << 9;
>
> count = BBTOB(nsectors)?
>
> >
> > -     /*
> > -      * We intentionally ignore errors from the discard ioctl.  It is
> > -      * not necessary for the mkfs functionality but just an optimization.
> > -      */
> >       fd = libxfs_device_to_fd(dev);
> > -     if (fd > 0)
> > -             platform_discard_blocks(fd, 0, nsectors << 9);
> > +     if (fd <= 0)
> > +             return;
> > +
> > +     while (offset < count) {
> > +             uint64_t        tmp_step = step;
>
> tmp_step = min(step, count - offset); ?
>
> Otherwise seems reasonable to me, if nothing else to avoid the problem
> where you ask mkfs to discard and can't cancel it....
>
> --D
>
> > +
> > +             if ((offset + step) > count)
> > +                     tmp_step = count - offset;
> > +
> > +             /*
> > +              * We intentionally ignore errors from the discard ioctl. It is
> > +              * not necessary for the mkfs functionality but just an
> > +              * optimization. However we should stop on error.
> > +              */
> > +             if (platform_discard_blocks(fd, offset, tmp_step))
> > +                     return;
> > +
> > +             offset += tmp_step;
> > +     }
> >  }
> >
> >  static __attribute__((noreturn)) void
> > --
> > 2.23.0
> >
>



* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 23:18   ` Dave Chinner
@ 2019-11-22 15:38     ` Darrick J. Wong
  2019-11-22 15:59       ` Pavel Reichl
  2019-11-22 16:09     ` Pavel Reichl
  2019-11-22 21:10     ` Eric Sandeen
  2 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2019-11-22 15:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Pavel Reichl, linux-xfs

On Fri, Nov 22, 2019 at 10:18:38AM +1100, Dave Chinner wrote:
> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > ---
> 
> This is missing an explanation about why the change is being made
> and what was considered when making decisions about the change.
> 
> e.g. my first questions on looking at the patch were:
> 
> 	- why do we need to break up the discards into 2GB chunks?
> 	- why 2GB?

Yeah, I'm wondering that too.

> 	- why not use libblkid to query the maximum discard size
> 	  and use that as the step size instead?

FWIW on my SATA SSDs the discard-max is 2G whereas on the NVME it's 2T.  I
guess firmwares have gotten 1000x better in the past few years, possibly
because of the hundred or so 10x programmers that they've all been hiring.

> 	- is there any performance impact from breaking up large
> 	  discards that might be optimised by the kernel into many
> 	  overlapping async operations into small, synchronous
> 	  discards?

Also:
What is the end goal that you have in mind?  Is the progress reporting
the ultimate goal?  Or is it to break up the BLKDISCARD calls so that
someone can ^C a mkfs operation and not have it just sit there
continuing to run?

--D

> i.e. the reviewer can read what the patch does, but that doesn't
> explain why the patch does this. Hence it's a good idea to explain
> the problem being solved or the feature requirements that have led
> to the changes in the patch....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-22 15:38     ` Darrick J. Wong
@ 2019-11-22 15:59       ` Pavel Reichl
  2019-11-22 21:00         ` Dave Chinner
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Reichl @ 2019-11-22 15:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Fri, Nov 22, 2019 at 4:38 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Fri, Nov 22, 2019 at 10:18:38AM +1100, Dave Chinner wrote:
> > On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> > > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > > ---
> >
> > This is missing an explanation about why the change is being made
> > and what was considered when making decisions about the change.
> >
> > e.g. my first questions on looking at the patch were:
> >
> >       - why do we need to break up the discards into 2GB chunks?
> >       - why 2GB?
>
> Yeah, I'm wondering that too.

OK, thank you both for the question - the simple answer is that I took
what is used in e2fsprogs as the default and I expected a discussion about
the proper value during the review process :-)
>
> >       - why not use libblkid to query the maximum discard size
> >         and use that as the step size instead?
>
> FWIW on my SATA SSDs the discard-max is 2G whereas on the NVME it's 2T.  I
> guess firmwares have gotten 1000x better in the past few years, possibly
> because of the hundred or so 10x programmers that they've all been hiring.
>
> >       - is there any performance impact from breaking up large
> >         discards that might be optimised by the kernel into many
> >         overlapping async operations into small, synchronous
> >         discards?
>
> Also:
> What is the end goal that you have in mind?  Is the progress reporting
> the ultimate goal?  Or is it to break up the BLKDISCARD calls so that
> someone can ^C a mkfs operation and not have it just sit there
> continuing to run?

The goal is mainly the progress reporting but the possibility to do ^C
is also convenient. It seems that some users are not happy about the
BLKDISCARD taking too long and at the same time not being informed
about that - so they think that the command actually hung.

>
> --D
>
> > i.e. the reviewer can read what the patch does, but that doesn't
> > explain why the patch does this. Hence it's a good idea to explain
> > the problem being solved or the feature requirements that have led
> > to the changes in the patch....
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
>



* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 23:18   ` Dave Chinner
  2019-11-22 15:38     ` Darrick J. Wong
@ 2019-11-22 16:09     ` Pavel Reichl
  2019-11-22 21:10     ` Eric Sandeen
  2 siblings, 0 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-22 16:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Nov 22, 2019 at 12:18 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > ---
>
> This is missing an explanation about why the change is being made
> and what was considered when making decisions about the change.

Thanks, I'll try to improve that.
>
> e.g. my first questions on looking at the patch were:
>
>         - why do we need to break up the discards into 2GB chunks?
>         - why 2GB?
>         - why not use libblkid to query the maximum discard size
>           and use that as the step size instead?

This is new to me, please let me learn more about that.


>         - is there any performance impact from breaking up large
>           discards that might be optimised by the kernel into many
>           overlapping async operations into small, synchronous
>           discards?

Honestly, I don't have an answer for that ATM - it's quite possible.
It certainly needs more investigating. On the other hand - current
lack of feedback causes user discomfort. So I'd like to know your
opinion - should the change proposed by this patch be the default
behaviour (as it may be more user friendly), and should we add an
option that would 'revert' to the current behaviour (that would be for
informed users)?

>
> i.e. the reviewer can read what the patch does, but that doesn't
> explain why the patch does this. Hence it's a good idea to explain
> the problem being solved or the feature requirements that have led
> to the changes in the patch....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-21 21:59   ` Darrick J. Wong
@ 2019-11-22 16:27     ` Pavel Reichl
  2019-11-22 16:31       ` Darrick J. Wong
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Reichl @ 2019-11-22 16:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Nov 21, 2019 at 10:59 PM Darrick J. Wong
<darrick.wong@oracle.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > ---
> >  mkfs/xfs_mkfs.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > index a02d6f66..07b8bd78 100644
> > --- a/mkfs/xfs_mkfs.c
> > +++ b/mkfs/xfs_mkfs.c
> > @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >       const uint64_t  step            = (uint64_t)2<<30;
> >       /* Sector size is 512 bytes */
> >       const uint64_t  count           = nsectors << 9;
> > +     uint64_t        prev_done       = (uint64_t) ~0;
> >
> >       fd = libxfs_device_to_fd(dev);
> >       if (fd <= 0)
> > @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >
> >       while (offset < count) {
> >               uint64_t        tmp_step = step;
> > +             uint64_t        done = offset * 100 / count;
> >
> >               if ((offset + step) > count)
> >                       tmp_step = count - offset;
> > @@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >                       return;
> >
> >               offset += tmp_step;
> > +
> > +             if (prev_done != done) {
>
> Hmm... so this prints the status message every time the percentage
> increases by a point, right?

Not quite: the smallest change it prints is one percent, but that's the
finest granularity. E.g. I tested with a 10 GB file and the output was:

Discarding:  0% done
Discarding: 20% done
Discarding: 40% done
Discarding: 60% done
Discarding: 80% done
Discarding is done.

So ATM there could be up to 102 lines - please propose a different idea.


>
> > +                     prev_done = done;
> > +                     fprintf(stderr, _("Discarding: %2lu%% done\n"), done);
>
> This isn't an error, so why output to stderr?
My bad, sorry.

>
> FWIW if it's a tty you might consider ending that string with \r so the
> status messages don't scroll off the screen.  Or possibly only reporting
> status if stdout is a tty?

Do I get it right that you propose not to flood the terminal with
dozens of lines which just update the percentage but instead keep
updating the same line? If so, I do like that.

>
> --D
>
> > +             }
> >       }
> > +     fprintf(stderr, _("Discarding is done.\n"));
> >  }
> >
> >  static __attribute__((noreturn)) void
> > --
> > 2.23.0
> >
>



* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-22 16:27     ` Pavel Reichl
@ 2019-11-22 16:31       ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-11-22 16:31 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Fri, Nov 22, 2019 at 05:27:42PM +0100, Pavel Reichl wrote:
> On Thu, Nov 21, 2019 at 10:59 PM Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> > > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > > ---
> > >  mkfs/xfs_mkfs.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > > index a02d6f66..07b8bd78 100644
> > > --- a/mkfs/xfs_mkfs.c
> > > +++ b/mkfs/xfs_mkfs.c
> > > @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> > >       const uint64_t  step            = (uint64_t)2<<30;
> > >       /* Sector size is 512 bytes */
> > >       const uint64_t  count           = nsectors << 9;
> > > +     uint64_t        prev_done       = (uint64_t) ~0;
> > >
> > >       fd = libxfs_device_to_fd(dev);
> > >       if (fd <= 0)
> > > @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> > >
> > >       while (offset < count) {
> > >               uint64_t        tmp_step = step;
> > > +             uint64_t        done = offset * 100 / count;
> > >
> > >               if ((offset + step) > count)
> > >                       tmp_step = count - offset;
> > > @@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> > >                       return;
> > >
> > >               offset += tmp_step;
> > > +
> > > +             if (prev_done != done) {
> >
> > Hmm... so this prints the status message every time the percentage
> > increases by a point, right?
> 
> Not quite: the smallest change it prints is one percent, but that's the
> finest granularity. E.g. I tested with a 10 GB file and the output was:
> 
> Discarding:  0% done
> Discarding: 20% done
> Discarding: 40% done
> Discarding: 60% done
> Discarding: 80% done
> Discarding is done.
> 
> So ATM there could be up to 102 lines - please propose a different idea.

if (device supports discard) {
	if (!quiet)
		printf(_("Discarding blocks, this may take some time..."));
	<discard loop>
}
<the rest of mkfs>

> 
> >
> > > +                     prev_done = done;
> > > +                     fprintf(stderr, _("Discarding: %2lu%% done\n"), done);
> >
> > This isn't an error, so why output to stderr?
> My bad, sorry.
> 
> >
> > FWIW if it's a tty you might consider ending that string with \r so the
> > status messages don't scroll off the screen.  Or possibly only reporting
> > status if stdout is a tty?
> 
> Do I get it right that you propose not to flood the terminal with
> dozens of lines which just update the percentage but instead keep
> updating the same line? If so, I do like that.

Correct.

--D

> >
> > --D
> >
> > > +             }
> > >       }
> > > +     fprintf(stderr, _("Discarding is done.\n"));
> > >  }
> > >
> > >  static __attribute__((noreturn)) void
> > > --
> > > 2.23.0
> > >
> >
> 

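A rough sketch of the single-line progress idea discussed in this message, assuming hypothetical helpers format_progress() and report_progress() and a plain integer quiet flag rather than mkfs's real option handling. On a terminal the line ends in \r so each update overwrites the previous one; otherwise it ends in \n. Output goes to stdout, since progress is not an error.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Build one progress line. A carriage return keeps a terminal on the
 * same line; a newline is used when output is redirected.
 */
int
format_progress(char *buf, size_t len, unsigned int pct, int is_tty)
{
	return snprintf(buf, len, "Discarding: %2u%% done%c",
			pct, is_tty ? '\r' : '\n');
}

/* Print the progress line to stdout unless quiet mode was requested. */
void
report_progress(unsigned int pct, int quiet)
{
	char	line[64];

	if (quiet)
		return;

	format_progress(line, sizeof(line), pct, isatty(STDOUT_FILENO));
	fputs(line, stdout);
	/* Without a newline the terminal may not show the update yet. */
	fflush(stdout);
}
```

A caller using the \r form would normally print a final newline once the discard loop finishes, so that later mkfs output starts on a fresh line.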

* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-21 23:41   ` Dave Chinner
@ 2019-11-22 16:43     ` Pavel Reichl
  2019-11-22 21:11       ` Dave Chinner
  2019-11-22 21:19       ` Eric Sandeen
  0 siblings, 2 replies; 22+ messages in thread
From: Pavel Reichl @ 2019-11-22 16:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Nov 22, 2019 at 12:42 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > ---
> >  mkfs/xfs_mkfs.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > index a02d6f66..07b8bd78 100644
> > --- a/mkfs/xfs_mkfs.c
> > +++ b/mkfs/xfs_mkfs.c
> > @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >       const uint64_t  step            = (uint64_t)2<<30;
> >       /* Sector size is 512 bytes */
> >       const uint64_t  count           = nsectors << 9;
> > +     uint64_t        prev_done       = (uint64_t) ~0;
> >
> >       fd = libxfs_device_to_fd(dev);
> >       if (fd <= 0)
> > @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >
> >       while (offset < count) {
> >               uint64_t        tmp_step = step;
> > +             uint64_t        done = offset * 100 / count;
>
> That will overflow on an EB-scale (2^60 bytes) filesystem, won't it?

I guess that can happen, sorry. I'll try to come up with a computation
based on floating-point arithmetic. There should not be any
performance or actual precision problem.
(Well, actually I'll drop this line completely; no ratio will be
computed in the end.)

>
> >
> >               if ((offset + step) > count)
> >                       tmp_step = count - offset;
> > @@ -1268,7 +1270,13 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> >                       return;
> >
> >               offset += tmp_step;
> > +
> > +             if (prev_done != done) {
> > +                     prev_done = done;
> > +                     fprintf(stderr, _("Discarding: %2lu%% done\n"), done);
> > +             }
> >       }
> > +     fprintf(stderr, _("Discarding is done.\n"));
>
> Hmmm - this output doesn't get suppressed when the "quiet" (-q)
> option is used. mkfs is supposed to be silent when this option is
> specified.

OK, my bad. I'll fix that.
>
> I also suspect that it breaks a few fstests, too, as some of them
> capture and filter mkfs output. They'll need filters to drop these
> new messages.
>
> FWIW, 100 lines of extra mkfs output is going to cause workflow
> issues. I know it will cause me problems, because I often mkfs 500TB
> filesystems tens of times a day on a discard enabled device. This
> extra output will scroll all the context of the previous test run
> I'm about to compare against off my terminal screen and so now I
> will have to scroll the terminal to look at the results of
> back-to-back runs. IOWs, I'm going to immediately want to turn this
> output off and have it stay off permanently.
>
> Hence I think that, by default, just outputting a single "Discard in
> progress" line before starting the discard would be sufficient

OK, maybe just one line "Discard in progress" is actually what users
need. The computing of % done was probably just overkill from my side.
Sorry about that.

> indication of what mkfs is currently doing. If someone wants more
> verbose progress output, then we should probably introduce a
> "verbose" CLI option to go along with the "quiet" option that
> suppresses all output.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-22 15:59       ` Pavel Reichl
@ 2019-11-22 21:00         ` Dave Chinner
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2019-11-22 21:00 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: Darrick J. Wong, linux-xfs

On Fri, Nov 22, 2019 at 04:59:21PM +0100, Pavel Reichl wrote:
> On Fri, Nov 22, 2019 at 4:38 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > Also:
> > What is the end goal that you have in mind?  Is the progress reporting
> > the ultimate goal?  Or is it to break up the BLKDISCARD calls so that
> > someone can ^C a mkfs operation and not have it just sit there
> > continuing to run?
> 
> The goal is mainly the progress reporting but the possibility to do ^C
> is also convenient. It seems that some users are not happy about the
> BLKDISCARD taking too long and at the same time not being informed
> about that - so they think that the command actually hung.

Ok, that's a good summary to put in the commit description - it
tells the reviewer exactly what you are trying to achieve, and gives
them context to evaluate it against.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:55   ` Darrick J. Wong
  2019-11-22 14:46     ` Pavel Reichl
@ 2019-11-22 21:07     ` Eric Sandeen
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Sandeen @ 2019-11-22 21:07 UTC (permalink / raw)
  To: Darrick J. Wong, Pavel Reichl; +Cc: linux-xfs

On 11/21/19 3:55 PM, Darrick J. Wong wrote:
> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:

I concur with the others that a reason for the change (and a reason for
the size selection) would be appropriate to have in the changelog.

>> Signed-off-by: Pavel Reichl <preichl@redhat.com>
>> ---
>>  mkfs/xfs_mkfs.c | 32 +++++++++++++++++++++++++-------
>>  1 file changed, 25 insertions(+), 7 deletions(-)
>>
>> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
>> index 18338a61..a02d6f66 100644
>> --- a/mkfs/xfs_mkfs.c
>> +++ b/mkfs/xfs_mkfs.c
>> @@ -1242,15 +1242,33 @@ done:
>>  static void
>>  discard_blocks(dev_t dev, uint64_t nsectors)
>>  {
>> -	int fd;
>> +	int		fd;
>> +	uint64_t	offset		= 0;
>> +	/* Maximal chunk of bytes to discard is 2GB */
>> +	const uint64_t	step		= (uint64_t)2<<30;
> 
> You don't need the tabs after the variable name, e.g.
> 
> 	/* Maximal chunk of bytes to discard is 2GB */
> 	const uint64_t	step = 2ULL << 30;
> 
>> +	/* Sector size is 512 bytes */
>> +	const uint64_t	count		= nsectors << 9;
> 
> count = BBTOB(nsectors)?

FYI this is a macro that xfs developers have learned about. ;)  It stands for
"Basic Block TO Byte" where "basic block" pretty much means "512-byte sector."

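For readers who have not met the macro, here is a minimal sketch of the
conversion it performs. The EX_-prefixed definitions are local stand-ins
written for illustration, assumed to mirror the XFS headers where a basic
block is 512 bytes. Real code should use the definitions from the xfsprogs
headers instead of redefining them.

```c
#include <stdint.h>

/* Local stand-ins for illustration; xfsprogs provides its own definitions. */
#define EX_BBSHIFT	9			/* log2(512) */
#define EX_BBTOB(bbs)	((bbs) << EX_BBSHIFT)	/* basic blocks -> bytes */

/* Convert a count of 512-byte sectors to bytes. */
static uint64_t
sectors_to_bytes(uint64_t nsectors)
{
	return EX_BBTOB(nsectors);
}
```

This is the same shift as the open-coded `nsectors << 9` in the patch; the
macro only gives the magic number a name.
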
-Eric



* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 23:18   ` Dave Chinner
  2019-11-22 15:38     ` Darrick J. Wong
  2019-11-22 16:09     ` Pavel Reichl
@ 2019-11-22 21:10     ` Eric Sandeen
  2019-11-22 21:30       ` Eric Sandeen
  2 siblings, 1 reply; 22+ messages in thread
From: Eric Sandeen @ 2019-11-22 21:10 UTC (permalink / raw)
  To: Dave Chinner, Pavel Reichl; +Cc: linux-xfs

On 11/21/19 5:18 PM, Dave Chinner wrote:
> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
>> Signed-off-by: Pavel Reichl <preichl@redhat.com>
>> ---
> 
> This is mixing an explanation about why the change is being made
> and what was considered when making decisions about the change.
> 
> e.g. my first questions on looking at the patch were:
> 
> 	- why do we need to break up the discards into 2GB chunks?
> 	- why 2GB?
> 	- why not use libblkid to query the maximum discard size
> 	  and use that as the step size instead?

Just wondering, can we trust that to be reasonably performant?
(the whole motivation here is for hardware that takes inordinately
long to do discard, I wonder if we can count on such hardware to
properly fill out this info....)

> 	- is there any performance impact from breaking up large
> 	  discards that might be optimised by the kernel into many
> 	  overlapping async operations into small, synchronous
> 	  discards?

FWIW, I had simply suggested to Pavel that he follow e2fsprogs' lead
here - afaik they haven't had issues/complaints with their 2g iteration,
and at one point Lukas did some investigation into the size selection...

Thanks,
-Eric
 
> i.e. the reviewer can read what the patch does, but that doesn't
> explain why the patch does this. Hence it's a good idea to explain
> the problem being solved or the feature requirements that have led
> to the changes in the patch....
> 
> Cheers,
> 
> Dave.
> 


* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-22 16:43     ` Pavel Reichl
@ 2019-11-22 21:11       ` Dave Chinner
  2019-11-22 21:19       ` Eric Sandeen
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2019-11-22 21:11 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Fri, Nov 22, 2019 at 05:43:55PM +0100, Pavel Reichl wrote:
> On Fri, Nov 22, 2019 at 12:42 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
> > > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > > ---
> > >  mkfs/xfs_mkfs.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > > index a02d6f66..07b8bd78 100644
> > > --- a/mkfs/xfs_mkfs.c
> > > +++ b/mkfs/xfs_mkfs.c
> > > @@ -1248,6 +1248,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> > >       const uint64_t  step            = (uint64_t)2<<30;
> > >       /* Sector size is 512 bytes */
> > >       const uint64_t  count           = nsectors << 9;
> > > +     uint64_t        prev_done       = (uint64_t) ~0;
> > >
> > >       fd = libxfs_device_to_fd(dev);
> > >       if (fd <= 0)
> > > @@ -1255,6 +1256,7 @@ discard_blocks(dev_t dev, uint64_t nsectors)
> > >
> > >       while (offset < count) {
> > >               uint64_t        tmp_step = step;
> > > +             uint64_t        done = offset * 100 / count;
> >
> > That will overflow on EB-scale (2^60 bytes) filesystems, won't it?
> 
> I guess that can happen, sorry. I'll try to come up with a computation
> based on floating-point arithmetic. There should not be any
> performance or actual precision problem.
> (Well, actually I'll drop this line completely; no ratio will be
> computed in the end.)

No need to apologise for not realising huge filesystems need to
work. It takes time to get used to having to consider 64 bit
overflows everywhere... :)

Maybe the easiest way to do this sort of thing is to calculate
reporting interval prior to the loop, and every time it is exceeded
issue a report and then reset the report counter to zero. No fancy
math required there. If we want 1% increments:

	report_interval = count / 100;
	
	while (offset < count) {
	....
		offset += tmp_step;
		report_offset += tmp_step;

		if (report_offset > report_interval) {
			report_offset = 0;
			/* issue report */
		}
	}

And this makes it easy to adjust the number of reports issued (e.g. every
5% or 10%) just by changing the report_interval division constant.
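
The interval idea above can be filled out into something that compiles. This
is only a sketch: the function name, the `nreports` parameter and the message
text are made up for illustration, and the actual discard call is left as a
comment. It uses `>=` rather than `>` so that a report fires exactly when an
interval boundary is reached.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Walk [0, count) in step-sized chunks and return how many progress
 * reports were issued.  nreports is the desired number of reports,
 * e.g. 100 for 1% increments or 10 for 10% increments.
 */
static unsigned int
walk_with_reports(uint64_t count, uint64_t step, unsigned int nreports,
		  FILE *out)
{
	uint64_t	offset = 0;
	uint64_t	report_offset = 0;
	uint64_t	report_interval;
	unsigned int	issued = 0;

	if (step == 0 || nreports == 0)
		return 0;
	report_interval = count / nreports;

	while (offset < count) {
		uint64_t	tmp_step = step;

		if (tmp_step > count - offset)
			tmp_step = count - offset;

		/* the discard of [offset, offset + tmp_step) would go here */

		offset += tmp_step;
		report_offset += tmp_step;

		if (report_offset >= report_interval) {
			report_offset = 0;
			issued++;
			if (out)
				fprintf(out, "Discarding: %llu of %llu bytes\n",
					(unsigned long long)offset,
					(unsigned long long)count);
		}
	}
	return issued;
}
```

Note that at most one report is issued per chunk, so when the step is larger
than the interval the number of reports is bounded by the number of chunks,
not by nreports.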

Another way of doing it is deciding on the -time- between reports.
e.g. issue a progress report every 60s. Then you can just report
the percentage done  based on offset and count without needing
intermediate accounting.
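
A rough sketch of the time-based variant, assuming a small helper that decides
whether a report is due. The struct and function names are hypothetical. The
caller passes in the current time (for example from time(NULL)) after each
chunk, which also keeps the helper easy to test.

```c
#include <time.h>

/* Hypothetical bookkeeping for time-based progress reports. */
struct progress_timer {
	time_t	last;		/* time of the previous report */
	time_t	interval;	/* minimum seconds between reports */
};

/*
 * Return 1 if at least `interval` seconds have passed since the last
 * report and record `now` as the new reference point, else return 0.
 */
static int
progress_due(struct progress_timer *pt, time_t now)
{
	if (now - pt->last < pt->interval)
		return 0;
	pt->last = now;
	return 1;
}
```

A fast device that finishes within one interval then prints nothing at all,
while a slow one prints a line every interval regardless of device size. The
percentage printed at each report still needs a computation that does not wrap
for large offsets.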

> > I also suspect that it breaks a few fstests, too, as some of them
> > capture and filter mkfs output. They'll need filters to drop these
> > new messages.
> >
> > FWIW, 100 lines of extra mkfs output is going to cause workflow
> > issues. I know it will cause me problems, because I often mkfs 500TB
> > filesystems tens of times a day on a discard enabled device. This
> > extra output will scroll all the context of the previous test run
> > I'm about to compare against off my terminal screen and so now I
> > will have to scroll the terminal to look at the results of
> > back-to-back runs. IOWs, I'm going to immediately want to turn this
> > output off and have it stay off permanently.
> >
> > Hence I think that, by default, just outputting a single "Discard in
> > progress" line before starting the discard would be sufficient
> 
> OK, maybe just one line "Discard in progress" is actually what users
> need. The computing of % done was probably just overkill from my side.
> Sorry about that.

Again, no need to apologise because there are different opinions on
how something should be done. If you didn't put progress reporting
in, I'm sure someone would have suggested it and we'd be having the
same discussion anyway. :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/2] mkfs: Show progress during block discard
  2019-11-22 16:43     ` Pavel Reichl
  2019-11-22 21:11       ` Dave Chinner
@ 2019-11-22 21:19       ` Eric Sandeen
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Sandeen @ 2019-11-22 21:19 UTC (permalink / raw)
  To: Pavel Reichl, Dave Chinner; +Cc: linux-xfs

On 11/22/19 10:43 AM, Pavel Reichl wrote:
> On Fri, Nov 22, 2019 at 12:42 AM Dave Chinner <david@fromorbit.com> wrote:
>>
>> On Thu, Nov 21, 2019 at 10:44:45PM +0100, Pavel Reichl wrote:
...

>>
>> I also suspect that it breaks a few fstests, too, as some of them
>> capture and filter mkfs output. They'll need filters to drop these
>> new messages.
>>
>> FWIW, 100 lines of extra mkfs output is going to cause workflow
>> issues. I know it will cause me problems, because I often mkfs 500TB
>> filesystems tens of times a day on a discard enabled device. This
>> extra output will scroll all the context of the previous test run
>> I'm about to compare against off my terminal screen and so now I
>> will have to scroll the terminal to look at the results of
>> back-to-back runs. IOWs, I'm going to immediately want to turn this
>> output off and have it stay off permanently.
>>
>> Hence I think that, by default, just outputting a single "Discard in
>> progress" line before starting the discard would be sufficient

e2fsprogs simply does:

Discarding device blocks: done                            

("done" isn't printed until it's ... done)

so that might be a good convention to follow?  Though I'd probably do

printf("Discarding blocks... ");
....
printf("Done.\n");

because the ellipses tend to indicate waiting.  :)

Even the one line might require filtering-out in xfstests, but luckily we have
standard filters and it should be trivial to add.
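
One practical detail with the two-printf form suggested above: the first
message has no trailing newline, so on a line-buffered stream it may not
appear until "Done." is printed, which defeats the purpose. A minimal sketch
that flushes explicitly and also honours a quiet flag follows. The function
name, the `quiet` parameter and the callback are hypothetical stand-ins, not
the mkfs.xfs interfaces.

```c
#include <stdio.h>

/*
 * Print a one-line status around a long-running operation.
 * quiet != 0 suppresses all output, as mkfs -q is expected to.
 */
static void
discard_with_status(FILE *out, int quiet, void (*do_discard)(void *),
		    void *arg)
{
	if (!quiet) {
		fprintf(out, "Discarding blocks... ");
		fflush(out);	/* no newline yet, so push the text out now */
	}

	if (do_discard)
		do_discard(arg);

	if (!quiet)
		fprintf(out, "Done.\n");
}
```

Because both halves end up on one line, a single pattern in the test filters
would be enough to drop it.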

> OK, maybe just one line "Discard in progress" is actually what users
> need. The computing of % done was probably just overkill from my side.
> Sorry about that.

No worries, that's why we discuss stuff.  :)
Thanks for taking this on,

-Eric


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-22 21:10     ` Eric Sandeen
@ 2019-11-22 21:30       ` Eric Sandeen
  2019-11-26 19:40         ` Eric Sandeen
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Sandeen @ 2019-11-22 21:30 UTC (permalink / raw)
  To: Dave Chinner, Pavel Reichl; +Cc: linux-xfs

On 11/22/19 3:10 PM, Eric Sandeen wrote:
> On 11/21/19 5:18 PM, Dave Chinner wrote:
>> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
>>> Signed-off-by: Pavel Reichl <preichl@redhat.com>
>>> ---
>>
>> This is mixing an explanation about why the change is being made
>> and what was considered when making decisions about the change.
>>
>> e.g. my first questions on looking at the patch were:
>>
>> 	- why do we need to break up the discards into 2GB chunks?
>> 	- why 2GB?
>> 	- why not use libblkid to query the maximum discard size
>> 	  and use that as the step size instead?
> 
> Just wondering, can we trust that to be reasonably performant?
> (the whole motivation here is for hardware that takes inordinately
> long to do discard, I wonder if we can count on such hardware to
> properly fill out this info....)

Looking at the docs in kernel/Documentation/block/queue-sysfs.rst:

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A discard_max_bytes
value of 0 means that the device does not support discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued, setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

it seems like a strong suggestion that the discard_max_hw_bytes value may
still be problematic, and discard_max_bytes can be hand-tuned to something
smaller if it's a problem.  To me that indicates that discard_max_hw_bytes
probably can't be trusted to be performant, and presumably discard_max_bytes
won't be either in that case unless it's been hand-tuned by the admin?
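
For anyone who wants to look at these limits on their own hardware, here is a
small sketch that reads one of the two attributes quoted above. It assumes the
usual /sys/block/<disk>/queue/ layout; the helper names are made up for
illustration. As discussed, the value only says what the device accepts, not
how quickly it processes a request of that size.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs-style file.
 * Returns 0 on success, -1 on failure.
 */
static int
read_u64_file(const char *path, uint64_t *val)
{
	FILE	*f = fopen(path, "r");
	int	ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Look up a queue attribute such as "discard_max_bytes" or
 * "discard_max_hw_bytes" for a disk name such as "sda".
 */
static int
discard_limit(const char *disk, const char *attr, uint64_t *val)
{
	char	path[256];
	int	n;

	n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_u64_file(path, val);
}
```

A result of 0 means the device does not support discard at all, per the
documentation quoted above.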

-Eric


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-22 21:30       ` Eric Sandeen
@ 2019-11-26 19:40         ` Eric Sandeen
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Sandeen @ 2019-11-26 19:40 UTC (permalink / raw)
  To: Dave Chinner, Pavel Reichl; +Cc: linux-xfs, Lukáš Czerner

On 11/22/19 3:30 PM, Eric Sandeen wrote:
> On 11/22/19 3:10 PM, Eric Sandeen wrote:
>> On 11/21/19 5:18 PM, Dave Chinner wrote:
>>> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
>>>> Signed-off-by: Pavel Reichl <preichl@redhat.com>
>>>> ---
>>>
>>> This is mixing an explanation about why the change is being made
>>> and what was considered when making decisions about the change.
>>>
>>> e.g. my first questions on looking at the patch were:
>>>
>>> 	- why do we need to break up the discards into 2GB chunks?
>>> 	- why 2GB?
>>> 	- why not use libblkid to query the maximum discard size
>>> 	  and use that as the step size instead?
>>
>> Just wondering, can we trust that to be reasonably performant?
>> (the whole motivation here is for hardware that takes inordinately
>> long to do discard, I wonder if we can count on such hardware to
>> properly fill out this info....)
> 
> Looking at the docs in kernel/Documentation/block/queue-sysfs.rst:
> 
> discard_max_hw_bytes (RO)
> -------------------------
> Devices that support discard functionality may have internal limits on
> the number of bytes that can be trimmed or unmapped in a single operation.
> The discard_max_bytes parameter is set by the device driver to the maximum
> number of bytes that can be discarded in a single operation. Discard
> requests issued to the device must not exceed this limit. A discard_max_bytes
> value of 0 means that the device does not support discard functionality.
> 
> discard_max_bytes (RW)
> ----------------------
> While discard_max_hw_bytes is the hardware limit for the device, this
> setting is the software limit. Some devices exhibit large latencies when
> large discards are issued, setting this value lower will make Linux issue
> smaller discards and potentially help reduce latencies induced by large
> discard operations.
> 
> it seems like a strong suggestion that the discard_max_hw_bytes value may
> still be problematic, and discard_max_bytes can be hand-tuned to something
> smaller if it's a problem.  To me that indicates that discard_max_hw_bytes
> probably can't be trusted to be performant, and presumably discard_max_bytes
> won't be either in that case unless it's been hand-tuned by the admin?

Lukas, Jeff Moyer reminded me that you did a lot of investigation into this
behavior a while back.  Can you shed light on this, particularly how you
chose 2G as the discard granularity for mke2fs?

Thanks,
-Eric


* Re: [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB
  2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
  2019-11-21 21:55   ` Darrick J. Wong
  2019-11-21 23:18   ` Dave Chinner
@ 2019-11-26 20:53   ` Eric Sandeen
  2 siblings, 0 replies; 22+ messages in thread
From: Eric Sandeen @ 2019-11-26 20:53 UTC (permalink / raw)
  To: Pavel Reichl, linux-xfs

On 11/21/19 3:44 PM, Pavel Reichl wrote:
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> ---
>  mkfs/xfs_mkfs.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 18338a61..a02d6f66 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -1242,15 +1242,33 @@ done:
>  static void
>  discard_blocks(dev_t dev, uint64_t nsectors)
>  {
> -	int fd;
> +	int		fd;
> +	uint64_t	offset		= 0;
> +	/* Maximal chunk of bytes to discard is 2GB */
> +	const uint64_t	step		= (uint64_t)2<<30;

Regarding the discard step size, I would like to just see us keep 2G -
I see problems with the alternate suggestions proposed in the
threads on this patch review:

1) query block device for maximal discard size
-> block device folks I've talked to (Jeff Moyer in particular) stated
   that many devices are known for putting a huge value in here, and then
   taking far, far too long to process that size request.  In short,
   maximum size != fast.

2) discard one AG size at a time
-> this can be up to 1T, which puts us right back at our problem of large,
   slow discards.  And in particular, AG size has no relation at all to a
   device's discard behavior.  (further complicating this, we don't have
   this geometry available anywhere in the current chain of calls to the
   discard ioctl.)

Lukas did an investigation of discard behaviors (though it was some time
ago; see https://sourceforge.net/projects/test-discard/) and arrived at 2G
as a reasonable size after testing many different devices - I've not seen
any complaints from mke2fs users about problems doing discards in 2G chunks.

So I think just picking a fixed 2G size is the best plan for now.

(one nitpick, I'd fix the comment above to not say "Maximal" because that
sounds like some hard limit imposed by something other than the code; I'd
just say "Discard the device 2G at a time" or something like that.)

A comment above the loop explaining in more detail that we iterate in
step sizes so that the utility can be interrupted would probably be
helpful.
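
To illustrate the shape being asked for, here is a sketch of a stepped discard
loop with the suggested comments in place. It is not the patch under review:
the function names and the DISCARD_STEP macro are illustrative. The BLKDISCARD
ioctl itself takes a two-element uint64_t array holding the byte offset and
the byte length.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD */

/* Discard the device 2G at a time. */
#define DISCARD_STEP	(2ULL << 30)

/* Issue one BLKDISCARD for [offset, offset + len). Returns 0 on success. */
static int
discard_range(int fd, uint64_t offset, uint64_t len)
{
	uint64_t	range[2] = { offset, len };

	return ioctl(fd, BLKDISCARD, range) < 0 ? -1 : 0;
}

/*
 * Discard [0, count) in fixed-size steps rather than as one huge request.
 * Each ioctl returns to user space, so the utility can be interrupted
 * between steps instead of appearing hung on a slow device.
 * Returns the number of bytes discarded before the first failure.
 */
static uint64_t
discard_in_steps(int fd, uint64_t count, uint64_t step)
{
	uint64_t	offset = 0;

	if (step == 0)
		return 0;

	while (offset < count) {
		uint64_t	len = step;

		if (len > count - offset)
			len = count - offset;
		if (discard_range(fd, offset, len) < 0)
			break;
		offset += len;
	}
	return offset;
}
```

A caller would pass DISCARD_STEP as the step. Stopping at the first failure
matches the usual treatment of discard as best effort, though that policy is
an assumption of this sketch.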

Thanks,
-Eric


end of thread, other threads:[~2019-11-26 20:53 UTC | newest]

Thread overview: 22+ messages
2019-11-21 21:44 [PATCH 0/2] mkfs: inform during block discarding Pavel Reichl
2019-11-21 21:44 ` [PATCH 1/2] mkfs: Break block discard into chunks of 2 GB Pavel Reichl
2019-11-21 21:55   ` Darrick J. Wong
2019-11-22 14:46     ` Pavel Reichl
2019-11-22 21:07     ` Eric Sandeen
2019-11-21 23:18   ` Dave Chinner
2019-11-22 15:38     ` Darrick J. Wong
2019-11-22 15:59       ` Pavel Reichl
2019-11-22 21:00         ` Dave Chinner
2019-11-22 16:09     ` Pavel Reichl
2019-11-22 21:10     ` Eric Sandeen
2019-11-22 21:30       ` Eric Sandeen
2019-11-26 19:40         ` Eric Sandeen
2019-11-26 20:53   ` Eric Sandeen
2019-11-21 21:44 ` [PATCH 2/2] mkfs: Show progress during block discard Pavel Reichl
2019-11-21 21:59   ` Darrick J. Wong
2019-11-22 16:27     ` Pavel Reichl
2019-11-22 16:31       ` Darrick J. Wong
2019-11-21 23:41   ` Dave Chinner
2019-11-22 16:43     ` Pavel Reichl
2019-11-22 21:11       ` Dave Chinner
2019-11-22 21:19       ` Eric Sandeen
