All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2] btrfs-progs: scrub: add start/end position for scrub
@ 2019-12-05  3:15 Zygo Blaxell
  2019-12-05  3:15 ` [PATCH " Zygo Blaxell
  2019-12-05 10:10 ` [RFC PATCH " Graham Cobb
  0 siblings, 2 replies; 3+ messages in thread
From: Zygo Blaxell @ 2019-12-05  3:15 UTC (permalink / raw)
  To: linux-btrfs

v2 changes:

So far this patch has had one user, who reported a bug:  the manual
page for btrfs scrub said the start/end position arguments were extent
bytenr aka logical addresses, when they are in fact device offset aka
physical addresses.

Removed the sanity question at the top of the cover text as this patch
has now seen real use.  Made some minor changes to the example procedure.

Summary:

This patch adds start (-s) and end (-e) position arguments to 'btrfs
scrub start', to enable focusing a scrub on specific areas of a device.
The positions are offsets from the start of the device.

The idea is that if you have a disk with a lot of errors, you do a
loop of:

        - start a scrub at the beginning of the disk
        - get some read/uncorrectable errors in dmesg
        - cancel scrub, or use -e to scrub in 1G increments
        - fix the errors (delete/replace files)
        - restart scrub at just before the offset of the first error
        - repeat from step 2

The last steps use the '-s' option to skip over parts of the disk that
have already been scrubbed.  Each pass starts reading just before the
first detected error in the previous pass to confirm that all references
to the offending data blocks have been removed from the filesystem.

Without these options, the process looks like this:

        - start a scrub at the beginning of the disk
        - get a random sample of read/uncorrectable errors in dmesg
        - wait for scrub to end
        - fix the errors (delete/replace files)
        - repeat from step 1

The current approach need a full scrub to be repeated many times, because
only a small percentage of a large number of errors will be sampled on
each pass due to dmesg ratelimiting.

It is possible to cancel the scrub, edit /var/lib/btrfs/scrub.status.*,
change the "last_physical" field to the desired start position, and then
resume the scrub to achieve a similar effect to this patch, but that's
somewhat ugly.

TODO:

This patch does nothing to correct the "Total bytes to scrub" or
"ETA" fields in various outputs, which are very wrong when the new
-s and -e options are used.  Fixing that will require joining the
device tree with block groups to estimate how many bytes will be
scrubbed.  Alternatively, we could just disable the ETA/TBS fields
in the status output when -s or -e are used.




^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2] btrfs-progs: scrub: add start/end position for scrub
  2019-12-05  3:15 [RFC PATCH v2] btrfs-progs: scrub: add start/end position for scrub Zygo Blaxell
@ 2019-12-05  3:15 ` Zygo Blaxell
  2019-12-05 10:10 ` [RFC PATCH " Graham Cobb
  1 sibling, 0 replies; 3+ messages in thread
From: Zygo Blaxell @ 2019-12-05  3:15 UTC (permalink / raw)
  To: linux-btrfs

Allow user to specify start (-s) and end (-e) position directly during
btrfs scrub start, by giving device offsets on the command line.
This allows scrubs to be targeted toward specific areas of disk.

These options may be used with either device names or mounted filesystem
paths, though it is probably more useful with the former.

The intended use case is to verify that data areas identified in previous
scrubs as being unreadable or containing uncorrectable errors have since
been remapped or removed, without having to rescan entire disks.

Note that some of the printed statistics (ETA, totals) will be
significantly inaccurate if these options are used.

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
---
 Documentation/btrfs-scrub.asciidoc |  6 +++++-
 cmds/scrub.c                       | 18 ++++++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/btrfs-scrub.asciidoc b/Documentation/btrfs-scrub.asciidoc
index 03f7f008..467345d6 100644
--- a/Documentation/btrfs-scrub.asciidoc
+++ b/Documentation/btrfs-scrub.asciidoc
@@ -51,7 +51,7 @@ Does not start a new scrub if the last scrub finished successfully.
 +
 see *scrub start*.
 
-*start* [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata>] <path>|<device>::
+*start* [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata> -s <start_pos> -e <end_pos>] <path>|<device>::
 Start a scrub on all devices of the filesystem identified by 'path' or on
 a single 'device'. If a scrub is already running, the new one fails.
 +
@@ -77,6 +77,10 @@ raw print mode, print full data instead of summary
 set IO priority class (see `ionice`(1) manpage)
 -n <ioprio_classdata>::::
 set IO priority classdata (see `ionice`(1) manpage)
+-s <start_pos>::::
+set start position, device physical address (default 0)
+-e <end_pos>::::
+set end position, device physical address (default end of disk)
 -f::::
 force starting new scrub even if a scrub is already running,
 this can useful when scrub status file is damaged and reports a running
diff --git a/cmds/scrub.c b/cmds/scrub.c
index 9fe59822..e60505e0 100644
--- a/cmds/scrub.c
+++ b/cmds/scrub.c
@@ -1172,8 +1172,10 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 	DIR *dirstream = NULL;
 	int force = 0;
 	int nothing_to_resume = 0;
+	u64 start_pos = 0;
+	u64 end_pos = -1ULL;
 
-	while ((c = getopt(argc, argv, "BdqrRc:n:f")) != -1) {
+	while ((c = getopt(argc, argv, "BdqrRc:n:fs:e:")) != -1) {
 		switch (c) {
 		case 'B':
 			do_background = 0;
@@ -1198,6 +1200,12 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 		case 'n':
 			ioprio_classdata = (int)strtol(optarg, NULL, 10);
 			break;
+		case 's':
+			start_pos = strtoull(optarg, NULL, 0);
+			break;
+		case 'e':
+			end_pos = strtoull(optarg, NULL, 0);
+			break;
 		case 'f':
 			force = 1;
 			break;
@@ -1319,11 +1327,11 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 			continue;
 		} else {
 			++n_start;
-			sp[i].scrub_args.start = 0ll;
+			sp[i].scrub_args.start = start_pos;
 			sp[i].resumed = NULL;
 		}
 		sp[i].skip = 0;
-		sp[i].scrub_args.end = (u64)-1ll;
+		sp[i].scrub_args.end = end_pos;
 		sp[i].scrub_args.flags = readonly ? BTRFS_SCRUB_READONLY : 0;
 		sp[i].ioprio_class = ioprio_class;
 		sp[i].ioprio_classdata = ioprio_classdata;
@@ -1599,7 +1607,7 @@ out:
 }
 
 static const char * const cmd_scrub_start_usage[] = {
-	"btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] <path>|<device>",
+	"btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata -s start_pos -e end_pos] <path>|<device>",
 	"Start a new scrub. If a scrub is already running, the new one fails.",
 	"",
 	"-B     do not background",
@@ -1609,6 +1617,8 @@ static const char * const cmd_scrub_start_usage[] = {
 	"-R     raw print mode, print full data instead of summary",
 	"-c     set ioprio class (see ionice(1) manpage)",
 	"-n     set ioprio classdata (see ionice(1) manpage)",
+	"-s     start scrub at position (default 0)",
+	"-e     end scrub at position (default end of device)",
 	"-f     force starting new scrub even if a scrub is already running",
 	"       this is useful when scrub stats record file is damaged",
 	NULL
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH v2] btrfs-progs: scrub: add start/end position for scrub
  2019-12-05  3:15 [RFC PATCH v2] btrfs-progs: scrub: add start/end position for scrub Zygo Blaxell
  2019-12-05  3:15 ` [PATCH " Zygo Blaxell
@ 2019-12-05 10:10 ` Graham Cobb
  1 sibling, 0 replies; 3+ messages in thread
From: Graham Cobb @ 2019-12-05 10:10 UTC (permalink / raw)
  To: Zygo Blaxell, linux-btrfs

On 05/12/2019 03:15, Zygo Blaxell wrote:
> v2 changes:
> 
> So far this patch has had one user, who reported a bug:  the manual
> page for btrfs scrub said the start/end position arguments were extent
> bytenr aka logical addresses, when they are in fact device offset aka
> physical addresses.

I am wondering to what extent this feature depends on undocumented (and
possibly not even valid) behaviour of the scrub kernel operation.

I think that the current code only depends on the fact that the scrub
operation guarantees that "if the 'last physical` value from the
previous call is provided as the start point to the next call, the scrub
will continue where it left off and will eventually cover the whole
filesystem (assuming there have no been changes)".

This code depends on a much stronger statement which is that the scrub
proceeds in order and guarantees to scrub exactly the physical blocks
between start and end, in order.

The reason for raising this is that I believe I have seen cases (when
using cancelled and resumed scrubs) where 'last physical' has gone down.

I had planned to test this last statement to see if I could reproduce
the behaviour but I haven't had time.

Graham

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-12-05 10:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-05  3:15 [RFC PATCH v2] btrfs-progs: scrub: add start/end position for scrub Zygo Blaxell
2019-12-05  3:15 ` [PATCH " Zygo Blaxell
2019-12-05 10:10 ` [RFC PATCH " Graham Cobb

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.