linux-btrfs.vger.kernel.org archive mirror
* [RFC PATCH 0/1] btrfs-progs: scrub: add start/end position for scrub
@ 2019-12-02  3:44 Zygo Blaxell
  2019-12-02  3:44 ` [PATCH] " Zygo Blaxell
  0 siblings, 1 reply; 3+ messages in thread
From: Zygo Blaxell @ 2019-12-02  3:44 UTC
  To: linux-btrfs

This patch has some problems that will be a lot of work to fix, and
before doing any of that I thought I would check to see if anyone else
thinks the idea is sane.

This patch just adds start (-s) and end (-e) position arguments to 'btrfs
scrub start', to enable focusing a scrub on specific areas of a device.
The positions are offsets from the start of the device.

The idea is that if you have a disk with a lot of errors, you do a
loop of:

	1. start a scrub at the beginning of the disk
	2. get some read/uncorrectable errors in dmesg
	3. cancel scrub
	4. fix the errors (delete/replace files)
	5. restart scrub at just before the offset of the first error
	6. repeat from step 2

The last steps use the '-s' option to skip over parts of the disk that
have already been scrubbed.  Each pass starts reading just before the
first detected error in the previous pass to confirm that all references
to the offending data blocks have been removed from the filesystem.
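
As a rough sketch of the "restart just before the first error" step, the
value passed to '-s' on the next pass could be computed like this.  The
helper name and the margin parameter are hypothetical and not part of the
patch; the error offset is assumed to be the device offset reported in
dmesg for the first error of the previous pass.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: choose the -s value for
 * the next scrub pass.  first_error_offset is the device offset of the
 * first error seen in the previous pass; margin is how far before that
 * error the next pass should begin (an arbitrary, caller-chosen value).
 */
static uint64_t scrub_restart_pos(uint64_t first_error_offset, uint64_t margin)
{
	/* Clamp at 0 so the unsigned subtraction cannot wrap around. */
	if (first_error_offset < margin)
		return 0;
	return first_error_offset - margin;
}
```

Backing up by a margin rather than restarting exactly at the error means
the next pass re-reads the previously failing area first.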

Without these options, the process looks like this:

	1. start a scrub at the beginning of the disk
	2. get a random sample of read/uncorrectable errors in dmesg
	3. wait for scrub to end
	4. fix the errors (delete/replace files)
	5. repeat from step 1

The current approach needs a full scrub to be repeated many times, because
only a small percentage of a large number of errors will be sampled on
each pass due to dmesg ratelimiting.

It is possible to cancel the scrub, edit /var/lib/btrfs/scrub.status.*,
change the "last_physical" field to the desired start position, and then
resume the scrub to achieve a similar effect to this patch, but that's
somewhat ugly.

TODO:

This patch does nothing to correct the "Total bytes to scrub" or
"ETA" fields in various outputs, which are very wrong when the new
-s and -e options are used.  Fixing that will require joining the
device tree with block groups to estimate how many bytes will be
scrubbed.  Alternatively, we could just disable the ETA/TBS fields
in the status output when -s or -e are used.
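
A minimal sketch of the estimate, assuming the per-device allocated
ranges have already been read into an array: sum the part of each range
that overlaps the requested window.  The struct and function names are
hypothetical, the end position is treated as exclusive, and no attempt
is made to account for how full each block group is, so this is only an
upper bound on the bytes actually scrubbed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one allocated range on a device. */
struct dev_range {
	uint64_t start;		/* device offset of the range */
	uint64_t length;	/* length of the range in bytes */
};

/*
 * Sum the bytes of every range that fall inside [start_pos, end_pos).
 * Assumes start + length does not overflow for any range.
 */
static uint64_t bytes_in_window(const struct dev_range *ranges, size_t count,
				uint64_t start_pos, uint64_t end_pos)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		uint64_t lo = ranges[i].start;
		uint64_t hi = ranges[i].start + ranges[i].length;

		/* Clip the range to the requested window. */
		if (lo < start_pos)
			lo = start_pos;
		if (hi > end_pos)
			hi = end_pos;
		/* Ranges entirely outside the window contribute nothing. */
		if (lo < hi)
			total += hi - lo;
	}
	return total;
}
```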


* [PATCH] btrfs-progs: scrub: add start/end position for scrub
  2019-12-02  3:44 [RFC PATCH 0/1] btrfs-progs: scrub: add start/end position for scrub Zygo Blaxell
@ 2019-12-02  3:44 ` Zygo Blaxell
  2019-12-04 22:11   ` Vladimir Panteleev
  0 siblings, 1 reply; 3+ messages in thread
From: Zygo Blaxell @ 2019-12-02  3:44 UTC
  To: linux-btrfs

Allow the user to specify start (-s) and end (-e) positions directly to
btrfs scrub start, by giving device offsets on the command line.
This allows scrubs to be targeted toward specific areas of a disk.

These options may be used with either device names or mounted filesystem
paths, though they are probably more useful with the former.

The intended use case is to verify that data areas identified in previous
scrubs as being unreadable or containing uncorrectable errors have since
been remapped or removed, without having to rescan entire disks.

Note that some of the printed statistics (ETA, totals) will be
significantly inaccurate if these options are used.

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
---
 Documentation/btrfs-scrub.asciidoc |  6 +++++-
 cmds/scrub.c                       | 18 ++++++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/btrfs-scrub.asciidoc b/Documentation/btrfs-scrub.asciidoc
index 03f7f008..69ac96ff 100644
--- a/Documentation/btrfs-scrub.asciidoc
+++ b/Documentation/btrfs-scrub.asciidoc
@@ -51,7 +51,7 @@ Does not start a new scrub if the last scrub finished successfully.
 +
 see *scrub start*.
 
-*start* [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata>] <path>|<device>::
+*start* [-BdqrRf] [-c <ioprio_class> -n <ioprio_classdata> -s <start_pos> -e <end_pos>] <path>|<device>::
 Start a scrub on all devices of the filesystem identified by 'path' or on
 a single 'device'. If a scrub is already running, the new one fails.
 +
@@ -77,6 +77,10 @@ raw print mode, print full data instead of summary
 set IO priority class (see `ionice`(1) manpage)
 -n <ioprio_classdata>::::
 set IO priority classdata (see `ionice`(1) manpage)
+-s <start_pos>::::
+set start position by logical address (btrfs extent bytenr, default 0)
+-e <end_pos>::::
+set end position by logical address (btrfs extent bytenr, default end of filesystem)
 -f::::
 force starting new scrub even if a scrub is already running,
 this can useful when scrub status file is damaged and reports a running
diff --git a/cmds/scrub.c b/cmds/scrub.c
index 9fe59822..e60505e0 100644
--- a/cmds/scrub.c
+++ b/cmds/scrub.c
@@ -1172,8 +1172,10 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 	DIR *dirstream = NULL;
 	int force = 0;
 	int nothing_to_resume = 0;
+	u64 start_pos = 0;
+	u64 end_pos = -1ULL;
 
-	while ((c = getopt(argc, argv, "BdqrRc:n:f")) != -1) {
+	while ((c = getopt(argc, argv, "BdqrRc:n:fs:e:")) != -1) {
 		switch (c) {
 		case 'B':
 			do_background = 0;
@@ -1198,6 +1200,12 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 		case 'n':
 			ioprio_classdata = (int)strtol(optarg, NULL, 10);
 			break;
+		case 's':
+			start_pos = strtoull(optarg, NULL, 0);
+			break;
+		case 'e':
+			end_pos = strtoull(optarg, NULL, 0);
+			break;
 		case 'f':
 			force = 1;
 			break;
@@ -1319,11 +1327,11 @@ static int scrub_start(const struct cmd_struct *cmd, int argc, char **argv,
 			continue;
 		} else {
 			++n_start;
-			sp[i].scrub_args.start = 0ll;
+			sp[i].scrub_args.start = start_pos;
 			sp[i].resumed = NULL;
 		}
 		sp[i].skip = 0;
-		sp[i].scrub_args.end = (u64)-1ll;
+		sp[i].scrub_args.end = end_pos;
 		sp[i].scrub_args.flags = readonly ? BTRFS_SCRUB_READONLY : 0;
 		sp[i].ioprio_class = ioprio_class;
 		sp[i].ioprio_classdata = ioprio_classdata;
@@ -1599,7 +1607,7 @@ out:
 }
 
 static const char * const cmd_scrub_start_usage[] = {
-	"btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] <path>|<device>",
+	"btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata -s start_pos -e end_pos] <path>|<device>",
 	"Start a new scrub. If a scrub is already running, the new one fails.",
 	"",
 	"-B     do not background",
@@ -1609,6 +1617,8 @@ static const char * const cmd_scrub_start_usage[] = {
 	"-R     raw print mode, print full data instead of summary",
 	"-c     set ioprio class (see ionice(1) manpage)",
 	"-n     set ioprio classdata (see ionice(1) manpage)",
+	"-s     start scrub at position (default 0)",
+	"-e     end scrub at position (default end of device)",
 	"-f     force starting new scrub even if a scrub is already running",
 	"       this is useful when scrub stats record file is damaged",
 	NULL
-- 
2.20.1
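
The patch as posted calls strtoull(optarg, NULL, 0) without checking the
result, so a mistyped position such as "12abc" is silently read as 12.
A stricter parse could look roughly like the sketch below.  The helper
name is hypothetical and not part of the patch; base 0 is kept so that
decimal, 0x-prefixed hex and 0-prefixed octal are all accepted, as in
the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: parse a scrub position.
 * Returns 0 and stores the value in *out on success, -1 on bad input.
 */
static int parse_scrub_pos(const char *arg, uint64_t *out)
{
	char *end;
	unsigned long long val;

	/* strtoull() would accept "-1" and wrap it to a huge value. */
	if (strchr(arg, '-') != NULL)
		return -1;

	errno = 0;
	val = strtoull(arg, &end, 0);

	/* Reject overflow, empty input, and trailing garbage. */
	if (errno == ERANGE || end == arg || *end != '\0')
		return -1;

	*out = val;
	return 0;
}
```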



* Re: [PATCH] btrfs-progs: scrub: add start/end position for scrub
  2019-12-02  3:44 ` [PATCH] " Zygo Blaxell
@ 2019-12-04 22:11   ` Vladimir Panteleev
  0 siblings, 0 replies; 3+ messages in thread
From: Vladimir Panteleev @ 2019-12-04 22:11 UTC
  To: Zygo Blaxell; +Cc: Btrfs BTRFS

Thank you for this! I've put together a script which uses the added
switches to rescan observed errors in dmesg:

https://gist.github.com/CyberShadow/648a040103fb08738783b6435da376fe

Though, the approach described in the cover letter is probably a
better idea than what this script does.

On Mon, 2 Dec 2019 at 03:47, Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
> +-s <start_pos>::::
> +set start position by logical address (btrfs extent bytenr, default 0)
> +-e <end_pos>::::
> +set end position by logical address (btrfs extent bytenr, default end of filesystem)

As mentioned on IRC, I found it confusing that these use the term
"logical addresses" to describe what the dmesg errors refer to as
"physical" addresses.
